read() from stdin - c

Consider the following line of code:
while((n = read(STDIN_FILENO, buff, BUFSIZ)) > 0)
As per my understanding read/write functions are a part of non-buffered I/O. So does that mean read() function will read only one character per call from stdio? Or in other words, the value of n will be
-1 in case of error
n = 0 in case of EOF
1 otherwise
If it is not the case, when would the above read() function will return and why?
Note: I was also thinking that read() will wait until it successfully reads BUFSIZ number of characters from stdin. But what happens in a case number of characters available to read are less than BUFSIZ? Will read wait forever or until EOF arrives (Ctrl + D on unix or Ctrl + Z on windows)?
Also, lets say BUFSIZ = 100 and stdin = ACtrl+D (i.e EOF immediately following a single character). Now how many times the while loop will iterate?

The way read() behaves depends on what is being read. For regular files, if you ask for N characters, you get N characters if they are available, less than N if end of file intervenes.
If read() is reading from a terminal in canonical/cooked mode, the tty driver provides data a line at a time. So if you tell read() to get 3 characters or 300, read will hang until the tty driver has seen a newline or the terminal's defined EOF key, and then read() will return with either the number of characters in the line or the number of characters you requested, whichever is smaller.
If read() is reading from a terminal in non-canonical/raw mode, read will have access to keypresses immediately. If you ask read() to get 3 characters it might return with anywhere from 0 to 3 characters depending on input timing and how the terminal was configured.
read() will behave differently in the face of signals, returning with less than the requested number of characters, or -1 with errno set to EINTR if a signal interrupted the read before any characters arrived.
read() will behave differently if the descriptor has been configured for non-blocking I/O. read() will return -1 with errno set to EAGAIN or EWOULDBLOCK if no input was immediately available. This applies to sockets.
So as you can see, you should be ready for surprises when you call read(). You won't always get the number of characters you requested, and you might get non-fatal errors like EINTR, which means you should retry the read().

Your code reads:
while((n = read(0, buff, BUFSIZ) != 0))
This is flawed - the parentheses mean it is interpreted as:
while ((n = (read(0, buff, BUFSIZ) != 0)) != 0)
where the boolean condition is evaluated before the assignment, so n will only obtain the values 0 (the condition is not true) and 1 (the condition is true).
You should write:
while ((n = read(0, buff, BUFSIZ)) > 0)
This stops on EOF or a read error, and n lets you know which condition you encountered.
Apparently, the code above was a typo in the question.
Unbuffered I/O will read up to the number of characters you read (but not more). It may read less on account of EOF or an error. It may also read less because less is available at the time of the call. Consider a terminal; typically, that will only read up to the end of line because there isn't any more available than that. Consider a pipe; if the feeding process has generated 128 unread bytes, then if BUFSIZ is 4096, you'll only get 128 bytes from the read. A non-blocking file descriptor may return because nothing is available; a socket may return fewer bytes because there isn't more information available yet; a disk read may return fewer bytes because there are fewer than the requested number of bytes left in the file when the read is performed.
In general, though, read() won't return just one byte if you request many bytes.

As the read() manpage states:
Return Value
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case it is left unspecified whether the file position (if any) changes.
So, each read() will read up to the number of specified bytes; but it may read less. "Non-buffered" means that if you specify read(fd, bar, 1), read will only read one byte. Buffered IO attempts to read in quanta of BUFSIZ, even if you only want one character. This may sound wasteful, but it avoids the overhead of making system calls, which makes it fast.

read attempts to get all of characters requested.
if EOF happens before all of the requested characters can be returned, it returns what it got
after it does this the next read returns -1, to let you know you the file end.
What happens when it tries to read and there is nothing there involves something called blocking. You can call open to read a file blocking or non-blocking. "blocking" means wait until there is something to return.
This is what you see in a shell waiting for input. It sits there. Until you hit return.
Non-blocking means that read will return no bytes of data if there are none. Depending on a lot of other factors which would make a completely correct answer unusable for you, read will set errno to something like EWOULDBLOCK, which lets you know why your read returned zero bytes. It is not necessarily a fatal error.
Your code could test for a minus to find EOF or errors

When we say read is unbuffered, it means no buffering takes place at the level of your process after the data is pulled off the underlying open file description, which is a potentially-shared resource. If stdin is a terminal, there are likely at least 2 additional buffers in play, however:
The terminal buffer, which can probably hold 1-4k of data off the line until.
The kernel's cooked/canonical mode buffer for line entry/editing on a terminal, which lets the user perform primitive editing (backspace, backword, erase line, etc.) on the line until it's submitted (to the buffer described above) by pressing enter.
read will pull whatever has already been submitted, up to the max read length you passed to it, but it cannot pull anything from the line editing buffer. If you want to disable this extra layer of buffering, you need to lookup how to disable cooked/canonical mode for a terminal using tcsetattr, etc.

Related

why does fread from stdin not stop [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 months ago.
Improve this question
I am trying to read input from stdin with fread(). However i am have a problem, the loop will not terminate and instead keeps reading.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
if (argc != 2) {
fprintf(stderr, "argument err");
return -1;
}
FILE *in = fopen(argv[1], "w");
if (in == NULL) {
fprintf(stderr, "failed to open file");
return -1;
}
char buffer[20];
size_t ret;
while ((ret = fread(buffer, 1, 20, stdin)) > 0) {
if (fwrite(buffer, 1, ret, in) != ret) {
if (ferror(in) != 0) {
perror("write err:");
}
}
}
return 0;
}
How can i make this loop terminate when EOF is reached? i have tried using ctrl+D but that just seems like a strange way to stop taking input.
I guess what i want is to use fread() to read multiple arbitrary amounts of data in chunks of 20 bytes and then somehow stop.
How can i make this loop terminate when EOF is reached?
When do you think EOF is reached? Really. When you are providing input interactively, how is the system or the program supposed to know that you've entered all the data you want the program to consume?
i have tried using ctrl+D but that just seems like a strange way to stop taking input.
It is exactly the way to signal a soft EOF to a POSIX terminal. Since you want the loop to stop when EOF is encountered, it seems absolutely natural to me to use ctrl+D for the purpose when providing data interactively. That's not the only way you could signal the end of the input, but it has a lot going for it.
I guess what i want is to use fread() to read multiple arbitrary amounts of data in chunks of 20 bytes and then somehow stop.
Again: how is the program supposed to know when it has consumed all the "multiple arbitrary amounts" of data that you decide to provide on a given run? An EOF signal is an eminently reasonable choice for multiple reasons, and the way to deliver that from a POSIX terminal interface is ctrl+D.
As pointed out before you are reading from an eternal stream, this means that stdin don't naturally have a EOF (or <=0) value.
If you want your loop to terminate, you will have to add a termination condition, like a certain character, word or all type of value. After that you could use a break or a return in some case. You could also search if your terminal emulator support the insertion of an EOF value into the stdin, which is pretty common (But very platform dependent).
ADD: On my system, typical linux, CTRL+D is for an EOF insertion in stdin. It seems that you found this out yourself, and if you want your program to know where to stop you will need to use this.
You cand also send a signal to your program, usually done with a shortcut like CTRL+D, CTRL+C, CTRL+T etc... there is all sort of signal, which can be sent by your system or/and your TE and you just have to implement in your program the corresponding signal receiver.
How can i make this loop terminate when EOF is reached? i have tried using ctrl+D but that just seems like a strange way to stop taking input.
fread and fwrite are there to read data records, so they (both) take the number of records to read and the size of the record. If the available data doesn't fit on a full record, you will not get the full record at all (indeed, the routines return the number of full records read, and the partial read will be waiting for the next fread() call.)
All the calls in stdio.h package are buffered, so the buffer holds the data that has been read (from the system) but not yet consumed by the user, and so, this makes me to wonder why are you trying to use a buffer to read data that is already buffered?
EOF is produced when you are trying to read one record and the fread() call results in a true end of file from the system (this normally requires two calls, the first to complete the remaining data, the second resulting in no data ---zero bytes--- returned from the system) So you have to distinguish two cases:
fread() returns 0 in case it has read something, but is not enough to complete a record.
fread() returns EOF in case it has read nothing (the true end of file is reached)
As I've said above, fread() & fwrite() will read/write full records (this is useful when your data is a struct with a fixed length, but normally not when you can have extra data at the end)
The way to terminate the loop should be something like this:
while ((ret = fread(buffer, 1, 20, stdin)) >= 0) {
if (fwrite(buffer, 1, ret, in) != ret) {
if (ferror(in) != 0) {
perror("write err:");
}
}
}
/* here you can have upto 19 bytes in the buffer that cannot
* be read with that record length, but you can read individually
* with fgetc() calls. */
so, if you read half a record (at end of file) only at the next fread() it will detect the end of file (by reading nothing) and you will be free of ending. (beware that the extra data that doesn't fill a full buffer, still needs to be read by other means)
The cheapest and easiest way to solve this problem (to copy a file from one descriptor to another) is described in K&R (in the first edition) and has not yet have better code to void it, is this:
int c;
while ((c = fgetc(in)) != EOF)
fputc(c, out);
while it seems to read the characters one by one, it actually makes a call to read(2) to completely fill a full buffer of data, and return just one character, next characters will be taken from the buffer, saving calls to read(), and the same happens to fputc() (it fills the buffer until it's full, then flushes it, in a single call to write()).
Many people has tried to defeat the code above, without any measurable gain in efficience. So, my hint is be simple, that the world is complicated enough to force you to go complex.

Pipe in MacOS always reads too few bytes (but Linux works)

This code works beautifully in Linux, but with MacOS it always fails to read the tree (it will read too bytes). It works with redirection from a file. It reads when a file is explicitly opened. But on a pipe, always too few bytes.
// Load the saved tree
uint16_t treeBytes = 0;
read(fileIn, &treeBytes, sizeof(treeBytes));
if (verbose) { printf("tree (%u)\n", treeBytes); }
uint8_t savedTree[treeBytes];
int readSz = read(fileIn, savedTree, treeBytes);
if (readSz != treeBytes)
{
fprintf(stderr, "%d != %u: ", readSz, treeBytes);
ERROR("Truncated tree read");
}
There are two bugs here:
You aren't checking the return from the first read() call. There are four possible returns here, three of which will break your program: -1 on error, 0 on abnormal close (typical for sockets only), 1 for a short read, and 2 (sizeof(treeBytes)) for a successful read. Don't assume.
You are collapsing those three failure cases as one in your second read(), which probably explains your reported symptom. There is nothing mandating that read() must block until it gets treeBytes from the pipe. It is allowed to return 1 byte at a time for a blocking FD, and 0 for a nonblocking FD. As Mark Sechell comented above, read in a loop until you have as many bytes as you expect or you hit an error case.

stdout stream changes order after redirection?

These days I was learning the "apue", a result of an typical case confused me. The following are the sample codes of "sample.c":
#include "apue.h"
#include <stdio.h>
#define BUFF_SZ 4096
int main()
{
int n = 0;
char buff[BUFF_SZ] = {'\0'};
while ((n = read(STDIN_FILENO, buff, BUFF_SZ)) > 0) {
printf("read %d bytes\n", n);
if (write(STDOUT_FILENO, buff, n) != n) {
err_sys("write error");
}
}
if (n < 0) {
err_sys("read error");
}
return 0;
}
After compilation gcc sample.c, you can use this command echo Hello | ./a.out and get the following std output on terminal:
read 6 bytesHello
However, if you redirect the output to a file echo Hello | ./a.out > outfile, then use cat outfile to see the content:
Helloread 6 bytes
The ouput changes order after redirection! I wonder if some one could tell me the reason?
For the standard I/O function printf, when you output to a terminal, the standard output is by default line buffered.
printf("read %d bytes\n", n);
\n here cause the output to flush.
However, when you output to a file, it's by default fully buffered. The output won't flush unless the buffer is full, or you explicitly flush it.
The low level system call write, on the other hand, is unbuffered.
In general, intermixing standard I/O calls with system calls is not advised.
printf(), by default, buffers its output, while write() does not, and there is no synchronisation between then.
So, in your code, it is possible that printf() stores its data in a buffer and returns, then write() is called, and - as main() returns, printf()s buffer is flushed so that buffered output appears. From your description, that is happening when output is redirected.
It is also possible that printf() writes data immediately, then write() is called. From your description, that happens when output is not redirected.
Typically, one part of redirection of a stream is changing the buffer - and therefore the behaviour when buffering - for streams like stdout and stdin. The precise change depends on what type of redirection is happening (e.g. to a file, to a pipe, to a different display device, etc).
Imagine that printf() writes data to a buffer and, when flushing that buffer, uses write() to produce output. That means all overt calls of write() will have their output produced immediately, but data that is buffered may be printed out of order.
The problem is that the writes are handled by write(2) call, so you effectively lose control of what happens.
If we look at the documentation for write(2) we can see that the writes are not guaranteed to be actually written until a read() occurs. More specifically:
A successful return from write() does not make any guarantee that data has
been committed to disk. In fact, on some buggy implementations, it does not even
guarantee that space has successfully been reserved for the data. The only way to
be sure is to call fsync(2) after you are done writing all your data.
This means that depending on the implementation and buffering of the write(2) (which may differ even between redirects and printing to screen), you can get different results.

Confusion about how a getchar() loop works internally

I've included an example program using getchar() below, for reference (not that anyone probably needs it), and feel free to address concerns with it if you desire. But my question is:
What exactly is going on when the program calls getchar()?
Here is my understanding (please clarify or correct me):
When getchar is called, it checks the STDIN buffer to see if there is any input.
If there isn't any input, getchar sleeps.
Upon wake, getchar checks to see if there is any input, and if not, puts it self to sleep again.
Steps 2 and 3 repeat until there is input.
Once there is input (which by convention includes an 'EOF' at the end), getchar returns the first character of this input and does something to indicate that the next call to getchar should return the second letter from the same buffer? I'm not really sure what that is.
When there are no more characters left other than EOF, does getchar flush the buffer?
The terms I used are probably not quite correct.
#include <stdio.h>
int getLine(char buffer[], int maxChars);
#define MAX_LINE_LENGTH 80
int main(void){
char line[MAX_LINE_LENGTH];
int errorCode;
errorCode = getLine(line, sizeof(line));
if(errorCode == 1)
printf("Input exceeded maximum line length of %d characters.\n", MAX_LINE_LENGTH);
printf("%s\n", line);
return 0;
}
int getLine(char buffer[], int maxChars){
int c, i = 0;
while((c = getchar()) != EOF && c != '\n' && i < maxChars - 1)
buffer[i++] = c;
buffer[i++] = '\0';
if(i == maxChars)
return 1;
else
return 0;
}
Step 2-4 are slightly off.
If there is no input in the standard I/O buffer, getchar() calls a function to reload the buffer. On a Unix-like system, that normally ends up calling the read() system call, and the read() system call puts the process to sleep until there is input to be processed, or the kernel knows there will be no input to be processed (EOF). When the read returns, the code adjusts the data structures so that getchar() knows how much data is available. You description implies polling; the standard I/O system does not poll for input.
Step 5 uses the adjusted pointers to return the correct values.
There really isn't an EOF character; it is a state, not a character. Even though you type Control-D or Control-Z to indicate 'EOF', that character is not inserted into the input stream. In fact, those characters cause the system to flush any typed characters that are still waiting for 'line editing' operations (like backspace) to change them so that they are made available to the read() system call. If there are no such characters, then read() returns 0 as the number of available characters, which means EOF. Then getchar() returns the value EOF (usually -1 but guaranteed to be negative whereas valid characters are guaranteed to be non-negative (zero or positive)).
So basically, rather than polling, is it that hitting Return causes a certain I/O interrupt, and then when the OS receives this, it wakes up any processes that are sleeping for I/O?
Yes, hitting Return triggers interrupts and the OS kernel processes them and wakes up processes that are waiting for the data. The terminal driver is woken by the kernel when interrupt occurs, and decides what to do with the character(s) that were just received. They may be stashed for further processing (canonical mode) or made available immediately (raw mode), etc. Assuming, of course, that the input is a terminal; if the input is from a disk file, it is simpler in many ways — or if it is a pipe, or …
Nominally, it isn't the terminal app that gets woken by the interrupt; it is the kernel that wakes first, then the shell running in the terminal app that is woken because there's data for it to read, and only when there's output does the terminal app get woken.
I say 'nominally' because there's an outside chance that in fact the terminal app does mediate the I/O via a pty (pseudo-tty), but I think it happens at the kernel level and the terminal application is involved fairly late in the process. There's a huge disconnect really between the keyboard where you type and the display where what you type appears.
See also Canonical vs non-canonical terminal input.

Is this kind of behavior defined by standard?

#include <unistd.h>
int main(int argc, char* argv[])
{
char buf[500];
read(0, buf, 5);
return 0;
}
The above read 5 characters from stdin,but if I input more than 5:
12345morethan5
[root# test]# morethan5
-bash: morethan5: command not found
The remaining characters will be executed as shell commands.
Is this kind of behavior defined by standard?
Sort of :-)
Your program reads 5 characters, and that's it. Not less, not more. The rest remain in the terminal buffer and get sent to your shell once your C program terminates.
Since you are using read(), which is a raw system call, instead of any of the C stdio buffering alternatives this behaviour is not just expected, but required.
From the POSIX standard on read():
The read() function shall attempt to
read nbyte bytes from the file
associated with the open file
descriptor, fildes, into the buffer
pointed to by buf.
...
Upon successful completion, where
nbyte is greater than 0, read() shall
mark for update the st_atime field of
the file, and shall return the number
of bytes read. This number shall never
be greater than nbyte.
...
Upon successful completion, read()
[XSI] [Option Start] and pread()
[Option End] shall return a
non-negative integer indicating the
number of bytes actually read.
I.e. read() should never read more bytes from the file descriptor than requested.
From the related part on terminals:
It is not, however, necessary to read
a whole line at once; any number of
bytes, even one, may be requested in a
read() without losing information.
...
The last process to close a terminal device file shall cause any output to be sent to the device and any input to be discarded.
Note: normally your shell will still have an open file descriptor for the terminal, until you end the session.
That has nothing to do with any standard, it's up to your runtime what to write to stdin. Your runtime makes the standard input available to your program, which reads some bytes from it and quits, and then the remaining bytes are processed by the runtime itself -- if you can configure it to clear all the file descriptors after forking a process, you could maybe prevent this behaviour, but that would seriously impede most of the standard command line workflows which rely on attaching one process's input to another process's output...

Resources