#include <unistd.h>

int main(int argc, char* argv[])
{
    char buf[500];
    read(0, buf, 5);   /* read at most 5 bytes from stdin (fd 0) */
    return 0;
}
The above reads 5 characters from stdin, but if I input more than 5:
12345morethan5
[root@test]# morethan5
-bash: morethan5: command not found
The remaining characters will be executed as shell commands.
Is this kind of behavior defined by standard?
Sort of :-)
Your program reads 5 characters, and that's it. Not less, not more. The rest remain in the terminal buffer and get sent to your shell once your C program terminates.
Since you are using read(), which is a raw system call, rather than any of the C stdio buffering alternatives, this behaviour is not just expected, but required.
From the POSIX standard on read():
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf.
...
Upon successful completion, where nbyte is greater than 0, read() shall mark for update the st_atime field of the file, and shall return the number of bytes read. This number shall never be greater than nbyte.
...
Upon successful completion, read() [XSI] and pread() shall return a non-negative integer indicating the number of bytes actually read.
I.e. read() should never read more bytes from the file descriptor than requested.
From the related part on terminals:
It is not, however, necessary to read a whole line at once; any number of bytes, even one, may be requested in a read() without losing information.
...
The last process to close a terminal device file shall cause any output to be sent to the device and any input to be discarded.
Note: normally your shell will still have an open file descriptor for the terminal, until you end the session.
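If the program itself should consume the leftover input instead of handing it to the shell, it can drain the rest of the line before exiting. A minimal sketch (the draining loop is my own addition; nothing in the standard requires it):

#include <unistd.h>

int main(void)
{
    char buf[500];
    read(0, buf, 5);                 /* read at most 5 bytes, as before */

    /* drain the rest of the line so the shell never sees it */
    char c = 0;
    while (read(0, &c, 1) == 1 && c != '\n')
        ;
    return 0;
}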
That has nothing to do with any standard; it's up to your runtime what to do with stdin. Your runtime makes the standard input available to your program, which reads some bytes from it and quits, and then the remaining bytes are processed by the runtime itself. If you could configure it to clear all the file descriptors after forking a process, you could maybe prevent this behaviour, but that would seriously impede most of the standard command-line workflows, which rely on attaching one process's input to another process's output...
Related
I've boiled down my entire program to a short main that replicates the issue, so forgive me for it not making any sense.
input.txt is a text file that has a couple lines of text in it. This boiled down program should print those lines. However, if fork is called, the program enters an infinite loop where it prints the contents of the file over and over again.
As far as I understand fork, the way I use it in this snippet is essentially a no-op. It forks, the parent waits for the child before continuing, and the child is immediately killed.
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX = 100 };

int main(){
    freopen("input.txt", "r", stdin);
    char s[MAX];
    int i = 0;
    char* ret = fgets(s, MAX, stdin);
    while (ret != NULL) {
        //Commenting out this region fixes the issue
        int status;
        pid_t pid = fork();
        if (pid == 0) {
            exit(0);
        } else {
            waitpid(pid, &status, 0);
        }
        //End region
        printf("%s", s);
        ret = fgets(s, MAX, stdin);
    }
}
Edit: Further investigation has only made my issue stranger. If the file contains <4 blank lines or <3 lines of text, it does not break. However, if there are more than that, it loops infinitely.
Edit2: If the file contains 3 lines of numbers it will loop infinitely, but if it contains 3 lines of words it will not.
I am surprised that there is a problem, but it does seem to be a problem on Linux (I tested on Ubuntu 16.04 LTS running in a VMWare Fusion VM on my Mac) — but it was not a problem on my Mac running macOS 10.13.4 (High Sierra), and I wouldn't expect it to be a problem on other variants of Unix either.
As I noted in a comment:
There's an open file description and an open file descriptor behind each stream. When the process forks, the child has its own set of open file descriptors (and file streams), but each file descriptor in the child shares the open file description with the parent. IF (and that's a big 'if') the child process closing the file descriptors first did the equivalent of lseek(fd, 0, SEEK_SET), then that would also position the file descriptor for the parent process, and that could lead to an infinite loop. However, I've never heard of a library that does that seek; there's no reason to do it.
See POSIX open() and fork() for more information about open file descriptors and open file descriptions.
The open file descriptors are private to a process; the open file descriptions are shared by all copies of the file descriptor created by an initial 'open file' operation. One of the key properties of the open file description is the current seek position. That means that a child process can change the current seek position for a parent — because it is in the shared open file description.
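A small self-contained demonstration of that sharing (my own sketch, assuming an input.txt that is at least one byte long): the child seeks on the shared open file description, and the parent's offset moves with it.

#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd = open("input.txt", O_RDONLY);
    if (fd < 0)
        return 1;
    char c;
    read(fd, &c, 1);                    /* the parent's offset is now 1 */
    pid_t pid = fork();
    if (pid == 0)
    {
        lseek(fd, 0, SEEK_SET);         /* child rewinds the shared description */
        _exit(0);
    }
    waitpid(pid, 0, 0);
    /* the parent's offset moved too, because the description is shared */
    printf("parent offset: %ld\n", (long)lseek(fd, 0, SEEK_CUR));   /* prints 0 */
    return 0;
}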
neof97.c
I used the following code — a mildly adapted version of the original that compiles cleanly with rigorous compilation options:
#include "posixver.h"
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
enum { MAX = 100 };
int main(void)
{
if (freopen("input.txt", "r", stdin) == 0)
return 1;
char s[MAX];
for (int i = 0; i < 30 && fgets(s, MAX, stdin) != NULL; i++)
{
// Commenting out this region fixes the issue
int status;
pid_t pid = fork();
if (pid == 0)
{
exit(0);
}
else
{
waitpid(pid, &status, 0);
}
// End region
printf("%s", s);
}
return 0;
}
One of the modifications limits the number of cycles (children) to just 30.
I used a data file with 4 lines of 20 random letters plus a newline (84 bytes total):
ywYaGKiRtAwzaBbuzvNb
eRsjPoBaIdxZZtJWfSty
uGnxGhSluywhlAEBIXNP
plRXLszVvPgZhAdTLlYe
I ran the command under strace on Ubuntu:
$ strace -ff -o st-out -- neof97
ywYaGKiRtAwzaBbuzvNb
eRsjPoBaIdxZZtJWfSty
uGnxGhSluywhlAEBIXNP
plRXLszVvPgZhAdTLlYe
…
uGnxGhSluywhlAEBIXNP
plRXLszVvPgZhAdTLlYe
ywYaGKiRtAwzaBbuzvNb
eRsjPoBaIdxZZtJWfSty
$
There were 31 files with names of the form st-out.808## where the hashes were 2-digit numbers. The main process file was quite large; the others were small, with one of the sizes 66, 110, 111, or 137:
$ cat st-out.80833
lseek(0, -63, SEEK_CUR) = 21
exit_group(0) = ?
+++ exited with 0 +++
$ cat st-out.80834
lseek(0, -42, SEEK_CUR) = -1 EINVAL (Invalid argument)
exit_group(0) = ?
+++ exited with 0 +++
$ cat st-out.80835
lseek(0, -21, SEEK_CUR) = 0
exit_group(0) = ?
+++ exited with 0 +++
$ cat st-out.80836
exit_group(0) = ?
+++ exited with 0 +++
$
It just so happened that the first 4 children each exhibited one of the four behaviours — and each further set of 4 children exhibited the same pattern.
This shows that three out of four of the children were indeed doing an lseek() on standard input before exiting. Obviously, I have now seen a library do it. I have no idea why it is thought to be a good idea, but empirically, that is what is happening.
neof67.c
This version of the code, using a separate file stream (and file descriptor) and fopen() instead of freopen() also runs into the problem.
#include "posixver.h"
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
enum { MAX = 100 };
int main(void)
{
FILE *fp = fopen("input.txt", "r");
if (fp == 0)
return 1;
char s[MAX];
for (int i = 0; i < 30 && fgets(s, MAX, fp) != NULL; i++)
{
// Commenting out this region fixes the issue
int status;
pid_t pid = fork();
if (pid == 0)
{
exit(0);
}
else
{
waitpid(pid, &status, 0);
}
// End region
printf("%s", s);
}
return 0;
}
This also exhibits the same behaviour, except that the file descriptor on which the seek occurs is 3 instead of 0. So, two of my hypotheses are disproven: that the problem is related to freopen(), and that it is related to stdin. Both are shown incorrect by the second test code.
Preliminary diagnosis
IMO, this is a bug. You should not be able to run into this problem.
It is most likely a bug in the Linux (GNU C) library rather than the kernel. It is caused by the lseek() in the child processes. It is not clear (because I've not gone to look at the source code) what the library is doing or why.
GLIBC Bug 23151
GLIBC Bug 23151 - A forked process with unclosed file does lseek before exit and can cause infinite loop in parent I/O.
The bug was created 2018-05-08 US/Pacific, and was closed as INVALID by 2018-05-09. The reason given was:
Please read
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_05_01,
especially this paragraph:
Note that after a fork(), two handles exist where one existed before. […]
POSIX
The complete section of POSIX referred to (apart from verbiage noting that this is not covered by the C standard) is this:
2.5.1 Interaction of File Descriptors and Standard I/O Streams
An open file description may be accessed through a file descriptor, which is created using functions such as open() or pipe(), or through a stream, which is created using functions such as fopen() or popen(). Either a file descriptor or a stream is called a "handle" on the open file description to which it refers; an open file description may have several handles.
Handles can be created or destroyed by explicit user action, without affecting the underlying open file description. Some of the ways to create them include fcntl(), dup(), fdopen(), fileno(), and fork(). They can be destroyed by at least fclose(), close(), and the exec functions.
A file descriptor that is never used in an operation that could affect the file offset (for example, read(), write(), or lseek()) is not considered a handle for this discussion, but could give rise to one (for example, as a consequence of fdopen(), dup(), or fork()). This exception does not include the file descriptor underlying a stream, whether created with fopen() or fdopen(), so long as it is not used directly by the application to affect the file offset. The read() and write() functions implicitly affect the file offset; lseek() explicitly affects it.
The result of function calls involving any one handle (the "active handle") is defined elsewhere in this volume of POSIX.1-2017, but if two or more handles are used, and any one of them is a stream, the application shall ensure that their actions are coordinated as described below. If this is not done, the result is undefined.
A handle which is a stream is considered to be closed when either an fclose(), or freopen() with non-full(1) filename, is executed on it (for freopen() with a null filename, it is implementation-defined whether a new handle is created or the existing one reused), or when the process owning that stream terminates with exit(), abort(), or due to a signal. A file descriptor is closed by close(), _exit(), or the exec() functions when FD_CLOEXEC is set on that file descriptor.
(1) [sic] Using 'non-full' is probably a typo for 'non-null'.
For a handle to become the active handle, the application shall ensure that the actions below are performed between the last use of the handle (the current active handle) and the first use of the second handle (the future active handle). The second handle then becomes the active handle. All activity by the application affecting the file offset on the first handle shall be suspended until it again becomes the active file handle. (If a stream function has as an underlying function one that affects the file offset, the stream function shall be considered to affect the file offset.)
The handles need not be in the same process for these rules to apply.
Note that after a fork(), two handles exist where one existed before. The application shall ensure that, if both handles can ever be accessed, they are both in a state where the other could become the active handle first. The application shall prepare for a fork() exactly as if it were a change of active handle. (If the only action performed by one of the processes is one of the exec() functions or _exit() (not exit()), the handle is never accessed in that process.)
For the first handle, the first applicable condition below applies. After the actions required below are taken, if the handle is still open, the application can close it.
If it is a file descriptor, no action is required.
If the only further action to be performed on any handle to this open file descriptor is to close it, no action need be taken.
If it is a stream which is unbuffered, no action need be taken.
If it is a stream which is line buffered, and the last byte written to the stream was a <newline> (that is, as if a putc('\n') was the most recent operation on that stream), no action need be taken.
If it is a stream which is open for writing or appending (but not also open for reading), the application shall either perform an fflush(), or the stream shall be closed.
If the stream is open for reading and it is at the end of the file (feof() is true), no action need be taken.
If the stream is open with a mode that allows reading and the underlying open file description refers to a device that is capable of seeking, the application shall either perform an fflush(), or the stream shall be closed.
For the second handle:
If any previous active handle has been used by a function that explicitly changed the file offset, except as required above for the first handle, the application shall perform an lseek() or fseek() (as appropriate to the type of handle) to an appropriate location.
If the active handle ceases to be accessible before the requirements on the first handle, above, have been met, the state of the open file description becomes undefined. This might occur during functions such as a fork() or _exit().
The exec() functions make inaccessible all streams that are open at the time they are called, independent of which streams or file descriptors may be available to the new process image.
When these rules are followed, regardless of the sequence of handles used, implementations shall ensure that an application, even one consisting of several processes, shall yield correct results: no data shall be lost or duplicated when writing, and all data shall be written in order, except as requested by seeks. It is implementation-defined whether, and under what conditions, all input is seen exactly once.
Each function that operates on a stream is said to have zero or more "underlying functions". This means that the stream function shares certain traits with the underlying functions, but does not require that there be any relation between the implementations of the stream function and its underlying functions.
Exegesis
That is hard reading! If you're not clear on the distinction between open file descriptor and open file description, read the specification of open() and fork() (and dup() or dup2()). The definitions for file descriptor and open file description are also relevant, if terse.
In the context of the code in this question (and also for Unwanted child processes being created while file reading), we have a file stream handle open for reading only which has not yet encountered EOF (so feof() would not return true, even though the read position is at the end of the file).
One of the crucial parts of the specification is: The application shall prepare for a fork() exactly as if it were a change of active handle.
This means that the steps outlined for 'first file handle' are relevant, and stepping through them, the first applicable condition is the last:
If the stream is open with a mode that allows reading and the underlying open file description refers to a device that is capable of seeking, the application shall either perform an fflush(), or the stream shall be closed.
If you look at the definition for fflush(), you find:
If stream points to an output stream or an update stream in which the most recent operation was not input, fflush() shall cause any unwritten data for that stream to be written to the file, [CX] and the last data modification and last file status change timestamps of the underlying file shall be marked for update.
For a stream open for reading with an underlying file description, if the file is not already at EOF, and the file is one capable of seeking, the file offset of the underlying open file description shall be set to the file position of the stream, and any characters pushed back onto the stream by ungetc() or ungetwc() that have not subsequently been read from the stream shall be discarded (without further changing the file offset).
It isn't exactly clear what happens if you apply fflush() to an input stream associated with a non-seekable file, but that isn't our immediate concern. However, if you're writing generic library code, then you might need to know whether the underlying file descriptor is seekable before doing a fflush() on the stream. Alternatively, use fflush(NULL) to have the system do whatever is necessary for all I/O streams, noting that this will lose any pushed-back characters (via ungetc() etc).
The lseek() operations shown in the strace output seem to be implementing the fflush() semantics associating the file offset of the open file description with the file position of the stream.
So, for the code in this question, it seems that fflush(stdin) is necessary before the fork() to ensure consistency. Not doing that leads to undefined behaviour ('if this is not done, the result is undefined') — such as looping indefinitely.
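Applied to the loop in neof97.c, that is one extra line before the fork(). A sketch of just that region (my adaptation, not code from the original question):

// inside the fgets() loop, immediately before creating the child
int status;
fflush(stdin);      /* write the stream's read position back to the
                       shared open file description (POSIX 2.5.1) */
pid_t pid = fork();
if (pid == 0)
{
    exit(0);
}
else
{
    waitpid(pid, &status, 0);
}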
The exit() call closes all open file handles. After the fork, the child and parent have identical copies of the execution stack, including the FILE handle pointer. When the child exits, it closes the file and resets the file position.
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX = 100 };

int main(){
    freopen("input.txt", "r", stdin);
    char s[MAX];
    int i = 0;
    char* ret = fgets(s, MAX, stdin);
    while (ret != NULL) {
        //Commenting out this region fixes the issue
        int status;
        pid_t pid = fork(); // At this point both processes have a copy of the file handle
        if (pid == 0) {
            exit(0); // At this point the child closes the file handle
        } else {
            waitpid(pid, &status, 0);
        }
        //End region
        printf("%s", s);
        ret = fgets(s, MAX, stdin);
    }
}
As /u/visibleman pointed out, the child process is closing the file and messing things up in main.
I was able to work around it by checking whether the program is in terminal mode with
!isatty(fileno(stdin))
If stdin has been redirected, it reads all of the input into a linked list before doing any processing or forking.
Replace exit(0) with _exit(0), and all is fine. This is an old Unix tradition: if you are using stdio, your forked image must use _exit(), not exit().
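A sketch of that one-line change, applied to the fork region from the question:

int status;
pid_t pid = fork();
if (pid == 0) {
    _exit(0);   /* skip stdio cleanup so the child never seeks
                   the shared file offset */
} else {
    waitpid(pid, &status, 0);
}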
I have a simple C program with the read function and I don't understand the output.
//code1.c
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>

int main()
{
    int r;
    char c; // In C, char values are stored in 1 byte
    r = read(0, &c, 1);
    // DOC:
    // ssize_t read (int filedes, void *buffer, size_t size)
    // The read function reads up to size bytes from the file with descriptor
    // filedes, storing the results in the buffer.
    // The return value is the number of bytes actually read.
    // Here:
    // filedes is 0, which is stdin (STDIN_FILENO from <unistd.h>)
    // *buffer is &c : address in memory of char c
    // size is 1, meaning it will read only 1 byte
    printf("r = %d\n", r);
    return 0;
}
(Screenshot of the results omitted.) I ran this program twice, typing "a" for the first try and "aecho hi" for the second.
How I try to explain the results:
When read is called it sees that stdin is closed and opens it (from my point of view, why? It should just read it. I don't know why it opens it).
I type "aecho hi" in the bash and press enter.
read has priority to process stdin and reads the first byte of "aecho hi" : "a".
I get the confirmation that read has processed 1 byte with the printf.
a.out has finished and is terminated.
Somehow the remaining data in stdin is processed by bash (the parent of my program) and is executed, and for some reason the first byte has been removed by read.
This is all hypothetical and very blurry. Any help understanding what is happening would be very welcome.
When you type at your terminal emulator, it writes your keystrokes to a "file", in this case an in-memory buffer that, thanks to the file system, looks just like any other file that might be on disk.
Every process inherits 3 open file handles from its parent. We are interested in one of them here, standard input. The program executed by the terminal emulator (here, bash), is given as its standard input the in-memory buffer described in the first paragraph.
a.out, when run by bash, also receives this same file as its standard input. Keep this in mind: bash and a.out are reading from the same, already-opened file.
After you run a.out, its read blocks, because its standard input is empty. When you type aecho hi<enter>, the terminal writes these characters to the buffer (<enter> becoming a single linefeed character). a.out only requests one character, so it gets a and leaves the rest of the characters in the file. (Or more precisely, the file pointer is still pointing at the e after a is read.)
After a.out completes, bash tries to read from the same file. Normally, the file is empty (i.e., the file pointer is at the end of the file), so bash blocks waiting for another command. In this case, though, there is input available already: echo hi\n. bash reads this now the same as if you had typed it after a.out completed.
Check this. As alk suggests, stdin and stdout are already open when the program starts. Now you have to understand that once you type:
aecho hi
and hit return, the stdin buffer is filled with all those letters (and the space) - and will stay that way as long as you don't flush it. When the program exits, the stdin buffer is still full, and your terminal automatically echoes what is written into stdin to stdout - this is what you're seeing at the end: your shell reading stdin.
Now as you point out, your code "presses return" for you so to speak - in the first execution adding an empty shell line, and in the second executing echo hi. But you must remember, you pressed return, so "\n" is in the buffer! To be explicit, you in fact typed:
aecho hi\n
Once your program exits the shell reads the remaining characters in the buffer, including the return, and that's what you see!
I am trying to learn the libuv api and wrote the following test:
#include <stdio.h>
#include <stdlib.h>
#include <uv.h>

void timer_cb(uv_timer_t* timer) {
    int* i = timer->data;
    --*i;
    if (*i == 0) {
        uv_timer_stop(timer);
    }
    printf("timer %d\n", *i);
    //fflush(stdout);
}

int main() {
    uv_loop_t* loop = uv_default_loop();
    uv_timer_t* timer = malloc(sizeof(uv_timer_t));
    uv_timer_init(loop, timer);
    int i = 5;
    timer->data = &i;
    uv_timer_start(timer, timer_cb, 1000, 2000);
    uv_run(loop, UV_RUN_DEFAULT);
    printf("Now quitting.\n");
    uv_close((uv_handle_t*) timer, NULL);   /* uv_close() takes a uv_handle_t* */
    uv_loop_close(loop);
    return 0;
}
When I run it, no output is displayed until the program finishes, and then all the output is displayed at once. If I uncomment the fflush line, it works as expected, writing every 2 seconds.
Can someone please explain this to me? Why is stdout not flushed after the newline, as is explained here and in other places? Why do I need to manually flush it?
Stream buffering is implementation-defined.
Per 7.21.3 Files, paragraph 3 of the C Standard:
When a stream is unbuffered, characters are intended to appear from the source or at the destination as soon as possible. Otherwise characters may be accumulated and transmitted to or from the host environment as a block. When a stream is fully buffered, characters are intended to be transmitted to or from the host environment as a block when a buffer is filled. When a stream is line buffered, characters are intended to be transmitted to or from the host environment as a block when a new-line character is encountered. Furthermore, characters are intended to be transmitted as a block to the host environment when a buffer is filled, when input is requested on an unbuffered stream, or when input is requested on a line buffered stream that requires the transmission of characters from the host environment. Support for these characteristics is implementation-defined, and may be affected via the setbuf and setvbuf functions.
The type of buffering is dependent on your implementation, and your implementation apparently isn't line-buffering in your example.
There is no strict requirement that stdout be line buffered. It may be fully buffered as well (or not buffered at all), in which case \n does not trigger a flush of the stream.
C11 (N1570) 7.21.3/7 Files:
As initially opened, the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.
C11 (N1570) 5.1.2.3/7 Program execution:
What constitutes an interactive device is implementation-defined.
You could try to force specific type of buffering by setvbuf standard function. For instance, to set line buffering for stdout, you may try with:
setvbuf(stdout, buff, _IOLBF, size);
where buff is declared as character array of size elements (e.g. 1024).
Note that setvbuf has to be called before any other I/O operation is performed on the stream.
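A minimal self-contained sketch of that (the 1024-byte size is just an illustrative choice; the buffer must outlive all use of the stream, hence static):

#include <stdio.h>

int main(void)
{
    static char buff[1024];                      /* must outlive all use of stdout */
    setvbuf(stdout, buff, _IOLBF, sizeof buff);  /* line buffered: flush on '\n' */

    printf("this line is flushed as soon as the newline is written\n");
    return 0;
}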
For some reason, your system is deciding that your stdout is not interactive. Are you doing some strange redirect of stdout or doing something weird with your terminal? You should be able to override using setbuf or you can use stderr instead of stdout.
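To see what the library is deciding, one hedged probe (my own sketch) is to ask whether stdout is attached to a terminal, which is the usual heuristic for choosing line versus full buffering; the report goes to stderr so it is not itself delayed by stdout's buffer:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (isatty(fileno(stdout)))
        fprintf(stderr, "stdout is a terminal: likely line buffered\n");
    else
        fprintf(stderr, "stdout is not a terminal: likely fully buffered\n");
    return 0;
}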
I'm reading the source code of wc command, and in main function I found follow code:
/* Line buffer stdout to ensure lines are written atomically and immediately
so that processes running in parallel do not intersperse their output. */
setvbuf (stdout, NULL, _IOLBF, 0);
so why does line-buffering stdout ensure that?
Say block buffering is used for stdout instead of line buffering. (This is the default if stdout refers to a regular file, for example.) Let the buffer size be 1024 bytes (so that output is flushed to the file every 1024 bytes), and pretend that two processes are writing to the same file.
Say that the first process currently has 1020 bytes in its I/O buffer and writes the line "foo_file 37\n" to stdout. This will put "foo_" at the end of the I/O buffer, flush the buffer to the file (since the buffer is now full), and then put "file 37\n" at the beginning of the buffer. Say that the second process then comes along and flushes its buffer, which happens to start with "bar_file 48\n". The resulting line in the output file will then be "foo_bar_file 48", which clearly isn't what we want.
The basic problem is that buffer boundaries do not necessarily correspond to line boundaries when block buffering is used.
You could play around with two instances of the following program writing to the same file to see this effect in action yourself:
#include <stdio.h>

int main(void) {
    setvbuf(stdout, NULL, _IOLBF, 0);
    for (;;)
        puts("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
    return 0;
}
With the setvbuf() call commented out, you will see some lines get mixed up with other lines. Be aware that this program will quickly write a huge file, of course. :)
Consider the following line of code:
while((n = read(STDIN_FILENO, buff, BUFSIZ)) > 0)
As per my understanding, the read/write functions are part of non-buffered I/O. So does that mean the read() function will read only one character per call from stdin? In other words, will the value of n be
-1 in case of error
n = 0 in case of EOF
1 otherwise
If it is not the case, when would the above read() function will return and why?
Note: I was also thinking that read() will wait until it successfully reads BUFSIZ characters from stdin. But what happens in a case where the number of characters available to read is less than BUFSIZ? Will read wait forever, or until EOF arrives (Ctrl + D on Unix or Ctrl + Z on Windows)?
Also, let's say BUFSIZ = 100 and stdin = ACtrl+D (i.e. EOF immediately following a single character). Now how many times will the while loop iterate?
The way read() behaves depends on what is being read. For regular files, if you ask for N characters, you get N characters if they are available, less than N if end of file intervenes.
If read() is reading from a terminal in canonical/cooked mode, the tty driver provides data a line at a time. So if you tell read() to get 3 characters or 300, read will hang until the tty driver has seen a newline or the terminal's defined EOF key, and then read() will return with either the number of characters in the line or the number of characters you requested, whichever is smaller.
If read() is reading from a terminal in non-canonical/raw mode, read will have access to keypresses immediately. If you ask read() to get 3 characters it might return with anywhere from 0 to 3 characters depending on input timing and how the terminal was configured.
read() will behave differently in the face of signals, returning with less than the requested number of characters, or -1 with errno set to EINTR if a signal interrupted the read before any characters arrived.
read() will behave differently if the descriptor has been configured for non-blocking I/O. read() will return -1 with errno set to EAGAIN or EWOULDBLOCK if no input was immediately available. This applies to sockets.
So as you can see, you should be ready for surprises when you call read(). You won't always get the number of characters you requested, and you might get non-fatal errors like EINTR, which means you should retry the read().
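Because short reads and EINTR are normal, callers that really need N bytes typically wrap read() in a retry loop. A minimal sketch (read_fully is my own illustrative helper, not a standard function):

#include <errno.h>
#include <unistd.h>

/* Read exactly n bytes unless EOF or a real error intervenes.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t n)
{
    size_t got = 0;
    while (got < n)
    {
        ssize_t r = read(fd, (char *)buf + got, n - got);
        if (r == 0)                 /* EOF: return what we have so far */
            break;
        if (r < 0)
        {
            if (errno == EINTR)     /* interrupted by a signal: just retry */
                continue;
            return -1;              /* a real error */
        }
        got += (size_t)r;
    }
    return (ssize_t)got;
}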
Your code reads:
while((n = read(0, buff, BUFSIZ) != 0))
This is flawed - the parentheses mean it is interpreted as:
while ((n = (read(0, buff, BUFSIZ) != 0)) != 0)
where the boolean condition is evaluated before the assignment, so n will only obtain the values 0 (the condition is not true) and 1 (the condition is true).
You should write:
while ((n = read(0, buff, BUFSIZ)) > 0)
This stops on EOF or a read error, and n lets you know which condition you encountered.
Apparently, the code above was a typo in the question.
Unbuffered I/O will read up to the number of characters you read (but not more). It may read less on account of EOF or an error. It may also read less because less is available at the time of the call. Consider a terminal; typically, that will only read up to the end of line because there isn't any more available than that. Consider a pipe; if the feeding process has generated 128 unread bytes, then if BUFSIZ is 4096, you'll only get 128 bytes from the read. A non-blocking file descriptor may return because nothing is available; a socket may return fewer bytes because there isn't more information available yet; a disk read may return fewer bytes because there are fewer than the requested number of bytes left in the file when the read is performed.
In general, though, read() won't return just one byte if you request many bytes.
As the read() manpage states:
Return Value
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case it is left unspecified whether the file position (if any) changes.
So, each read() will read up to the number of specified bytes; but it may read less. "Non-buffered" means that if you specify read(fd, bar, 1), read will only read one byte. Buffered IO attempts to read in quanta of BUFSIZ, even if you only want one character. This may sound wasteful, but it avoids the overhead of making system calls, which makes it fast.
read attempts to get all of the characters requested.
If EOF happens before all of the requested characters can be returned, it returns what it got.
After it does this, the next read returns 0, to let you know you have reached the end of the file.
What happens when it tries to read and there is nothing there involves something called blocking. You can call open to read a file blocking or non-blocking. "blocking" means wait until there is something to return.
This is what you see in a shell waiting for input. It sits there. Until you hit return.
Non-blocking means that read will return immediately if there is no data available. Depending on a lot of other factors which would make a completely correct answer unusable for you, read will return -1 and set errno to something like EWOULDBLOCK, which lets you know why no bytes were returned. It is not necessarily a fatal error.
Your code could test for a negative return value to find errors, and a zero return to find EOF.
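A short sketch of the non-blocking case (my own illustration): put stdin into non-blocking mode with fcntl() and watch read() fail with EAGAIN/EWOULDBLOCK when no input is waiting.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int flags = fcntl(STDIN_FILENO, F_GETFL, 0);
    fcntl(STDIN_FILENO, F_SETFL, flags | O_NONBLOCK);   /* stdin is now non-blocking */

    char buf[64];
    ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        printf("no input available yet\n");
    else
        printf("read %zd bytes\n", n);
    return 0;
}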
When we say read is unbuffered, it means no buffering takes place at the level of your process after the data is pulled off the underlying open file description, which is a potentially-shared resource. If stdin is a terminal, there are likely at least 2 additional buffers in play, however:
The terminal buffer, which can probably hold 1-4k of data off the line until it is read.
The kernel's cooked/canonical mode buffer for line entry/editing on a terminal, which lets the user perform primitive editing (backspace, backword, erase line, etc.) on the line until it's submitted (to the buffer described above) by pressing enter.
read will pull whatever has already been submitted, up to the max read length you passed to it, but it cannot pull anything from the line editing buffer. If you want to disable this extra layer of buffering, you need to look up how to disable cooked/canonical mode for a terminal using tcsetattr, etc.
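A minimal sketch of doing that (error handling omitted; the saved settings are restored before exit):

#include <termios.h>
#include <unistd.h>

int main(void)
{
    struct termios t, saved;
    tcgetattr(STDIN_FILENO, &t);               /* fetch current terminal settings */
    saved = t;                                 /* keep a copy to restore later */

    t.c_lflag &= ~(ICANON | ECHO);             /* no line editing, no echo */
    t.c_cc[VMIN] = 1;                          /* read() returns after 1 byte... */
    t.c_cc[VTIME] = 0;                         /* ...with no timeout */
    tcsetattr(STDIN_FILENO, TCSANOW, &t);

    char c;
    read(STDIN_FILENO, &c, 1);                 /* sees the keypress immediately */

    tcsetattr(STDIN_FILENO, TCSANOW, &saved);  /* restore cooked mode */
    return 0;
}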