Currently, I'm reading lines of input from child programs that are execlp'd. If a child program fails to execute properly, it won't be writing anything to the pipe, and I want to detect that and raise an error.
I tried polling the file descriptor, but poll() returns 1 whether or not the program executed correctly. So I get past the poll() call, and then fgetc() hangs/blocks because there is nothing to read, yet it never returns EOF either.
Reading and Polling:
char* read_line(int fd) {
    // fd is a pipe's read end. I know it reads properly.
    FILE *file = fdopen(fd, "r");
    int ret;
    struct pollfd fdinfo[1];
    fdinfo[0].fd = fd;
    fdinfo[0].events = POLLIN;
    ret = poll(fdinfo, 1, 1000);
    if (ret < 0) {
        return "NOPE";
    }
    char* result = malloc(sizeof(char) * 80);
    memset(result, 0, sizeof(char) * 80);
    int position = 0;
    int next = 0;
    while (1) {
        next = fgetc(file); // STALLING HERE
        if (next == '!') {
            free(result);
            return "!";
        }
        if (next == EOF || next == '\n') {
            result[position] = '\0';
            return result;
        } else {
            result[position++] = (char)next;
        }
    }
}
You are not adequately checking the information provided to you by poll(). You do detect the case where poll() reports an error by returning -1, but any other result does not guarantee that there are data available to read. A negative return value indicates that poll() failed to do its job; it does not tell you about the state of any of the polled file descriptors.
In the first place, poll() returns 0 if it times out. In that case, your code just rolls on ahead and attempts to read from the file descriptor, which will block unless data happen to arrive between the return from poll() and the call to fgetc().
In the second place, poll() signals problems with a designated file descriptor by setting one or more of the corresponding revents bits and reporting an event on that FD (in part by returning a positive number). Specifically,
poll() sets the POLLHUP, POLLERR and POLLNVAL flag in revents if the condition is true, even if the application did not set the corresponding bit in events
and
A positive [return] value indicates the total number of pollfd structures [...] for which the revents member is non-zero
(POSIX 1003.1-2008; emphasis added)
You might very well see a POLLHUP event if the child process writing to the other end of the pipe crashes. You could see a POLLNVAL event if your program screws up its file descriptor handling in certain ways, and you always have to allow for the possibility of an I/O error, which poll() signals as a POLLERR event.
Thus, to avoid blocking, you should attempt to read from the pipe only when poll() returns 1 and signals POLLIN among the events on the file. Conceivably, you might see that together with a POLLERR or a POLLHUP. I would recommend aborting if ever there is a POLLERR, but a POLLHUP is to be expected when the write end of the pipe is no longer open in any process, and that does not invalidate any data that may still be available to read (which would be signaled by an accompanying POLLIN).
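As a minimal sketch of that check, reusing the question's fdinfo, file, and fd names (the early returns are placeholders, not a full error-handling strategy):
ret = poll(fdinfo, 1, 1000);
if (ret < 0) {
    /* poll() itself failed */
    return NULL;
}
if (ret == 0) {
    /* timed out: nothing arrived within 1000 ms */
    return NULL;
}
if (fdinfo[0].revents & (POLLERR | POLLNVAL)) {
    /* I/O error or invalid descriptor: abort the read */
    return NULL;
}
if (fdinfo[0].revents & POLLIN) {
    /* data is available: the first fgetc(file) will not block */
    int next = fgetc(file);
    /* ... */
} else {
    /* bare POLLHUP: the writer is gone and nothing is left to read */
    return NULL;
}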
Related
Similar to the problem asked a while ago on kernel 3.x, but I'm seeing it on 4.9.37.
The named fifo is created with mkfifo -m 0666. On the read side it is opened with
int fd = open(FIFO_NAME, O_RDONLY | O_NONBLOCK);
The resulting fd is passed into a call to select(). Everything works OK until I run echo >> <fifo-name>.
Now the fd appears in read_fds after select() returns. A read() on the fd will return one byte of data. So far so good.
The next time select() is called and returns, the fd still appears in read_fds, but read() now always returns zero, meaning no data. Effectively the read side consumes 100% of a processor. This is exactly the same problem as observed in the referenced question.
Has anybody seen the same issue? And how can it be resolved or worked around properly?
I've figured out that if I close the read end of the FIFO and re-open it, it works properly. This is probably OK because we are not sending a lot of data, though it is not a nice or general workaround.
This is expected behaviour, because the end-of-input case causes a read() to not block; it returns 0 immediately.
If you look at man 2 select, it says clearly that a descriptor in readfds is set if a read() on that descriptor would not block (at the time of the select() call).
If you used poll(), it too would immediately return with POLLHUP in revents.
As OP notes, the correct workaround is to reopen the FIFO.
Because the Linux kernel maintains exactly one internal pipe object to represent each open FIFO (see man 7 fifo and man 7 pipe), the robust approach in Linux is to open another descriptor to the FIFO whenever an end of input is encountered (read() returning 0), and close the original. During the time when both descriptors are open, they refer to the same kernel pipe object, so there is no race window or risk of data loss.
In pseudo-C:
fifoflags = O_RDONLY | O_NONBLOCK;

fifofd = open(fifoname, fifoflags);
if (fifofd == -1) {
    /* Error checking */
}

/* ... */

/* select() readfds contains fifofd, or
   poll() returns POLLIN for fifofd: */
n = read(fifofd, buffer, sizeof buffer);
if (!n) {
    /* A writer has closed the FIFO. */
    int tempfd;

    tempfd = open(fifoname, fifoflags);
    if (tempfd == -1) {
        const int cause = errno;
        close(fifofd);
        /* Error handling: abort or return here */
    }

    close(fifofd);
    fifofd = tempfd;
} else {
    /* Handling for the other read() result cases */
}
The file descriptor allocation policy in Linux is such that tempfd will be the lowest-numbered free descriptor.
On my system (Core i5-7200U laptop), reopening a FIFO in this way takes less than 1.5 µs. That is, it can be done about 680,000 times a second. I do not think this reopening is a bottleneck for any sensible scenario, even on low-powered embedded Linux machines.
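For reference, here is a rough sketch of how that reopen slots into the question's select() loop; fifofd, fifoflags, and buffer are taken from the pseudo-C above, FIFO_NAME from the question, and the error paths are only placeholders:
fd_set read_fds;
for (;;) {
    FD_ZERO(&read_fds);
    FD_SET(fifofd, &read_fds);
    if (select(fifofd + 1, &read_fds, NULL, NULL, NULL) < 0)
        break;                              /* select() error */
    if (FD_ISSET(fifofd, &read_fds)) {
        ssize_t n = read(fifofd, buffer, sizeof buffer);
        if (n > 0) {
            /* consume buffer[0 .. n-1] */
        } else if (n == 0) {
            /* end of input: open a new descriptor, then close the old one */
            int tempfd = open(FIFO_NAME, fifoflags);
            if (tempfd == -1)
                break;                      /* error handling */
            close(fifofd);
            fifofd = tempfd;
        }
        /* n == -1 with errno == EAGAIN just means no data right now */
    }
}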
I'm writing a program that reads from a pipe, and I want to know the correct way of handling read's return values. According to the read man page:
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal.
I'm worried about the case where it may read only half of the data. Also, what is the correct way to handle the case when the return value is zero?
Here is my sample code.
struct day
{
    int date;
    int month;
};

while(1)
{
    ret = select(maxfd+1, &read_fd, NULL, &exc_fd, NULL);
    if(ret < 0)
    {
        perror("select");
        continue;
    }
    if(FD_ISSET(pipefd[0], &read_fd))
    {
        struct day new_data;
        if((ret = read(pipefd[0], &new_data, sizeof(struct day))) != sizeof(struct day))
        {
            if(ret < 0)
            {
                perror("read from pipe");
                continue;
            }
            else if(ret == 0)
            {
                /* how to handle? */
            }
            else
            {
                /* truncated read. How to handle? */
            }
        }
    }
    ...
}
I believe read() cannot read more data than the size specified; please correct me if I'm wrong.
Please help me with handling the return value of read.
When you read, you request a given amount of data, but nothing guarantees that as much data is available as you requested. For example, you may hit end of file, or the writer may not have written that much data into the pipe yet. So read returns what was effectively read: the number of bytes read is returned (zero indicates end of file).
If read returns a strictly positive number, that many bytes are now in your buffer.
If read returns 0, that means end of file. For a regular file, it means you are currently at the end of the file. For a pipe, it means the pipe is empty and no writer remains on the other end, so no further byte will ever arrive; you have already read all the data and can close the now-useless read end.
If read returns -1, an error happened, and you must consult the errno variable to determine the cause.
So, a general schema could be something like:
n = read(descriptor, buffer, size);
if (n == 0) {          // EOF
    close(descriptor);
} else if (n == -1) {  // error
    switch (errno) {   // consult the documentation for the possible errors
    case EAGAIN:       // no data available right now (non-blocking descriptor)
        break;
    /* ... other cases ... */
    }
} else {               // available data
    // exploit data from buffer[0] to buffer[n-1] (included)
}
If read returns 0, then your process has read all of the data that will ever come from that file descriptor. Take it out of read_fd, and if it was the maxfd, reset maxfd to the new maximum. Depending on what your process is doing, you might have other cleanup to do as well. If you get a short read, then either process the data you've received, discard it, or store it until you get all the data and can process it.
It's hard to give more specific answers to a very general question.
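For the truncated-read branch specifically, one defensive option is to keep reading until the whole struct has arrived. A rough sketch, reusing the question's pipefd and struct day; the offset counter is introduced here only for illustration:
struct day new_data;
size_t offset = 0;
while (offset < sizeof new_data) {
    ssize_t ret = read(pipefd[0], (char *)&new_data + offset,
                       sizeof new_data - offset);
    if (ret > 0) {
        offset += (size_t)ret;      /* got part (or all) of the struct */
    } else if (ret == 0) {
        /* EOF: all writers closed; a partial struct here is a protocol error */
        break;
    } else if (errno != EINTR) {
        perror("read from pipe");   /* real error; EINTR just retries */
        break;
    }
}
In practice, if the writer emits each struct day with a single write() (far smaller than PIPE_BUF), short reads should not occur on a pipe, but the loop above costs little.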
Assuming a pipe,
int pipe_fd[2];
pipe(pipe_fd);
We fork, and expect that one process will write into the pipe at an arbitrary time. In one of the processes, we want to be able to check the contents of the pipe without blocking.
That is, while a typical read will block if nothing is present and the write end remains open, I want to go do other stuff, potentially even read a bit at a time, do some work, and then check back to see if there's more, a la:
close(pipe_fd[1]);
while(1){
    if(/**Check pipe contents**/){
        int present_chars = 0;
        while( read(pipe_fd[0], &buffer[present_chars], 1) != 0)
            ++present_chars;
        //do something
    }
    else {
        //do something else
    }
}
Your logic is wrong: read will not return 0 when it runs out of characters. Instead, it will block until more data arrives, unless you put the file in non-blocking mode, in which case it returns -1 and sets errno to EWOULDBLOCK or EAGAIN rather than returning 0. The only times read can return 0 are when the size argument was 0 or when end-of-file has been reached. And for pipes, end-of-file means the writing end of the pipe has been closed; end-of-file status does not occur just because no input is available yet.
With that said, the simplest way to check is:
if (poll(&(struct pollfd){ .fd = fd, .events = POLLIN }, 1, 0)==1) {
/* data available */
}
but unless you're using nonblocking mode, you'll need to make this check before every single read operation. Passing a larger buffer to read rather than doing it a byte-at-a-time would eliminate most of the cost of checking.
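For example, a sketch of that pattern under the question's setup (buffer and len are illustrative names): check once with poll(), then pull whatever is available in one larger read:
struct pollfd pfd = { .fd = pipe_fd[0], .events = POLLIN };
if (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN)) {
    char buffer[4096];
    ssize_t len = read(pipe_fd[0], buffer, sizeof buffer);
    if (len > 0) {
        /* process buffer[0 .. len-1] */
    } else if (len == 0) {
        /* write end closed: no more data will ever arrive */
    }
} else {
    /* nothing to read right now: do something else */
}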
You can check if there is data to be read with the read() function. From read(3):
When attempting to read from an empty pipe or FIFO:
* If some process has the pipe open for writing and
O_NONBLOCK is set, read() shall return -1 and set
errno to [EAGAIN].
* If some process has the pipe open for writing and
O_NONBLOCK is clear, read() shall block the calling
thread until some data is written or the pipe is
closed by all processes that had the pipe open for
writing.
The read() function shall fail if:
EAGAIN or EWOULDBLOCK
The file descriptor is for a socket, is marked
O_NONBLOCK, and no data is waiting to be received.
So if you set O_NONBLOCK, you can tell whether there is something to read on the pipe simply by calling read().
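For example, a rough sketch of that check (FIFO_PATH is an assumed name, not from the question):
int fd = open(FIFO_PATH, O_RDONLY | O_NONBLOCK);
char buf[128];
ssize_t n = read(fd, buf, sizeof buf);
if (n > 0) {
    /* data available: use buf[0 .. n-1] */
} else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
    /* pipe is empty, but some writer still has it open */
} else if (n == 0) {
    /* pipe is empty and no process has it open for writing */
}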
As a reminder, from open(3):
SYNOPSIS
int open(const char *path, int oflag, ... );
DESCRIPTION
Values for oflag are constructed by a
bitwise-inclusive OR of flags from the following
list, defined in <fcntl.h>. Applications shall
specify exactly one of the first three values
(file access modes) below in the value of oflag:
O_NONBLOCK [...]
I hope it helps.
R..'s answer is good; however, poll returns the number of file descriptor structs that have flags set in revents. This will be 1 if you can read from fd, but it will also be 1 if any of the error flags are set. This means R..'s check will say the pipe is readable even if it has entered an error state. A more robust check could be something like this:
bool canReadFromPipe(){
    //file descriptor struct to check if the POLLIN bit will be set
    //fd is the file descriptor of the pipe
    struct pollfd fds = { .fd = fd, .events = POLLIN };
    //poll with no wait time
    int res = poll(&fds, 1, 0);
    //if res < 0 then an error occurred with poll itself
    //POLLERR is set if an I/O error occurred on the pipe
    //POLLNVAL is set if fd is not a valid open descriptor
    if(res < 0 || fds.revents & (POLLERR | POLLNVAL))
    {
        //an error occurred, check errno
    }
    return fds.revents & POLLIN;
}
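Note that the function assumes fd is visible at file scope; a caller might use it along these lines (illustrative only):
if (canReadFromPipe()) {
    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf);
    /* n == 0 here means the write end has been closed */
}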
I don't know why I'm having a hard time finding this, but I'm looking at some Linux code where we're using select(), waiting on a file descriptor to report it's ready. From the man page of select:
select() and pselect() allow a program to monitor multiple file descriptors,
waiting until one or more of the file descriptors become "ready" for some
class of I/O operation
So, that's great... I call select on some descriptor, give it some timeout value, and start to wait for the indication to go. How does the file descriptor (or the owner of the descriptor) report that it's "ready" such that the select() statement returns?
It reports that it's ready by returning.
select waits for events that are typically outside your program's control. In essence, by calling select, your program says "I have nothing to do until ..., please suspend my process".
The condition you specify is a set of events, any of which will wake you up.
For example, if you are downloading something, your loop would have to wait on new data to arrive, a timeout to occur if the transfer is stuck, or the user to interrupt, which is precisely what select does.
When you have multiple downloads, data arriving on any of the connections triggers activity in your program (you need to write the data to disk), so you'd give a list of all download connections to select in the list of file descriptors to watch for "read".
When you upload data to somewhere at the same time, you again use select to see whether the connection currently accepts data. If the other side is on dialup, it will acknowledge data only slowly, so your local send buffer is always full, and any attempt to write more data would block until buffer space is available, or fail. By passing the file descriptor we are sending to to select as a "write" descriptor, we get notified as soon as buffer space is available for sending.
The general idea is that your program becomes event-driven, i.e. it reacts to external events from a common message loop rather than performing sequential operations. You tell the kernel "this is the set of events for which I want to do something", and the kernel gives you a set of events that have occurred. It is fairly common for two events to occur simultaneously; for example, if a TCP acknowledgement was included in a data packet, this can make the same fd both readable (data is available) and writeable (acknowledged data has been removed from the send buffer), so you should be prepared to handle all of the events before calling select again.
One of the finer points is that select basically gives you a promise that one invocation of read or write will not block, without making any guarantee about the call itself. For example, if one byte of buffer space is available, you can attempt to write 10 bytes, and the kernel will come back and say "I have written 1 byte", so you should be prepared to handle this case as well. A typical approach is to have a buffer "data to be written to this fd", and as long as it is non-empty, the fd is added to the write set, and the "writeable" event is handled by attempting to write all the data currently in the buffer. If the buffer is empty afterwards, fine, if not, just wait on "writeable" again.
The "exceptional" set is seldom used -- it is used for protocols that have out-of-band data where it is possible for the data transfer to block, while other data needs to go through. If your program cannot currently accept data from a "readable" file descriptor (for example, you are downloading, and the disk is full), you do not want to include the descriptor in the "readable" set, because you cannot handle the event and select would immediately return if invoked again. If the receiver includes the fd in the "exceptional" set, and the sender asks its IP stack to send a packet with "urgent" data, the receiver is then woken up, and can decide to discard the unhandled data and resynchronize with the sender. The telnet protocol uses this, for example, for Ctrl-C handling. Unless you are designing a protocol that requires such a feature, you can easily leave this out with no harm.
Obligatory code example:
#include <sys/types.h>
#include <sys/select.h>
#include <unistd.h>
#include <stdbool.h>
static inline int max(int lhs, int rhs) {
    if(lhs > rhs)
        return lhs;
    else
        return rhs;
}

void copy(int from, int to) {
    char buffer[10];
    int readp = 0;
    int writep = 0;
    bool eof = false;
    for(;;) {
        fd_set readfds, writefds;
        FD_ZERO(&readfds);
        FD_ZERO(&writefds);

        int ravail, wavail;
        if(readp < writep) {
            ravail = writep - readp - 1;
            wavail = sizeof buffer - writep;
        }
        else {
            ravail = sizeof buffer - readp;
            wavail = readp - writep;
        }

        if(!eof && ravail)
            FD_SET(from, &readfds);
        if(wavail)
            FD_SET(to, &writefds);
        else if(eof)
            break;

        int rc = select(max(from,to)+1, &readfds, &writefds, NULL, NULL);
        if(rc == -1)
            break;

        if(FD_ISSET(from, &readfds))
        {
            ssize_t nread = read(from, &buffer[readp], ravail);
            if(nread < 1)
                eof = true;    /* end-of-file or read error: stop reading */
            else
                readp = readp + nread;
        }
        if(FD_ISSET(to, &writefds))
        {
            ssize_t nwritten = write(to, &buffer[writep], wavail);
            if(nwritten < 1)
                break;
            writep = writep + nwritten;
        }

        if(readp == sizeof buffer && writep != 0)
            readp = 0;
        if(writep == sizeof buffer)
            writep = 0;
    }
}
We attempt to read if we have buffer space available and there was no end-of-file or error on the read side, and we attempt to write if we have data in the buffer; if end-of-file is reached and the buffer is empty, then we are done.
This code is clearly suboptimal (it's example code), but it should show that it is acceptable for the kernel to do less than we asked for, on both reads and writes, in which case we just go back and say "whenever you're ready", and that we never read or write without first asking whether it will block.
From the same man page:
On exit, the sets are modified in place to indicate which file descriptors actually changed status.
So use FD_ISSET() on the sets passed to select to determine which FDs have become ready.
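A minimal sketch of that pattern (fd is an illustrative descriptor, and the timeout is arbitrary):
fd_set readfds;
FD_ZERO(&readfds);
FD_SET(fd, &readfds);
struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
if (rc > 0 && FD_ISSET(fd, &readfds)) {
    /* a read() on fd will not block now */
} else if (rc == 0) {
    /* timed out: fd never became ready */
}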
I am building a client/server model, but instead of using sockets I am using named pipes, created with mkfifo().
A client writes output into the named pipe, and I read the input in my server using:
while ((n = read(fd_in, &newChar, 1)) == 1) { /* ... */ }
I am reading one character at a time until I encounter the two characters <'CR'><'LF'>. I would like to arrange my code so that if a client does not send <'CR'><'LF'> within some amount of time, I can discard it and proceed to another client; otherwise the next client may have to wait, possibly forever.
Is there a way to terminate the execution of read()? If it has not returned after 2 seconds, I would like to interrupt it, discard the previously read characters, and start reading again.
Thank you for your help,
Jary
#include <stdbool.h>
#include <poll.h>

do {
    ssize_t ret;
    struct pollfd ps = { .fd = fd_in, .events = POLLIN };
    if (poll(&ps, 1, 2000) <= 0)    /* error, or no data within 2000 ms */
        break;                      /* kick client */
    ret = read(fd_in, &newChar, 1);
    if (ret != 1)
        break;
    /* process the read character */
} while (true);
This checks whether there is data to be read; if there is none within 2000 msec, do whatever you want (e.g. disconnect).
Try passing the O_NONBLOCK flag when you open the read-end of the FIFO. That should change the behavior so that read returns right away even if the number of requested characters is not in the pipe.
To handle multiple clients simultaneously, you should set the file descriptors non-blocking with fcntl(), and then use select() or poll() to block until input appears on at least one of them.
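A rough sketch of that setup, assuming one descriptor per client FIFO (MAX_CLIENTS, nclients, and the fds array are illustrative names):
struct pollfd fds[MAX_CLIENTS];
/* ... fill fds[i].fd with each client's read end ... */
for (int i = 0; i < nclients; i++) {
    int flags = fcntl(fds[i].fd, F_GETFL, 0);
    fcntl(fds[i].fd, F_SETFL, flags | O_NONBLOCK);
    fds[i].events = POLLIN;
}
if (poll(fds, nclients, 2000) > 0) {        /* same 2-second budget as above */
    for (int i = 0; i < nclients; i++) {
        if (fds[i].revents & POLLIN) {
            /* read from fds[i].fd until EAGAIN or a terminating CR LF */
        }
    }
}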