Use a FILE * in epoll - c

I am opening another process from my application using popen and parsing its output. I want to be notified as soon as the program has produced any output. Currently everything in my program uses epoll for this kind of notification. However, popen returns a FILE * instead of an fd. Is it safe to use the fileno function and put the resulting fd into epoll? If not, is there another way? I do not want the process to block; that's why I want the notifications.

Yes, it is safe, as long as you are careful not to mix other operations on the FILE * with direct use of the file descriptor. One more thing: you probably want to mark the file descriptor as non-blocking too, so that read() never blocks and you can handle EAGAIN (or EWOULDBLOCK) by checking errno.
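For illustration, a minimal sketch of that approach might look like the following (the command, buffer size and single-descriptor loop are placeholders I've chosen, not anything the question prescribes):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void)
{
    FILE *p = popen("ls -l", "r");            /* placeholder command */
    if (!p) { perror("popen"); return 1; }

    int fd = fileno(p);                       /* underlying pipe read end */
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);   /* make read() non-blocking */

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);

    for (;;) {
        struct epoll_event out;
        if (epoll_wait(ep, &out, 1, -1) < 0) {
            if (errno == EINTR) continue;     /* interrupted by a signal, retry */
            break;
        }
        char buf[4096];
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            fwrite(buf, 1, (size_t)n, stdout);            /* child produced output */
        else if (n == 0)
            break;                                        /* child closed its end */
        else if (errno != EAGAIN && errno != EWOULDBLOCK)
            break;                                        /* real error */
    }
    close(ep);
    pclose(p);
    return 0;
}

Note that pclose() still waits for the child process to exit, so call it once you have seen end-of-file on the pipe.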

Related

What happens when I try to read from a pipe without writing to it?

With this operation I expected to get an error because I'm reading from nothing, but in fact the program just keeps trying to read until someone writes to the pipe. If there are no writes it is stuck indefinitely in the read and does not proceed.
What exactly happens behind the scenes here: does the function keep looping, is it waiting for a signal, or is something else going on? Is it still using CPU resources?
Also, is it possible to make the program return an error code or print something when it tries to read without any writers? I don't really need to do this, just wondering if it's possible.
This is normal behavior. If nothing is available to read, the reading process will block until there is. It will not consume CPU time while blocking; the OS will put it to sleep until another process writes to the pipe.
Keep in mind that pipes were designed to be somewhat transparent; a simple filter-type program should not have to care whether the input is a file or a pipe. If every program that wanted to be able to read from a pipe (think grep) had to include special handling to wait until the writer was ready, it would be very tedious for those programmers. This behavior means that reading from pipes doesn't require doing anything special.
If you don't want to block if no data is available, you can set the O_NONBLOCK status flag on the file descriptor, either when you open(2) it, or with fcntl(fd, F_SETFL, ...). In this case, when no data is available, read(2) will return -1 and set errno to EAGAIN or EWOULDBLOCK. This means, of course, that every time you read from the file descriptor, you have to write code to handle such a case.
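A short sketch of that case, assuming fd already refers to the read end of a pipe:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

void try_read(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);   /* switch the descriptor to non-blocking */

    char buf[512];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0) {
        /* n bytes of data are now in buf */
    } else if (n == 0) {
        /* all writers have closed the pipe: end of input */
    } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
        /* nothing has been written yet; try again later instead of blocking */
    } else {
        perror("read");                       /* some other error */
    }
}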
You can also use select(2) or poll(2) to wait until data is available, optionally with a timeout.
It is also possible to arrange it so that a signal arriving during the blocking will cause read(2) to return -1 and set errno to EINTR. This depends on system call restarting semantics and is a little bit complicated.

C: blocking read should return if file descriptor is deleted

I am reading in a blocking way from a device/file descriptor.
It might happen that, in a different thread, the device is closed and the file descriptor is deleted. Unfortunately the read doesn't return or take notice and keeps blocking.
As a workaround I could loop over select() with a timeout. If the timeout expires, I can check the file descriptor and, in case it is gone, return instead of calling read.
I am wondering if there is a better way to do this in Linux C.
The code you are describing has an inherent race condition: if another thread could be in a blocking read() on a file descriptor when you close() that file descriptor, the other thread could just as well be about to call read() instead.
You can't call close() unless you know that all other threads are no longer in a position to be using that file descriptor at all.
The easiest way to handle cases like you describe is for one thread to be the 'owning' thread of each file descriptor, that is responsible for closing the file descriptor. Other threads don't directly close it - instead they mark the file descriptor as "to be closed" in some shared data structure and wake up the owning thread.
You can make it possible to wake the owning thread by having it not block in read() but instead block in select() or poll() with another file descriptor - usually a pipe - in the set as well as the target file descriptor. The thread is woken by writing to the other end of that pipe.
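A rough sketch of that arrangement (the names wake_pipe, owner_loop and request_close are made up for illustration; error handling is minimal):

#include <poll.h>
#include <unistd.h>

int wake_pipe[2];                            /* created once with pipe(wake_pipe) */

void owner_loop(int target_fd)               /* runs in the descriptor's owning thread */
{
    for (;;) {
        struct pollfd fds[2] = {
            { .fd = target_fd,    .events = POLLIN },
            { .fd = wake_pipe[0], .events = POLLIN },
        };
        if (poll(fds, 2, -1) < 0)
            continue;                        /* e.g. interrupted by a signal */

        if (fds[1].revents & POLLIN) {       /* another thread asked us to close */
            char c;
            read(wake_pipe[0], &c, 1);       /* drain the wake-up byte */
            close(target_fd);                /* only the owning thread calls close() */
            return;
        }
        if (fds[0].revents & POLLIN) {
            char buf[512];
            read(target_fd, buf, sizeof buf);   /* normal data path */
        }
    }
}

void request_close(void)                     /* called from any other thread */
{
    char c = 1;
    write(wake_pipe[1], &c, 1);
}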
Once a file descriptor is closed by another thread, it's not easy to verify that this happened. What if the other thread re-opened a file and got the same file descriptor? After a successful close() call you can't access the file descriptor again, and on a failed close() call POSIX leaves the state of the file descriptor unspecified.
The select() option suffers from the same problem described above.
Your problem is not really any different from any other data race issue in a multi-threaded program. I suggest you re-write the code so that threads don't access the file descriptor without synchronization. Or avoid having multiple threads read from the same file descriptor, if that's possible.

How to create blocking file descriptor in unix?

I would like to create blocking and non-blocking file in Unix's C. First, blocking:
fd = open("file.txt", O_CREAT | O_WRONLY | O_EXCL);
Is that right? Shouldn't I add some mode options, like 0666 for example?
How about a non-blocking file? I have no idea how to do that.
I would like to achieve something like:
when I open it to write in it, and it's opened for writing, it's ok; if not it blocks.
when I open it to read from it, and it's opened for reading, it's ok; if not it blocks.
File descriptors are blocking or non-blocking; files are not. Add O_NONBLOCK to the options in the open() call if you want a non-blocking file descriptor.
Note that opening a FIFO for reading or writing will block unless there's a process with the FIFO open for the other operation, or you specify O_NONBLOCK. If you open it for read and write, the open() is non-blocking (it will return promptly); I/O operations are still controlled by whether you set O_NONBLOCK or not.
The updated question is not clear. However, if you're looking for 'exclusive access to the file' (so that no one else has it open), then neither O_EXCL nor O_NONBLOCK is the answer. O_EXCL affects what happens when you create the file; the create will fail if the file already exists. O_NONBLOCK affects whether a read() operation will block when there's no data available to read. If you read the POSIX open() description, there is nothing there that allows you to request 'exclusive access' to a file.
To answer the question about file mode: if you include O_CREAT, you need the third argument to open(). If you omit O_CREAT, you don't need the third argument to open(). It is a varargs function:
int open(const char *filename, int options, ...);
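For example (using the file name from the question; error handling omitted):

#include <fcntl.h>

void open_examples(void)
{
    /* O_CREAT is present, so the third (mode) argument is required */
    int fd_new = open("file.txt", O_CREAT | O_WRONLY | O_EXCL, 0666);

    /* no O_CREAT, so the third argument is simply omitted */
    int fd_existing = open("file.txt", O_RDONLY);

    (void)fd_new;
    (void)fd_existing;
}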
I don't know what you are calling a blocking file. (Blocking I/O in Unix means that an I/O operation waits until data is available or a definite failure occurs, as opposed to non-blocking I/O, which returns immediately if there is no available data.)
You always need to specify a mode when opening with O_CREAT.
The open you show will fail if the file already exists (once fixed for the above point).
Unix has no standard way to lock a file for exclusive access beyond that. There are advisory locks (but all programs must respect the protocol), and some systems have a mandatory locking extension. The received wisdom is not to rely on either kind of locking when accessing a network file system.
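As an illustration of an advisory lock, here is a sketch using fcntl() record locking (the helper name is made up; the lock only means anything if every cooperating process uses the same protocol):

#include <fcntl.h>
#include <unistd.h>

int lock_whole_file(int fd)
{
    struct flock fl = {
        .l_type   = F_WRLCK,                 /* exclusive (write) lock */
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,                       /* 0 means "to the end of the file" */
    };
    return fcntl(fd, F_SETLKW, &fl);         /* F_SETLKW blocks until the lock is granted */
}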
Shouldn't I add some mode options?
You should, since you pass O_CREAT (create the file if it doesn't exist): in that case open() expects a third, mode argument as well, and omitting it results in undefined behavior.
Edit:
The updated question is even more confusing...
when I open it to write in it, and it's opened for writing, it's ok; if not it blocks.
Why would you need that? If you try to write to a file/file descriptor not opened for writing, write() will return -1 and you can check the error code stored in errno. Tell us what you're actually trying to achieve with this bizarre requirement instead of overcomplicating and messing up your code.
(Remarks in parentheses:
I would like to create blocking and non-blocking file
What's that?
in unix's C
Again, there's no such thing. There is the C language, which is platform-independent.)

Unix: What happens when a read file descriptor closes while calling select()

Say that I call select() on a FD_SET containing a bunch of read file descriptors. What happens if during the select() call, one of the file descriptor closes? Assuming that some sort of error occurs, then is it my responsibility to find and remove the closed file descriptor from the set?
I don't believe this is specified anywhere; some systems may immediately return from select while others may continue blocking. Note that the only way this can happen is in a multi-threaded process (otherwise, the close cannot happen during select; even if it happened from a signal handler, select would have already been interrupted by the signal). As such, this situation arising probably indicates you have bigger issues to worry about.
If one of the file descriptors you're polling can be closed during select, the bigger issue is that the same file descriptor might be reassigned to a newly opened file (e.g. one opened in another unrelated thread) immediately after the close, and the thread that's polling might then wrongly perform IO on the new file "belonging to" a different thread.
If you have a data object that consists of a set of file descriptors that will be polled with select in a multithreaded program, you almost surely need to be using some sort of synchronization primitive to control access to that set, and adding or removing file descriptors should require a lock that's mutually exclusive with the possibility that select (or any IO on the members) is in progress.
Of course in a multi-threaded program, it may be better not to use select at all and instead let blocking IO in multiple threads achieve the desired result without complicated locking logic.
The select() system call takes three fd_set parameters: read, write, and exception. To check whether an error occurs on a reading file descriptor, include it in both the read set and the exception set; seeing it in the exception set on return from select() means an exceptional condition occurred on that socket, giving you the chance to find out what.
In general, a network socket with any sort of exception will no longer be fit to send and receive.
Even if you've read all the sent data, a closed socket is always regarded as ready to read; select() will unblock, reporting that socket as available.
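A sketch of that pattern (wait_on_socket is a made-up helper; sock is assumed to be an already connected socket):

#include <sys/select.h>

int wait_on_socket(int sock)
{
    fd_set rfds, efds;
    FD_ZERO(&rfds);
    FD_ZERO(&efds);
    FD_SET(sock, &rfds);                     /* watch for readability */
    FD_SET(sock, &efds);                     /* watch for exceptional conditions */

    if (select(sock + 1, &rfds, NULL, &efds, NULL) < 0)
        return -1;                           /* select() itself failed */
    if (FD_ISSET(sock, &efds))
        return 1;                            /* exceptional condition on the socket */
    if (FD_ISSET(sock, &rfds))
        return 0;                            /* readable: data, or EOF if the peer closed */
    return -1;
}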

What does select(2) do if you close(2) a file descriptor in a separate thread?

What is the behavior of the select(2) function when a file descriptor it is watching for reading is closed by another thread?
From some cursory testing, it does return right away. I suspect the outcome is either that (a) it still continues to wait for data, but if you actually tried to read from it you'd get EBADF (possibly -- there's a potential race) or (b) that it pretends as though the file descriptor were never passed in. If the latter case is true, passing in a single fd with no timeout would cause a deadlock if it were closed.
From some additional investigation, it appears that both dwc and bothie are right.
bothie's answer to the question boils down to: it's undefined behavior. That doesn't necessarily mean it's unpredictable, but that different OSes do it differently. It would appear that systems like Solaris and HP-UX return from select(2) in this case, but Linux does not, based on this post to the linux-kernel mailing list from 2001.
The argument on the linux-kernel mailing list is essentially that it is undefined (and broken) behavior to rely upon. In Linux's case, calling close(2) on the file descriptor effectively decrements a reference count on it. Since there is a select(2) call also holding a reference to it, the fd will remain open and waiting for input until the select(2) returns. This is basically dwc's answer: you will get an event on the file descriptor and then it'll be closed. Trying to read from it will result in an EBADF, assuming the fd hasn't been recycled. (A concern that MarkR raised in his answer, although I think it's probably avoidable in most cases with proper synchronization.)
So thank you all for the help.
I would expect that it would behave as if the end-of-file had been reached, that's to say, it would return with the file descriptor shown as ready but any attempt to read it subsequently would return "bad file descriptor".
Having said that, doing this is very bad practice anyway, as you'd always have potential race conditions: another file descriptor with the same number could be opened by yet another thread immediately after the second thread closed it, and then the selecting thread would end up waiting on the wrong one.
As soon as you close a file, its number becomes available for reuse, and may get reused by the next call to open(), socket() etc, even if by another thread. Therefore you really, really need to avoid this kind of thing.
The select system call is a way to wait for file descriptors to change state while the program doesn't have anything else to do. The main use is in server applications, which open a bunch of file descriptors and then wait for anything to do on them (accept new connections, read requests, or send the responses). Those file descriptors are opened in non-blocking I/O mode so that the server process never hangs in a syscall.
This additionally means there is no need for separate threads, because all the work that could be done in a thread can be done prior to the select call as well. And if the work takes long, it can be interrupted, select is called with timeout={0,0}, the file descriptors get handled, and afterwards the work is resumed.
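That zero-timeout pattern looks roughly like this (check_ready is a made-up helper; maxfd and rfds are assumed to have been prepared by the caller):

#include <sys/select.h>

int check_ready(int maxfd, fd_set *rfds)
{
    struct timeval tv = { 0, 0 };            /* return immediately instead of blocking */
    return select(maxfd + 1, rfds, NULL, NULL, &tv);
}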
Now, you close a file descriptor in another thread. Why do you have that extra thread at all, and why should it close the file descriptor?
The POSIX standard doesn't provide any hints about what happens in this case, so what you're doing is UNDEFINED BEHAVIOR. Expect the result to be very different between operating systems and even between versions of the same OS.
Regards, Bodo
It's a little confusing what you're asking...
Select() should return upon an "interesting" change. If the close() merely decremented the reference count and the file was still open for writing somewhere then there's no reason for select() to wake up.
If the other thread did close() on the only open descriptor then it gets more interesting, but I'd need to see a simple version of the code to see if something's really wrong.
