I am trying to use named pipes in C and am running into some difficulty. With anonymous pipes, I just create the pipe, which gives me the read/write descriptors, and then close the unused end in each process before doing a read or write. That is easy, since I can just open() and close() the other end every time.
With named pipes I am a bit confused. I found the mkfifo() function, which creates the named pipe, but I don't understand how to read from and write to it properly.
Thanks
After the pipe has been created with mkfifo() (which could have been done at any point in the past - named pipes exist in the filesystem until they're unlinked), the reading side opens it using open("/path/to/pipe", O_RDONLY) and the writing side opens it with open("/path/to/pipe", O_WRONLY).
After that it can be used just like an anonymous pipe.
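For example, a minimal reader might look like the sketch below. The path /tmp/myfifo is just an illustration; a writer would do the same thing with O_WRONLY and write().

/* reader.c - a minimal sketch of reading from a named pipe.
   The path /tmp/myfifo is an arbitrary example. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/myfifo";

    /* Create the FIFO; ignore the error if it already exists. */
    if (mkfifo(path, 0666) == -1 && errno != EEXIST) {
        perror("mkfifo");
        return 1;
    }

    /* open() on a FIFO blocks until a writer opens the other end. */
    int fd = open(path, O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }

    char buf[256];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(fd);
    return 0;
}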
There's nothing much to it. Use mkfifo to make the pipe and then have your processes read and write to it like any file. It's not C specific either. You can do this:
mkfifo testfifo
cat testfifo
And then in another window
echo "hello, world" > testfifo
I think you should just use pipes, because they handle the data transmission among the different processes no matter how long each process takes.
I'm looking to run X processes that I can iterate through, in order to run programs where there's a master and 'slaves' that take the master's orders and return a string.
I'm writing in C. I'm wondering how I'd be able to set up pipes and forking between these processes to read from standard in and out. I'm currently able to have them work one at a time until they are killed, but I would like to simply read one line and then move on to the next process. Any help?
Generally, the common strategy for this sort of programming is to set up an event loop.
You would set up pipes and connect them to stdin and stdout of your program.
Since you're using C, you would create two pipes: one for reading, and one for writing.
Then you would fork. After the fork, in the child, you close stdin and stdout and use the dup2 system call to copy the pipe's file descriptors onto descriptors 0 and 1.
In the parent, you connect each process to an event loop, which lets you know when one of your FDs is ready for reading or writing.
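A minimal sketch of that setup, with cat standing in for your slave program and most error handling trimmed:

/* Sketch: spawn a child with its stdin/stdout connected to pipes. */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int to_child[2], from_child[2];   /* [0] = read end, [1] = write end */
    pipe(to_child);
    pipe(from_child);

    pid_t pid = fork();
    if (pid == 0) {                   /* child */
        dup2(to_child[0], 0);         /* read end of to_child becomes stdin */
        dup2(from_child[1], 1);       /* write end of from_child becomes stdout */
        close(to_child[0]);  close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execlp("cat", "cat", (char *)NULL);  /* placeholder command */
        _exit(127);
    }
    /* parent keeps to_child[1] for writing, from_child[0] for reading */
    close(to_child[0]);
    close(from_child[1]);

    write(to_child[1], "ping\n", 5);
    close(to_child[1]);               /* send EOF to the child */

    char buf[64];
    ssize_t n = read(from_child[0], buf, sizeof buf);
    if (n > 0) fwrite(buf, 1, (size_t)n, stdout);
    close(from_child[0]);
    return 0;
}

In a real master you would hand to_child[1] and from_child[0] to the event loop (select(), poll(), or libevent) rather than blocking on read() as this sketch does.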
Take a look at these class notes for discussion of using pipes and dup2.
Here's an introduction to libevent, one of the common event loops for C.
For other languages you'll do something similar. For example for Python, take a look at the asyncio support for subprocesses.
So, I was given this one line script:
echo test | cat | grep test
Could you please explain to me how exactly that would work given the following system calls: pipe(), fork(), exec() and dup2()?
I am looking for a general overview here, mainly the sequence of operations.
What I know so far is that the shell will fork using fork(), and each command's code will replace the shell's by using exec(). But what about pipe() and dup2()? How do they fall into place?
Thanks in advance.
First consider a simpler example, such as:
echo test | cat
What we want is to execute echo in a separate process, arranging for its standard output to be diverted into the standard input of the process executing cat. Ideally this diversion, once set up, would require no further intervention by the shell; the shell would just calmly wait for both processes to exit.
The mechanism to achieve that is called the "pipe". It is an interprocess communication device implemented in the kernel and exported to user space. Once created by a Unix program, a pipe has the appearance of a pair of file descriptors with the peculiar property that, if you write into one of them, you can read the same data from the other. This is not very useful within the same process, but keep in mind that file descriptors, including but not limited to pipes, are inherited across fork() and even across exec(). This makes the pipe an easy-to-set-up and reasonably efficient IPC mechanism.
The shell creates the pipe, and now owns a set of file descriptors belonging to the pipe, one for reading and one for writing. These file descriptors are inherited by both forked subprocesses. Now, if only echo were writing to the pipe's write-end descriptor instead of to its actual standard output, and if cat were reading from the pipe's read-end descriptor instead of from its standard input, everything would work. But they don't, and this is where dup2 comes into play.
dup2 duplicates a file descriptor as another file descriptor, automatically closing the target descriptor beforehand. For example, dup2(15, 1) will close file descriptor 1 (by convention used for the standard output), and reopen it as a copy of file descriptor 15, meaning that writing to the standard output will in fact be equivalent to writing to file descriptor 15. The same applies to reading: dup2(8, 0) will make reading from file descriptor 0 (the standard input) equivalent to reading from file descriptor 8. If we proceed to close the original file descriptor, the open file (or pipe) will have been effectively moved from the original descriptor to the new one, much like sci-fi teleports that work by first duplicating a piece of matter at a remote location and then disintegrating the original.
If you're still following the theory, the order of operations performed by the shell should now be clear:
The shell creates a pipe and then forks two processes, both of which inherit the pipe's file descriptors, r (the read end) and w (the write end).
In the subprocess about to execute echo, the shell calls dup2(w, 1); close(w) before exec in order to redirect the standard output to the write end of the pipe.
In the subprocess about to execute cat, the shell calls dup2(r, 0); close(r) in order to redirect the standard input to the read end of the pipe.
After forking, the main shell process must itself close both ends of the pipe. One reason is to free up resources associated with the pipe once the subprocesses exit. The other is to allow cat to actually terminate: a pipe's reader will receive EOF only after all copies of the write end of the pipe are closed. In the steps above, we did close the child's redundant copy of the write end, w, right after duplicating it onto file descriptor 1. But w also exists in the parent, where it was inherited under the same number, and can only be closed by the parent. Failing to do that leaves cat's standard input never reporting EOF, and the cat process hangs as a consequence.
This mechanism is easily generalized to three or more processes connected by pipes. In the case of three processes, the pipes need to arrange that echo's output goes to cat's input, and cat's output goes to grep's input. This requires two calls to pipe(), three calls to fork(), four dup2()/close() pairs in the children (one for echo, one for grep, and two for cat), three calls to exec(), and four additional calls to close() in the shell (two for each pipe).
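Putting the steps together, here is a sketch of the two-process case, echo test | cat, with error checking omitted for brevity:

/* Sketch: the shell's work for "echo test | cat". */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    pipe(fds);                 /* fds[0] = read end (r), fds[1] = write end (w) */
    int r = fds[0], w = fds[1];

    if (fork() == 0) {         /* child 1: echo */
        dup2(w, 1);            /* stdout now goes into the pipe */
        close(w);
        close(r);              /* echo never reads from the pipe */
        execlp("echo", "echo", "test", (char *)NULL);
        _exit(127);
    }
    if (fork() == 0) {         /* child 2: cat */
        dup2(r, 0);            /* stdin now comes from the pipe */
        close(r);
        close(w);              /* crucial: otherwise cat never sees EOF */
        execlp("cat", "cat", (char *)NULL);
        _exit(127);
    }
    /* the "shell" closes its own copies, then waits for both children */
    close(r);
    close(w);
    while (wait(NULL) > 0)
        ;
    return 0;
}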
I'm trying to write to stdin and read from stdout (and stderr) of an external program, without changing its code.
I've tried using named pipes, but stdout doesn't show up until the program is terminated, and stdin only works on the first input (then cin is null).
I've tried using /proc/[pid]/fd, but that only writes to and reads from the terminal, not the program.
I've tried writing a character device file for this, and it worked, but only for one program at a time (this needs to work for multiple programs at a time).
At this point, to my knowledge, I could extend the driver that worked so that it multiplexes the I/O across multiple programs, but I don't think that's the "right" solution.
The main purpose of this is to view a program's feed through a web interface. I'm sure there has to be some way to do this. Is there anything I haven't tried that's been done before?
The typical way of doing this is:
Create anonymous pipes (not named pipes) with the pipe(2) system call for the new process's standard streams
Call fork(2) to spawn the child process
close(2) the appropriate ends of the pipes in both the parent and the child (e.g. for the stdin pipe, close the read end in the parent and close the write end in the child; vice-versa for the stdout and stderr pipes)
Use dup2(2) in the child to copy the pipe file descriptors onto file descriptors 0, 1, and 2, and then close(2) the remaining old descriptors
exec(3) the external application in the child process
In the parent process, simultaneously write to the child's stdin pipe and read from the child's stdout and stderr pipes. However, depending on how the child behaves, this can easily lead to deadlock if you're not careful. One way to avoid deadlock is to spawn separate threads to handle each of the 3 streams; another way is to use the select(2) system call to wait until one of the streams can be read from/written to without blocking, and then process that stream.
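Here is a sketch of the select(2)-based approach described above, for the stdout/stderr side (the stdin pipe is elided for brevity, and ls -l /nonexistent . is just a stand-in command that writes to both streams):

/* Sketch: read a child's stdout and stderr concurrently with select(2). */
#include <stdio.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int out[2], err[2];
    pipe(out);
    pipe(err);

    if (fork() == 0) {                    /* child */
        dup2(out[1], 1);
        dup2(err[1], 2);
        close(out[0]); close(out[1]);
        close(err[0]); close(err[1]);
        execlp("ls", "ls", "-l", "/nonexistent", ".", (char *)NULL);
        _exit(127);
    }
    close(out[1]);                        /* parent keeps only the read ends */
    close(err[1]);

    int open_fds = 2;
    while (open_fds > 0) {
        fd_set rset;
        FD_ZERO(&rset);
        int maxfd = -1;
        if (out[0] >= 0) { FD_SET(out[0], &rset); if (out[0] > maxfd) maxfd = out[0]; }
        if (err[0] >= 0) { FD_SET(err[0], &rset); if (err[0] > maxfd) maxfd = err[0]; }
        if (select(maxfd + 1, &rset, NULL, NULL, NULL) <= 0)
            break;

        char buf[1024];
        if (out[0] >= 0 && FD_ISSET(out[0], &rset)) {
            ssize_t n = read(out[0], buf, sizeof buf);
            if (n <= 0) { close(out[0]); out[0] = -1; open_fds--; }
            else fprintf(stdout, "child stdout: %.*s", (int)n, buf);
        }
        if (err[0] >= 0 && FD_ISSET(err[0], &rset)) {
            ssize_t n = read(err[0], buf, sizeof buf);
            if (n <= 0) { close(err[0]); err[0] = -1; open_fds--; }
            else fprintf(stderr, "child stderr: %.*s", (int)n, buf);
        }
    }
    wait(NULL);
    return 0;
}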
Even if you do this all correctly, you may still not see your program's output right away. This is typically due to buffering of stdout. Normally, when stdout is connected to a terminal, it is line-buffered: it gets flushed after every newline. But when stdout is a pipe (or anything else that's not a terminal, like a file or a socket), it is fully buffered, and data is written out only when the program has produced a full buffer's worth (e.g. 4 KB).
Many programs have command line options to change their buffering behavior. For example, grep(1) has the --line-buffered flag to force it to line-buffer its output even when stdout isn't a terminal. If your external program has such an option, you should probably use it. If not, it's still possible to change the buffering behavior, but you have to use some sneaky tricks—see this question and this question for how to do that.
I have the following Bash script:
cat | command1 | command2 | command3
The commands never change.
For performance reasons, I want to replace it with a small C program that runs the commands and creates and assigns the pipes accordingly.
Is there a way to do that in C?
As others said, you probably won't get a significant performance benefit.
It's reasonable to assume that the commands you run take most of the time, not the shell script gluing them together, so even if the glue becomes faster, it will change almost nothing.
Having said that, if you want to do it, you should use the fork(), pipe(), dup2() and exec() functions.
fork will give you multiple processes.
pipe will give you a pair of file descriptors - what you write into one, you can read from the other.
dup2 can be used to change file descriptor numbers. You can take one side of a pipe and make it become file descriptor 1 (stdout) in one process, and the other side you'll make file descriptor 0 (stdin) in another (don't forget to close the normal stdin, stdout first).
exec (or one of its variants) will be used to execute the programs.
There are lots of details to fill in. Have fun.
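To make those details concrete, here is a minimal sketch of the wiring, using cat | sort | uniq as placeholder commands (swap in your real command1, command2, command3; error checks omitted):

/* Sketch: run cmd1 | cmd2 | cmd3 from C. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    char *cmds[][2] = {          /* argv for each stage (placeholders) */
        { "cat",  NULL },
        { "sort", NULL },
        { "uniq", NULL },
    };
    int ncmds = 3;
    int prev = -1;               /* read end of the previous pipe */

    for (int i = 0; i < ncmds; i++) {
        int fds[2] = { -1, -1 };
        if (i < ncmds - 1)
            pipe(fds);           /* pipe to the next stage */

        if (fork() == 0) {       /* child: stage i */
            if (prev != -1) {
                dup2(prev, 0);   /* read from the previous stage */
                close(prev);
            }
            if (fds[1] != -1) {
                dup2(fds[1], 1); /* write to the next stage */
                close(fds[1]);
                close(fds[0]);
            }
            execvp(cmds[i][0], cmds[i]);
            _exit(127);
        }
        if (prev != -1)
            close(prev);         /* parent must drop its pipe copies */
        if (fds[1] != -1)
            close(fds[1]);
        prev = fds[0];
    }
    while (wait(NULL) > 0)
        ;
    return 0;
}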
Here is an example that does pretty much this.
There is no performance benefit for the processing itself, just a couple of milliseconds saved in initialization. Obviously we don't know the context in which you're doing this, but just using dash instead of bash would probably have gotten you 80% of those milliseconds with a single-character change in your #! line.
What are the portable options if one needs to export open file descriptors to child processes created using exec family of library functions?
Thank you.
EDIT. I know that child processes inherit open descriptors. But how do they use those descriptors without knowing their values? Should I implement some sort of IPC in order to pass descriptor numbers to the child process? For example, if the parent creates a pipe, how can an exec'ed child process know the read/write ends of the pipe?
Simply don't set the O_CLOEXEC open(2) flag or its corresponding (and standard) FD_CLOEXEC fcntl(2) flag on the descriptor -- it'll be passed across an exec*() by default.
Update
Thanks for the clarification, that does change things a little bit.
There are several possibilities:
Use command line arguments: GnuPG in gpg(1) provides command line switches --status-fd, --logger-fd, --attribute-fd, --passphrase-fd, --command-fd for each file descriptor that it expects to receive. If there are several kinds of data to submit or retrieve, this lets each file descriptor focus on one type of data and reduces the need for parsing more complicated output.
Just work with files and accept filenames as parameters; when you call the program, pass it file names such as /dev/fd/5, and arrange for the input to be on fd 5 before calling the program:
cat /dev/fd/5 5</etc/passwd
Follow conventions: connect fd 0 in the child to the read end of a pipe and fd 1 to the write end, and let it work as a normal pipeline "filter" command. This is definitely the best approach if all the input can reasonably be sent through a single file descriptor -- not always desirable.
Use an environment variable to indicate the file / socket / fd:
SSH_AUTH_SOCK=/tmp/ssh-ZriaCoWL2248/agent.2248
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-CsUrnHGmKa,guid=e213e2...
This is a nice way to pass the file information down through many layers of child programs.
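As an illustration of the first option, here is a sketch where the parent passes a pipe's descriptor number to a child on the command line, in the style of gpg's --passphrase-fd. Both ./child and its --input-fd switch are hypothetical placeholders, not a real tool:

/* Sketch: hand a pipe's read end to an exec'ed child by number. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    pipe(fds);                        /* fds[0]: child reads, fds[1]: we write */

    if (fork() == 0) {
        char arg[16];
        close(fds[1]);                /* child only reads */
        snprintf(arg, sizeof arg, "%d", fds[0]);
        /* The descriptor survives exec as long as FD_CLOEXEC is not set;
           the child learns its value from the argument list and uses it
           directly, e.g. read(atoi(argv[2]), ...). */
        execlp("./child", "child", "--input-fd", arg, (char *)NULL);
        _exit(127);
    }
    close(fds[0]);
    write(fds[1], "secret\n", 7);
    close(fds[1]);
    wait(NULL);
    return 0;
}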