Pipe chaining in my own shell implementation in C

I am currently writing my own shell implementation in C. I understand the principle behind piping and redirecting the fds. However, some specific behavior with pipes has caught my attention:
cat | ls (or any pipeline whose final command does not read from stdin).
In that case, what happens in the shell is that ls executes and cat asks for a single line before exiting (as the result of a SIGPIPE, I guess). I have tried to follow this tutorial to better understand the principle behind multiple pipes: http://web.cse.ohio-state.edu/~mamrak.1/CIS762/pipes_lab_notes.html
Below is some code I have written to try to replicate the behavior I am looking for:
char *cmd1[] = {"/bin/cat", NULL};
char *cmd2[] = {"/bin/ls", NULL};
int pdes[2];
pid_t child;

if (!(child = fork()))
{
    pipe(pdes);
    if (!fork())
    {
        close(pdes[0]);
        dup2(pdes[1], STDOUT_FILENO);
        /* cat command gets executed here */
        execvp(cmd1[0], cmd1);
    }
    else
    {
        close(pdes[1]);
        dup2(pdes[0], STDIN_FILENO);
        /* ls command gets executed here */
        execvp(cmd2[0], cmd2);
    }
}
wait(NULL);
I am aware of the security flaws of that implementation, but this is just for testing. The problem with that code, as I understand it, is that whenever ls gets executed, it just exits, and then cat somehow runs in the background (and in my case fails, because it tries to read while zsh is showing its prompt, my program having exited). I cannot find a solution to make it work like it should. If I wait for the commands one by one, a command such as cat /dev/random | head -c 10 would run forever...
If anyone has a solution for this issue or at least some guidance it would be greatly appreciated.

After consideration of the comments from @thatotherguy, here is the solution I found, as implemented in my code. Please bear in mind that the pipe and fork calls should be checked for errors, but this version is meant to be as simple as possible. The extra exit calls are also necessary for some of my built-in commands.
void exec_pipe(t_ast *tree, t_sh *sh)
{
    int pdes[2];
    int status;
    pid_t child_right;
    pid_t child_left;

    pipe(pdes);
    if (!(child_left = fork()))
    {
        close(pdes[READ_END]);
        dup2(pdes[WRITE_END], STDOUT_FILENO);
        /* Execute command to the left of the tree */
        exit(execute_cmd(tree->left, sh));
    }
    if (!(child_right = fork()))
    {
        close(pdes[WRITE_END]);
        dup2(pdes[READ_END], STDIN_FILENO);
        /* Recursive call or execution of last command */
        if (tree->right->type == PIPE_NODE)
            exec_pipe(tree->right, sh);
        else
            exit(execute_cmd(tree->right, sh));
    }
    /* Should not forget to close both ends of the pipe */
    close(pdes[WRITE_END]);
    close(pdes[READ_END]);
    wait(NULL);
    waitpid(child_right, &status, 0);
    exit(get_status(status));
}
I was confused by the original link I posted and the different ways to handle chained pipes. From the POSIX documentation posted below my original question (http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_09_02) it appears that:
If the pipeline is not in the background (see Asynchronous Lists), the shell shall wait for the last command specified in the pipeline to complete, and may also wait for all commands to complete.
Both behaviors are therefore acceptable: waiting only for the last command, or waiting for all of them. I chose to implement the second behavior, to stick to what bash/zsh would do.
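For pipelines of more than two commands, the final wait in exec_pipe() can be generalized into a loop that reaps every child while keeping the status of the rightmost command. A minimal sketch, using the same helpers as above (error checks still omitted):

/* Reap every child of the pipeline, but report the exit status
   of the rightmost command, as bash/zsh do. */
int st;
int status = 0;
pid_t pid;

while ((pid = wait(&st)) > 0)
    if (pid == child_right)
        status = st;
exit(get_status(status));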

Related

How to correctly close unused pipes?

I'm implementing a simplified shell which supports pipe.
A part of my code shown below runs fine, but I'm not sure why it works.
main.cpp
#include <iostream>
#include <string>
#include <queue>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "include/command.h"
using namespace std;
int main()
{
    string rawCommand;
    IndividualCommand tempCommand = {};
    int pipeFD[2] = {PIPE_IN, PIPE_OUT};
    int firstPipeRead, firstPipeWrite, secondPipeRead, secondPipeWrite;

    while (true)
    {
        cout << "% ";
        getline(cin, rawCommand);
        if (rawCommand == "exit")
            break;

        Command *command = new Command(rawCommand);
        deque<IndividualCommand> commandQueue = command->parse();
        delete command;

        while (!commandQueue.empty())
        {
            tempCommand = commandQueue.front();
            commandQueue.pop_front();

            firstPipeRead = secondPipeRead;
            firstPipeWrite = secondPipeWrite;

            if (tempCommand.outputStream == PIPE_OUT)
            {
                pipe(pipeFD);
                secondPipeRead = pipeFD[0];
                secondPipeWrite = pipeFD[1];
            }

            pid_t child_pid;
            child_pid = fork();
            int status;

            // child process
            if (child_pid == 0)
            {
                if (tempCommand.redirectToFile != "")
                {
                    int fd = open(tempCommand.redirectToFile.c_str(), O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
                    dup2(fd, STDOUT_FILENO);
                    close(fd);
                }

                if (tempCommand.inputStream == PIPE_IN)
                {
                    close(firstPipeWrite);
                    dup2(firstPipeRead, STDIN_FILENO);
                    close(firstPipeRead);
                }

                if (tempCommand.outputStream == PIPE_OUT)
                {
                    close(secondPipeRead);
                    dup2(secondPipeWrite, STDOUT_FILENO);
                    close(secondPipeWrite);
                }

                if (tempCommand.argument != "")
                    execl(tempCommand.executable.c_str(), tempCommand.executable.c_str(), tempCommand.argument.c_str(), NULL);
                else
                    execl(tempCommand.executable.c_str(), tempCommand.executable.c_str(), NULL);
            }
            else
            {
                close(secondPipeWrite);
                if (commandQueue.empty())
                    waitpid(child_pid, &status, 0);
            }
        }
    }
    return 0;
}
command.h
#ifndef COMMAND_H
#define COMMAND_H

#include <string>
#include <queue>
#include <deque> // needed for deque; <queue> only pulls it in transitively
#include <sstream>
#include <unistd.h>

using namespace std;

#define PIPE_IN 0x100000
#define PIPE_OUT 0x100001

struct IndividualCommand
{
    string executable = "";
    string argument = "";
    string redirectToFile = "";
    int inputStream = STDIN_FILENO;
    int outputStream = STDOUT_FILENO;
    int errorStream = STDERR_FILENO;
};

class Command
{
private:
    string rawCommand, tempString;
    queue<string> splittedCommand;
    deque<IndividualCommand> commandQueue;
    stringstream commandStream;
    IndividualCommand tempCommand;
    bool isExecutableName;

public:
    Command(string rawCommand);
    deque<IndividualCommand> parse();
};

#endif
command.cpp
#include "include/command.h"
Command::Command(string rawCommand)
{
this->rawCommand = rawCommand;
isExecutableName = true;
}
deque<IndividualCommand> Command::parse()
{
commandStream << rawCommand;
while (!commandStream.eof())
{
commandStream >> tempString;
splittedCommand.push(tempString);
}
while (!splittedCommand.empty())
{
tempString = splittedCommand.front();
splittedCommand.pop();
if (isExecutableName)
{
tempCommand.executable = tempString;
isExecutableName = false;
if (!commandQueue.empty() && commandQueue.back().outputStream == PIPE_OUT)
tempCommand.inputStream = PIPE_IN;
}
else
{
// normal pipe
if (tempString == "|")
{
tempCommand.outputStream = PIPE_OUT;
isExecutableName = true;
commandQueue.push_back(tempCommand);
tempCommand = {};
}
// redirect to file
else if (tempString == ">")
{
tempCommand.redirectToFile = splittedCommand.front();
splittedCommand.pop();
}
// argv
else
tempCommand.argument = tempString;
}
if (splittedCommand.empty())
{
commandQueue.push_back(tempCommand);
tempCommand = {};
}
}
return commandQueue;
}
So basically the communication is established between two child processes, not between child and parent. (I'm using those first and second pipes to avoid overwriting FDs with consecutive calls to pipe() when facing something like "ls | cat | cat".)
The shell originally got stuck because the write end was not closed, and thus the read end blocked. I tried closing everything in both child processes, but nothing changed.
My question is: why did close(secondPipeWrite); in the parent process solve everything? Does it mean that it is the write end of the pipe that really matters, and that we don't have to care about whether the read end is closed explicitly?
Moreover, why don't I need to close anything in the child processes for it to still work?
Accidents will happen! Things will sometimes seem to work when there is no good reason for them to do so reliably. A multi-stage pipeline is not guaranteed to work if you do not close all the unused pipe descriptors properly, even though it happens to work for you. You aren't closing enough file descriptors in the child processes, in particular. You should close all the unused ends of all the pipes.
Here's a 'Rule of Thumb' I've included in other answers.
Rule of thumb: If you dup2() one end of a pipe to standard input or standard output, close both of the original file descriptors returned by pipe() as soon as possible. In particular, you should close them before using any of the exec*() family of functions. The rule also applies if you duplicate the descriptors with either dup() or fcntl() with F_DUPFD or F_DUPFD_CLOEXEC.
If the parent process will not communicate with any of its children via the pipe, it must ensure that it closes both ends of the pipe early enough (before waiting, for example) so that its children can receive EOF indications on read (or get SIGPIPE signals or write errors on write), rather than blocking indefinitely. Even if the parent uses the pipe without using dup2(), it should normally close at least one end of the pipe — it is extremely rare for a program to read and write on both ends of a single pipe.
Note that the O_CLOEXEC option to open(), and the FD_CLOEXEC and F_DUPFD_CLOEXEC options to fcntl(), can also factor into this discussion. If you use posix_spawn() and its extensive family of support functions (21 functions in total), you will need to review how to close file descriptors in the spawned process (posix_spawn_file_actions_addclose(), etc.).
Note that using dup2(a, b) is safer than using close(b); dup(a); for a variety of reasons. One is that if you want to force the file descriptor to a larger than usual number, dup2() is the only sensible way to do that. Another is that if a is the same as b (e.g. both 0), then dup2() handles it correctly (it doesn't close b before duplicating a), whereas the separate close() and dup() fails horribly. This is an unlikely, but not impossible, circumstance.
Note that if the wrong process keeps a pipe descriptor open, it can prevent processes from detecting EOF. If the last process in a pipeline has the write end of a pipe open where a process (possibly itself) is reading until EOF on the read end of that pipe, the process will never get EOF.
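As a minimal illustration of the rule (my own sketch, not code from the question): an ls | wc -l pairing in which every process closes both original descriptors, so that wc reliably sees EOF:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int pdes[2];

    if (pipe(pdes) == -1)
        return 1;
    if (fork() == 0)                  /* writer child: ls */
    {
        dup2(pdes[1], STDOUT_FILENO);
        close(pdes[0]);               /* close both originals */
        close(pdes[1]);               /* before exec          */
        execlp("ls", "ls", (char *)NULL);
        _exit(127);
    }
    if (fork() == 0)                  /* reader child: wc -l */
    {
        dup2(pdes[0], STDIN_FILENO);
        close(pdes[0]);
        close(pdes[1]);               /* or wc never sees EOF */
        execlp("wc", "wc", "-l", (char *)NULL);
        _exit(127);
    }
    close(pdes[0]);                   /* parent: close both ends */
    close(pdes[1]);                   /* BEFORE waiting          */
    while (wait(NULL) > 0)
        ;
    return 0;
}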
Reviewing the C++ code
On the whole, your code was good. My default compilation options picked up two problems: close(firstPipeWrite) and close(firstPipeRead) operating on uninitialized variables. They were treated as errors because I compile with:
c++ -O3 -g -std=c++11 -Wall -Wextra -Werror -c -o main.o main.cpp
But that was all — which is remarkably good work.
However, those errors also point to where your problem is.
Let's suppose you have a command input which requires two pipes (P1 and P2) and three processes (or commands, C1, C2, C3), such as:
who | grep -v root | sort
You want the commands set up as follows:
C1: who — creates P1; standard input = stdin, standard output = P1[W]
C2: grep — creates P2; standard input = P1[R], standard output = P2[W]
C3: sort — creates no pipe; standard input = P2[R], standard output = stdout
The PN[R] notation means the read descriptor of pipe N, etc.
A more elaborate pipeline, such as who | awk '{print $1}' | sort | uniq -c | sort -n, with 5 commands and 4 pipes is similar: it simply has more processes CN (with N = 2, 3, 4) which create PN and run with standard input coming from P(N-1)[R] and standard output going to PN[W].
A two-command pipeline has just one pipe, of course, and the structure:
C1 — creates P1; standard input = stdin, standard output = P1[W]
C2 — creates no pipe; standard input = P1[R], standard output = stdout
And a one-command (degenerate) pipeline has zero pipes, of course, and the structure:
C1 — creates no pipe; standard input = stdin, standard output = stdout
Note that you need to know whether the command you're processing is first, last, or in the middle of the pipeline — the plumbing work to be done for each is different. Also, if you have a multi-command pipeline (three or more commands), you can close the older pipes after a while; they won't be needed again. So as you're processing C3, both ends of P1 can be closed permanently; they won't be referenced again. You need the input pipe and the output pipe for the current process; any older pipes can be closed by the process coordinating the plumbing.
You need to decide which process is coordinating the plumbing. The easiest way in some respects is to have the original (parent) shell process launch all the sub-processes, left-to-right — which is what you're doing — but it is by no means the only way.
With the shell process launching the child processes, it is crucial that the shell eventually close all the descriptors of all the pipes it opened, so that the child processes can detect EOF. This must be done before waiting for any of the children. Indeed, all the processes in the pipeline must be launched before the parent can afford to wait for any of them — those processes must run concurrently, in general, as otherwise, the pipes in the middle may fill up, blocking the entire pipeline.
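Here is a hedged sketch of that scheme (my own illustration, not the code from the linked answer): the parent creates each pipe just before forking the command that writes to it, closes the older pipe as soon as it is no longer needed, and waits only after every command is launched. cmds is a hypothetical array of argv vectors of length ncmds; error checks are omitted.

#include <sys/wait.h>
#include <unistd.h>

static void run_pipeline(char **cmds[], int ncmds)
{
    int in_fd = STDIN_FILENO;     /* read side feeding the next command */

    for (int i = 0; i < ncmds; i++)
    {
        int pdes[2];
        int last = (i == ncmds - 1);

        if (!last)
            pipe(pdes);           /* pipe between command i and i+1 */
        if (fork() == 0)
        {
            if (in_fd != STDIN_FILENO)
            {
                dup2(in_fd, STDIN_FILENO);
                close(in_fd);
            }
            if (!last)
            {
                dup2(pdes[1], STDOUT_FILENO);
                close(pdes[0]);   /* close both originals before exec */
                close(pdes[1]);
            }
            execvp(cmds[i][0], cmds[i]);
            _exit(127);
        }
        if (in_fd != STDIN_FILENO)
            close(in_fd);         /* the older pipe is finished with */
        if (!last)
        {
            close(pdes[1]);       /* the parent never writes         */
            in_fd = pdes[0];      /* becomes stdin of the next child */
        }
    }
    while (wait(NULL) > 0)        /* wait only after all are launched */
        ;
}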
I'm going to point you at C Minishell — Adding Pipelines as a question with an answer showing how to do it. It is not the only way of doing it, and I'm not convinced it is the best way to do it, but it does work.
Sorting this out in your code is left as an exercise — I need to get some work done now. But this should give you strong pointers in the right direction.
Note that since your parent shell creates all the sub-processes, the waitpid() code is not ideal. You will have zombie processes accumulating. You'll need to think about a loop which collects any dead children, possibly with WNOHANG as part of the third argument so that when there are no zombies, the shell can continue. This becomes even more important when you run processes in background pipelines, etc.
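For example, a non-blocking reaping loop that the shell can run before printing each prompt might look like this sketch:

#include <sys/wait.h>

static void reap_children(void)
{
    int status;

    /* Collect any children that have already died, without blocking
       when none have; record pid/status here if job control needs it. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        ;
}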

Execute pipes recursively in Abstract syntax tree

I've managed to make an abstract syntax tree for my minishell; the thing is, when I tried to execute the piped commands, I got stuck.
The first pipe executes and outputs the result to stdout (fd 1), while the second one, grep filename, either gets stuck or is not executed at all.
I tried different approaches and got different results, yet none of them works.
I would appreciate any help.
This is how my AST looks for the command:
ls -la | cat -e | grep filename
t_node *pipe_execution(t_node *node, t_list *blt, t_line *line, int std[2])
{
    int pp[2];

    if (node)
    {
        if (node->kind == NODE_PIPE)
        {
            if (node->and_or_command->left)
            {
                pipe(pp);
                std[1] = pp[1];
                pipe_execution(node->and_or_command->left, blt, line, std);
                close(pp[1]);
            }
            if (node->and_or_command->right)
            {
                std[0] = pp[0];
                std[1] = 1;
                dprintf(2, "right std %d\n", std[1]);
                pipe_execution(node->and_or_command->right, blt, line, std);
                close(std[0]);
            }
        }
        else if (node->kind == NODE_SIMPLE_COMMAND)
        {
            dprintf(2, "====%s=== and stdin %d stdout %d\n", node->simple_command->head->name, std[0], std[1]);
            execute_shell(blt, line->env, node, std);
        }
    }
    return (node);
}

int execute_shell(t_list *blt, t_list *env, t_node *node, int std[2])
{
    ...
    return (my_fork(path, env, cmds, std));
}
My implementation of the fork process:
int my_fork(char *path, t_list *env, char **cmds, int std[2])
{
    pid_t child;
    char **env_tab;
    int status;

    status = 0;
    env_tab = env_to_tab(env);
    child = fork();
    if (child > 0)
        waitpid(child, &status, 0);
    else if (child == 0)
    {
        dup2(std[0], 0);
        dup2(std[1], 1);
        execve(path, cmds, env_tab);
    }
    return (status);
}
I hope this code makes some sense.
Pipes require concurrent execution
The problem, as far as I can tell from the code snippets you provided, is that my_fork() is blocking. So when you execute a process, your shell stops and waits for that process to finish before starting the next one. If you do something simple, like:
/bin/echo Hello | cat
then the pipe's internal buffer is big enough to store the whole input string Hello. Once the /bin/echo process finishes, you execute cat, which can then read the buffered data from the pipe. However, once it gets more complicated, or when the first process sends a lot more data to the pipe, its internal buffer will get full, and then it will block.
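You can see that limit directly with a small standalone sketch (my own illustration, not from the question): it writes into a pipe that nobody reads, and on Linux the counter typically stops near 64 KiB once the kernel buffer fills and write(2) blocks.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    char block[4096] = {0};
    long total = 0;

    if (pipe(fd) == -1)
        return 1;
    for (;;)                          /* nobody ever reads fd[0] */
    {
        write(fd[1], block, sizeof block);
        total += sizeof block;
        fprintf(stderr, "wrote %ld bytes\n", total);
    }                                 /* blocks when the buffer is full */
}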
The solution is to defer calling waitpid() on the processes you fork until you have spawned all the processes that are part of the command line.
Create all required pipes before starting processes
Your function pipe_execution() assumes that there is only a single pipe; it starts the first process with file descriptor 0 as its input, and it starts the second process with file descriptor 1 as its output. However, if you have multiple pipes on a single command line, like in ls -la | cat -e | grep filename, then the output of the cat -e process needs to go into the second pipe, not to standard output.
You need to create the second pipe before starting the right-hand command of the first pipe. It's probably simplest to just create all the pipes before starting any of the commands. You could do this by defining multiple phases:
Create pipes
Start commands
Wait for all commands to finish
You can traverse the abstract syntax tree you built multiple times, each time executing one of the phases.
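A minimal sketch of those three phases (my own illustration, assuming cmds is a hypothetical array of argv vectors for the n commands; error handling omitted):

#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static void run_phases(char **cmds[], int n)
{
    int (*pp)[2] = malloc(sizeof(int[2]) * (n - 1));

    for (int i = 0; i < n - 1; i++)         /* phase 1: create all pipes */
        pipe(pp[i]);
    for (int i = 0; i < n; i++)             /* phase 2: start all commands */
    {
        if (fork() == 0)
        {
            if (i > 0)                      /* read from the previous pipe */
                dup2(pp[i - 1][0], STDIN_FILENO);
            if (i < n - 1)                  /* write to the next pipe */
                dup2(pp[i][1], STDOUT_FILENO);
            for (int j = 0; j < n - 1; j++) /* child closes every pipe fd */
            {
                close(pp[j][0]);
                close(pp[j][1]);
            }
            execvp(cmds[i][0], cmds[i]);
            _exit(127);
        }
    }
    for (int j = 0; j < n - 1; j++)         /* parent closes every pipe fd */
    {
        close(pp[j][0]);
        close(pp[j][1]);
    }
    for (int i = 0; i < n; i++)             /* phase 3: wait for everyone */
        wait(NULL);
    free(pp);
}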

Fork never enters child's process

I'm writing a code that mimics a shell behavior, specifically & and |.
My function receives user commands and checks whether there's an & at the end; if so, the child process should run in the background, and the parent should not wait for it to finish but continue executing commands.
Also it's supposed to check if there's a | in the input array and run two child processes while piping their stdin and stdout.
I have implemented the behavior for &, but whenever I compile and run my code, I only get the printf sentence from the parent's process.
I would like to hear ideas on how to fix this; in addition, I would appreciate any suggestions regarding the implementation of | (pipes) and how to prevent zombies.
int process_arglist(int count, char **arglist)
{
    int pid = fork();

    printf("%d", pid);
    switch (pid)
    {
    case -1:
        fprintf(stderr, "ERROR: fork failed\n");
        return 1;
        break;
    case 0: // Son's process
        printf("I got to son");
        // check last arglist argument
        if (strcmp(arglist[count - 1], "&") == 0)
        {
            setpgid(0, 0);
            arglist[count - 1] = NULL;
            if (execvp(*arglist, arglist) < 0) // execute the command
            {
                fprintf(stderr, "ERROR: execvp failed\n");
                exit(1);
            }
        }
        else // There's no & at the end, look for pipes
        {
            int i = 0;
            while (i < count)
            {
                if (strcmp(arglist[i], "|") == 0)
                {
                    int pid2 = fork();
                    if (pid2 < 0)
                    {
                        // fork failed, handle error
                    }
                    if (pid2 == 0) // Son's process
                    {
                    }
                    else // Parent's code
                    {
                    }
                }
            }
        }
        break;
        // in case no & and no |, call execvp
    default: // Parent's code
        printf("I go to parent");
        return 1;
        break;
    }
    return 0;
}
The output is always "I go to parent"
I assume your code is for Linux or some other POSIX system. Read a good book on Linux programming (perhaps the old Advanced Linux Programming, freely downloadable, or something newer).
stdio(3) is buffered, and stdout is often (but not always) line-buffered. Buffering happens for efficiency reasons (calling write(2) very often, e.g. once per output byte, is very slow; you should prefer doing write-s on chunks of several kilobytes).
BTW, you had better handle failure of system calls (see intro(2) and syscalls(2)) by using errno(3) through perror(3) (or strerror(3) on errno). You (and the user of your shell) need to be informed of the failure reason (your current code doesn't show it).
I recommend ending your printf format control strings with \n (this works when stdout is line-buffered) or calling fflush(3) at appropriate places.
As a rule of thumb, I suggest doing fflush(NULL); before every call to fork(2).
The behavior you observe is consistent with the hypothesis that some printf-ed data is staying in buffers (e.g. of stdout).
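A tiny sketch of the effect (my own illustration, reusing the question's strings): without the fflush(NULL), the unterminated text can sit in the stdout buffer, be inherited by the child, and eventually be printed twice, once by each process.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("I go to parent");   /* no '\n': may stay in the stdout buffer */
    fflush(NULL);               /* without this, BOTH processes inherit   */
                                /* the buffered text and print it on exit */
    if (fork() == 0)
    {
        printf("I got to son\n");
        _exit(0);
    }
    return 0;
}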
You could use strace(1) on your program (or on other ones, e.g. some existing shell process) to understand what system calls are done.
You should compile with all warnings and debug info (e.g. gcc -Wall -Wextra -g with GCC), improve your code to get no warnings, and use the debugger gdb (with care, it can be used on forking processes).
I'm writing a code that mimics a shell behavior
You probably are coding some shell. Then study for inspiration the source code of existing free software shells (most -probably all- Linux shells are free software).
I would appreciate any suggestions regarding the implementation of | (pipes) and how to prevent zombies.
Explaining all that requires a lot of space (several chapters of a book, or perhaps an entire book) and doesn't fit here or on any other forum. So read a good Linux or POSIX programming book. Regarding pipes, read pipe(7) (the pipe should be created with pipe(2) before the fork). Regarding avoiding zombie processes, you need to carefully call waitpid(2) or some similar call.

fork / pipe / close in a recursive function

In order to write a shell command interpreter, I am trying to execute pipes.
To do it, I use a recursive function in which I use the pipe function and some redirections with dup2.
Here is my code:
void test_recurs(pid_t pid, char **ae)
{
    char *const arg[2] = {"/bin/ls", NULL};
    char *const arg2[3] = {"/bin/wc", NULL};
    static int limit = 0;
    int check;
    int fd[2];

    if (limit > 5)
        return ;
    if (pipe(fd) == -1)
    {
        printf("pipe failed\n");
        return ;
    }
    pid = fork();
    if (pid != 0)
    {
        printf("père %d\n", getpid());    /* "père" = parent */
        close(fd[0]);
        dup2(fd[1], 1);
        close(fd[1]);
        if ((execve("/bin/ls", arg, ae)) == -1)
            exit(125);
        dprintf(2, "execution ls\n");
        wait(&check);
    }
    else
    {
        printf("fils %d\n", getpid());    /* "fils" = child */
        close(fd[1]);
        dup2(fd[0], 0);
        close(fd[0]);
        if ((execve("/bin/wc", arg2, ae)) == -1)
            printf("echec execve\n");     /* "echec" = failure */
        dprintf(2, "limit[%d]\n", limit);
        limit++;
        test_recurs(pid, ae);
    }
}
The problem is that it only executes "ls | wc" once and then waits on standard input. I know that the problem may come from the pipes (and the redirections).
It's a bit unclear how you are trying to use the function you present, but here are some notable points about it:
It's poor form to rely on a static variable to limit recursion depth because it's not thread-safe and because you need to do extra work to manage it (for example, to ensure that any changes are backed out when the function returns). Use a function parameter instead.
As has been observed in comments, the exec-family functions return only on failure. Although you acknowledge that, I'm not sure you appreciate the consequences, for both branches of your fork contain code that will never be executed as a result. The recursive call in particular is dead and will never be executed.
Moreover, the process in which the function is called performs an execve() call itself. The reason that function does not return is that it replaces the process image with that of the new process. That means that function test_recurs() also does not return.
Just as a shell ordinarily must fork / exec to launch a single external command, it ordinarily must fork / exec for each command in a pipeline. If it fails to do so, then afterward it is no longer running -- whatever it exec'ed without forking runs instead.
The problem is it only execute "ls | wc" one time and then wait on the standard input.
Certainly it does not recurse, because the recursive call is in a section of dead code. I suspect you are mistaken in your claim that it afterward waits on standard input, because the process that calls that function execs /bin/ls, which does not read from standard input. When the ls exits, however, leaving you with neither shell nor ls, what you then see might seem to be a wait on stdin.
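To illustrate that point, here is a hedged sketch (not a full fix of the recursion) in which the calling process forks for both commands, so the shell survives to close the pipe and wait; it reuses the question's arg, arg2, ae and fd variables:

if (fork() == 0)                  /* child 1: ls writes to the pipe */
{
    close(fd[0]);
    dup2(fd[1], 1);
    close(fd[1]);
    execve("/bin/ls", arg, ae);
    exit(125);
}
if (fork() == 0)                  /* child 2: wc reads from the pipe */
{
    close(fd[1]);
    dup2(fd[0], 0);
    close(fd[0]);
    execve("/bin/wc", arg2, ae);
    exit(125);
}
close(fd[0]);                     /* parent: close both ends, ... */
close(fd[1]);
wait(NULL);                       /* ... then wait for both children */
wait(NULL);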

forks and pipes implementation linux compiler

I have the following code taken from the “Pipes” section of Beej’s Guide to Unix IPC.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int pfds[2];

    pipe(pfds);
    if (!fork()) {
        close(1);           /* close normal stdout */
        dup(pfds[1]);       /* make stdout same as pfds[1] */
        close(pfds[0]);     /* we don't need this */
        execlp("ls", "ls", NULL);
    } else {
        close(0);           /* close normal stdin */
        dup(pfds[0]);       /* make stdin same as pfds[0] */
        close(pfds[1]);     /* we don't need this */
        execlp("wc", "wc", "-l", NULL);
    }
    return 0;
}
This code allows the user to see how many files are in a specific directory. How can I edit this code to implement the longer pipeline cat /etc/passwd | cut -f1 -d: | sort? Does anyone have any idea how to do this? I am completely stuck, and any help would be appreciated.
Feels like homework, so I'll just give you some pointers:
The longer pipeline has two pipes, so you'll need to call pipe() twice. (I'd also check pipe's return value whilst I was at it.)
There are three processes, which means two forks. Again, check fork()'s return value properly: it is tri-state (parent, child, or failure), and your program should test all three cases.
If you call pipe() twice up front, think carefully about which file descriptors (i.e. which ends of pipes) are which in each process, and hence which ones to close before invoking execlp(). I'd draw a picture.
I'd prefer dup2() to dup(), since you're explicitly setting the target file descriptor, and so it makes sense to specify it in the call. Also avoids silly bugs.
dup and execlp can fail, so I'd check their return values too...
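For what it's worth, here is a hedged sketch along those lines: two pipes, two forks, with the parent exec-ing the last command just as the original program execs wc (return values left unchecked only to keep it short; real code should test them as noted above):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int p1[2], p2[2];

    pipe(p1);
    pipe(p2);
    if (!fork())                      /* first child: cat /etc/passwd */
    {
        dup2(p1[1], 1);
        close(p1[0]); close(p1[1]);
        close(p2[0]); close(p2[1]);   /* not our pipe: close both ends */
        execlp("cat", "cat", "/etc/passwd", NULL);
        exit(1);
    }
    if (!fork())                      /* second child: cut -f1 -d: */
    {
        dup2(p1[0], 0);
        dup2(p2[1], 1);
        close(p1[0]); close(p1[1]);
        close(p2[0]); close(p2[1]);
        execlp("cut", "cut", "-f1", "-d:", NULL);
        exit(1);
    }
    dup2(p2[0], 0);                   /* parent becomes sort */
    close(p1[0]); close(p1[1]);
    close(p2[0]); close(p2[1]);
    execlp("sort", "sort", NULL);
    return 1;
}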
You need some pipes (depending on the length of the command list). But at most you need two pipe fd pairs for a process in the middle of the pipeline; the first and the last need only one pipe fd pair each. Be really sure to close the pipe fds which are not needed - if not, the child processes might not get an EOF and will never finish.
And (as user3392484 stated): check all system calls for error conditions and report them to the caller. This will make life much easier.
I implemented something like this during the last few days; maybe you want to have a look there: pipexec.c.
Kind regards - Andreas
