I'm working on a C shell and am having trouble getting an arbitrary number of pipes to work. When I run the shell, it hangs on any piping. For example, when I run ls -la | sort, it hangs on the sort until I type some input and hit Ctrl+D. I know it has something to do with a pipe not being closed, but the print statements show that pipes 3, 4, and 5 all get closed in both the parent and the child. I've been at this for a few hours and don't know why it doesn't work. Any help would be much appreciated.
Original Code:
char *current_command;
current_command = strtok_r(cmdline_copy, "|", &cmdline_copy);
char *commands[100][MAX_ARGS]; //Max 100 piped commands with each having MAX_ARGS arguments
int i = 0;
while (current_command != NULL) { //Go through each command and add it to the array
char *copy = malloc(strlen(current_command)*sizeof(char)); //Copy of curretn command
strcpy(copy, current_command);
char *args_t[MAX_ARGS];
int nargs_t = get_args(copy, args_t);
memcpy(commands[i], args_t, sizeof(args_t)*nargs_t); //Copy the command and it's arguments to the 2d array
i++;
current_command = strtok_r(NULL, "|\n", &cmdline_copy); //Use reentrant version of strtok to prevent fighting with get_args function
}
int fd[2*(i-1)]; //Set up the pipes i.e fd[0,1] is first pipe, fd[1,2] second pipe, etc.
for (int j = 0; j < i*2; j+=2) {
pipe(fd+j);
}
//Here is where we do the commands
for (int j = 0; j < i; j++) {
pid = fork(); //Fork
if (pid == 0) { //Child process
if (j == 0) { //First process
printf("Child Closed %d\n", fd[0]);
close(fd[0]);
dup2(fd[1], fileno(stdout));
}
else if (j == i -1) { //Last process
dup2(fd[j], fileno(stdin));
printf("Child closed %d\n", fd[j]);
printf("Child closed %d\n", fd[j+1]);
close(fd[j+1]);
close(fd[j]);
}
else { //Middle processes
dup2(fd[j], fileno(stdin));
dup2(fd[j+1], fileno(stdout));
printf("Child closed %d\n", fd[j]);
close(fd[j]);
}
execvp(commands[j][0], commands[j]);
}
else if (pid > 0) { //Parent
printf("Parent closed %d\n", fd[j]);
close(fd[j]);
printf("Parent closed %d\n", fd[j+1]);
close(fd[j+1]);
waitpid(pid, NULL, 0); //Wait for the process
}
else {
perror("Error with fork");
exit(1);
}
}
Final Code:
char *current_command;
current_command = strtok_r(cmdline_copy, "|", &cmdline_copy);
char *commands[100][MAX_ARGS]; //Max 100 piped commands with each having MAX_ARGS arguments
int command_count = 0;
while (current_command != NULL) { //Go through each command and add it to the array
char *copy = malloc(strlen(current_command) + 1); //Copy of current command (+1 for the terminating NUL) because get_args uses strtok
strcpy(copy, current_command);
char *args_t[MAX_ARGS];
int nargs_t = get_args(copy, args_t);
memcpy(commands[command_count], args_t, sizeof(args_t)); //Copy the command and its arguments to the 2d array (exactly one row of MAX_ARGS pointers)
command_count++;
current_command = strtok_r(NULL, "|\n", &cmdline_copy); //Use reentrant version of strtok to prevent fighting with get_args function
}
int fd[command_count*2]; //Two descriptors per pipe; the loop below opens command_count pipes
pid_t pids[command_count];
for (int j = 0; j < command_count*2; j+=2) { //Open up a pair of pipes for every command
pipe(fd+j);
}
for (int j = 0; j < command_count; j++) {
pids[j] = fork();
if (pids[j] == 0) { //Child process
if (j == 0) { //Duplicate only stdout pipe for first pipe
dup2(fd[1], fileno(stdout));
}
else if (j == (command_count-1)) { //Duplicate only stdin for last pipe
dup2(fd[2*(command_count-1)-2], fileno(stdin));
}
else { //Duplicate both stdin and stdout
dup2(fd[2*(j-1)], fileno(stdin));
dup2(fd[2*j+1], fileno(stdout));
}
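for (int k = 0; k < command_count*2; k++) { //Close all fds, not only the ones below this child's index
close(fd[k]);
}
execvp(commands[j][0], commands[j]); //Exec the command
perror(commands[j][0]); //Only reached if execvp failed
_exit(127); //Keep a failed child from continuing the fork loop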
}
else if (pids[j] < 0) {
perror("Error forking");
}
}
for (int k = 0; k < command_count*2; k++) { //Parent closes all fds
close(fd[k]);
}
waitpid(pids[command_count-1], NULL, 0); //Wait for only the last process;
You aren't closing enough file descriptors in the children (or, in this case, in the parent).
Rule of thumb: If you dup2() one end of a pipe to standard input or standard output, close both of the original file descriptors returned by pipe() as soon as possible. In particular, you should close them before using any of the exec*() family of functions.
The rule also applies if you duplicate the descriptors with either dup() or fcntl() with F_DUPFD.
In your code, you create all the pipes before you fork any children; therefore, each child needs to close all the pipe file descriptors after duplicating the one or two that it is going to use for input or output.
The parent process must also close all the pipe descriptors.
Also, the parent should not wait for children to complete until after launching all the children. In general, children will block with full pipe buffers if you make them run sequentially. You also defeat the benefits of parallelism. Note, however, that the parent must keep the pipes open until it has launched all the children — it must not close them after it launches each child.
For your code, the outline operation should be:
Create N pipes.
For each of N (or N+1) children:
    Fork.
    Child duplicates standard input and output pipes.
    Child closes all of the pipe file descriptors.
    Child executes process (and reports error and exits if it fails).
    Parent records child PID.
    Parent goes on to next iteration; no waiting, no closing.
Parent now closes N pipes.
Parent now waits for the appropriate children to die.
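The outline above can be sketched roughly as follows. This is a minimal illustration, not the asker's shell: run_pipeline, MAX_CMDS, and the argvs layout (one NULL-terminated argument vector per command) are names assumed here for the example, and it uses N-1 pipes for N commands.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CMDS 16 /* arbitrary limit for this sketch */

/* Hypothetical helper: run argvs[0] | argvs[1] | ... | argvs[n-1].
 * Returns the exit status of the last command, or -1 on error. */
int run_pipeline(int n, char *const *argvs[])
{
    int fd[2 * MAX_CMDS];
    pid_t pids[MAX_CMDS];
    int npipes = n - 1;
    int launched = 0;
    int last_status = -1;

    if (n < 1 || n > MAX_CMDS)
        return -1;

    /* Create the pipes: fd[2*k] is the read end of pipe k, fd[2*k+1] the write end. */
    for (int k = 0; k < npipes; k++) {
        if (pipe(fd + 2 * k) == -1) {
            perror("pipe");
            for (int m = 0; m < 2 * k; m++)
                close(fd[m]);
            return -1;
        }
    }

    /* Launch every child; the parent neither waits nor closes inside this loop. */
    for (int j = 0; j < n; j++) {
        pids[j] = fork();
        if (pids[j] == -1) {
            perror("fork");
            break;
        }
        if (pids[j] == 0) {
            if (j > 0)                      /* not first: read from previous pipe */
                dup2(fd[2 * (j - 1)], STDIN_FILENO);
            if (j < n - 1)                  /* not last: write to own pipe */
                dup2(fd[2 * j + 1], STDOUT_FILENO);
            for (int k = 0; k < 2 * npipes; k++)
                close(fd[k]);               /* close ALL original pipe descriptors */
            execvp(argvs[j][0], argvs[j]);
            perror(argvs[j][0]);            /* only reached if execvp failed */
            _exit(127);
        }
        launched++;
    }

    /* Parent closes every pipe descriptor only after all children are launched. */
    for (int k = 0; k < 2 * npipes; k++)
        close(fd[k]);

    /* Wait for all launched children; report the status of the last command. */
    for (int j = 0; j < launched; j++) {
        int status;
        if (waitpid(pids[j], &status, 0) == pids[j] && j == n - 1 && WIFEXITED(status))
            last_status = WEXITSTATUS(status);
    }
    return last_status;
}
```

With char *ls[] = {"ls", "-la", NULL} and char *srt[] = {"sort", NULL} collected in char *const *cmds[] = {ls, srt}, a call such as run_pipeline(2, cmds) would correspond to ls -la | sort.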
There are other ways of organizing this, of greater or lesser complexity. The alternatives typically avoid opening all the pipes up front, which reduces the number of pipes to be closed.
'Appropriate children' means there are various ways of deciding when a pipeline (sequence of commands connected by pipes) is 'done'.
One option is to wait for the last command in the sequence to exit. This has advantages, and it is the traditional way to do it. Another advantage is that the parent process can launch the last child; that child can launch its predecessor in the pipeline, and so on back to the first process in the pipeline. In this scenario, the parent never creates a pipe, so it doesn't have to close any pipes. It also has only one child to wait for; the other processes in the pipeline are descendants of that one child.
Another option is to wait for all the processes to die (1). This is more or less what Bash does. This allows Bash to know the exit status of each element of the pipeline; the alternative does not permit that, which is relevant to set -o pipefail and the PIPESTATUS array.
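As a rough sketch of the "wait for all the processes" option: the code below collects every child's exit status and combines them the way pipefail does (the rightmost non-zero status wins, otherwise zero). The names wait_all and spawn_exit are hypothetical, and reporting -1 for a child that did not exit normally is a simplification assumed for this example; it is not what Bash itself reports.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper: fork a child that exits immediately with `code`. */
pid_t spawn_exit(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);
    return pid; /* -1 if fork failed */
}

/* Wait for every child in pids[0..n-1]. statuses[i] receives the exit
 * status of child i, or -1 if it could not be waited for or did not exit
 * normally. Returns a pipefail-style result: the rightmost non-zero
 * status, or 0 if every child succeeded. */
int wait_all(const pid_t pids[], int n, int statuses[])
{
    int result = 0;

    for (int i = 0; i < n; i++) {
        int st;

        statuses[i] = -1;
        if (pids[i] > 0 && waitpid(pids[i], &st, 0) == pids[i] && WIFEXITED(st))
            statuses[i] = WEXITSTATUS(st);
        if (statuses[i] != 0)
            result = statuses[i]; /* a later failure overrides an earlier one */
    }
    return result;
}
```

The statuses array plays the role that PIPESTATUS plays in Bash, while the return value mirrors what set -o pipefail would make the pipeline's status.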
Can you help me understand why the dup2 statement for the middle pipes is dup2(fd[(2*j)+1], fileno(stdout)) and dup2(fd[2*(j-1)], fileno(stdin))? I got it off Google and it works, but I'm unsure why.
fileno(stdout) is 1.
fileno(stdin) is 0.
The read end of a pipe is element 0 of the array filled in by pipe() (analogous to standard input).
The write end of a pipe is element 1 of that array (analogous to standard output).
You have an array int fd[2*N]; for some value of N > 1, and you get a pair of file descriptors for each pipe.
For an integer k, fd[k*2+0] is the read descriptor of a pipe, and fd[k*2+1] is the write descriptor.
When j is neither 0 nor (N-1), you want it to read from the previous pipe and to write to its pipe:
fd[(2*j)+1] is the write descriptor of pipe j — which gets connected to stdout.
fd[2*(j-1)] is the read descriptor of pipe j-1 — which gets connected to stdin.
So, the two dup2() calls connect the correct pipe file descriptors to standard input and standard output of process j in the pipeline.
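The index arithmetic can be written out as small helpers. This is only an illustration of the mapping described above; the function names are made up for the example, and -1 is used here to mean "leave that stream alone" for the first and last commands.

```c
#include <stdio.h>

/* Slot of the read end of pipe k in a flat fd[] array. */
int pipe_read_index(int k)
{
    return 2 * k;
}

/* Slot of the write end of pipe k in a flat fd[] array. */
int pipe_write_index(int k)
{
    return 2 * k + 1;
}

/* For command j in a pipeline of n commands, report which fd[] slots
 * should be dup2()'d onto stdin and stdout. A value of -1 means the
 * stream is left untouched (stdin of the first command, stdout of the
 * last command). */
void stage_indices(int j, int n, int *in_idx, int *out_idx)
{
    *in_idx = (j > 0) ? pipe_read_index(j - 1) : -1;     /* previous pipe */
    *out_idx = (j < n - 1) ? pipe_write_index(j) : -1;   /* own pipe */
}

/* Print the wiring of an n-command pipeline, one line per command. */
void print_wiring(int n)
{
    for (int j = 0; j < n; j++) {
        int in_idx, out_idx;

        stage_indices(j, n, &in_idx, &out_idx);
        printf("command %d: stdin <- fd[%d], stdout -> fd[%d]\n", j, in_idx, out_idx);
    }
}
```

For a middle command j, this yields fd[2*(j-1)] for stdin and fd[(2*j)+1] for stdout, which are exactly the two expressions in the question.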
(1)
There can be obscure scenarios where this leaves the parent hung indefinitely. I emphasize obscure; it requires something like a process that hangs around as a daemon without forking.
Related
I'm writing a shell in C and am trying to implement multiple pipes. I've done this by creating a two-dimensional array of pipes and using a separate pipe every time. All commands between pipes are separated by a parsing function and put into a struct. Every command between pipes gets its own process, and for all commands in the middle I'm trying to read from the previous process and write to the next one. Somewhere here the problem starts. It works fine for one pipe; however, when I try more than one pipe I don't get any output and the program gets stuck. In GDB I get a failure message from execvp after forking the second process. What could this be due to?
int create_pipe(int* fd)
{
int pipe_id = pipe(fd);
if (pipe_id == -1)
{
return -1;
}
return 0;
}
void write_pipe(int* fd)
{
close(fd[READ]);
if ((dup2(fd[WRITE], STDOUT_FILENO)) < -1)
{
fork_error();
}
close(fd[WRITE]);
}
void read_pipe(int *fd)
{
close(fd[WRITE]);
if (dup2(fd[READ], STDIN_FILENO) < 0)
{
fork_error();
}
close(fd[READ]);
}
void need_to_pipe (int i, int (*fd)[2])
{
if (commands[i].pos == first)
{
write_pipe(fd[i * 2]);
}
else if (commands[i].pos == last)
{
read_pipe(fd[(i-1) *2]);
}
else //if (commands[i].pos == middle)
{
dup2(fd[(i-1)*2][READ], STDIN_FILENO);
close(fd[(i-1)*2][READ]);
close(fd[(i-1)*2][WRITE]);
//close(fd[(i)*2][READ]);
//close(fd[(i)*2][WRITE]);
close(fd[(i)*2][READ]);
dup2(fd[i*2][WRITE], STDOUT_FILENO);
close(fd[(i)*2][WRITE]);
}
}
/**
* Fork a proccess for command with index i in the command pipeline. If needed,
* create a new pipe and update the in and out members for the command..
*/
void fork_cmd(int i, int (*fd)[2]) {
pid_t pid;
switch (pid = fork()) {
case -1:
fork_error();
case 0:
// Child process after a successful fork().
if (!(commands[i].pos == single))
{
need_to_pipe(i, fd);
}
// Execute the command in the contex of the child process.
if (execvp(commands[i].argv[0], commands[i].argv)<0)
{
fprintf(stderr, "command not found: %s\n",
commands[i].argv[0]);
exit(EXIT_FAILURE);
}
default:
// Parent process after a successful fork().
break;
}
}
/**
* Fork one child process for each command in the command pipeline.
*/
void fork_cmds(int n, int (*fd)[2])
{
for (int i = 0; i < n; i++)
{
fork_cmd(i, fd);
}
}
void wait_once ()
{
wait(NULL);
}
/**
* Make the parents wait for all the child processes.
*/
void wait_for_all_cmds(int n)
{
for (int i = 0; i < n; i++)
{
wait_once();
//wait for number of child processes.
}
}
int main() {
int n; // Number of commands in a command pipeline.
size_t size = 128; // Max size of a command line string.
char line[size];
while(true) {
// Buffer for a command line string.
printf(" >>> ");
get_line(line, size);
n = parse_cmds(line, commands);
int fd[(n-1)][2];
for(int i =0;i<n-1;i++)
{
int pipe_id = pipe(fd[i*2]);
if (pipe_id == -1)
{
return -1;
}
}
fork_cmds(n, fd);
for(int i =0;i<n-1;i++)
{
int *fdclose= fd[i*2];
close (fdclose[READ]);
close (fdclose[WRITE]);
}
wait_for_all_cmds(n);
}
exit(EXIT_SUCCESS);
}
You [probably] have too many processes keeping pipe ends open (that do not belong to the given child) because your loop opens all pipes before any forking.
This places an undue burden on each child because it has to close many pipe ends to prevent it from holding open a pipe end, preventing other children from seeing an EOF on their input pipes.
To see this, for debug purposes in your present code, the child could do (just before the exec* call) (e.g.):
fprintf(stderr,"child: %d\n",getpid());
fflush(stderr);
system("ls -l /proc/self/fd 1>&2");
Each child should only have three open streams on stdin, stdout, and stderr (e.g. 0, 1, 2).
I think you'd find that there are many extraneous/detrimental streams open on the various children.
You only need two pipe arrays (e.g.): int pipeinp[2]; int pipeout[2]; Initially, pipeinp is all -1.
Roughly ...
Parent should do a single pipe call at the top of fork_cmd [before the fork] to pipeout.
The child dups (and closes) the read end of pipeinp [if not -1] to stdin.
Child dups/closes the write end of pipeout to stdout.
It closes the read end of pipeout.
After that, the parent should copy pipeout to pipeinp and close the write end of pipeinp
This should be repeated for all pipe stages.
No pipe to pipeout should be done for the last command. And, the [last] child should not change stdout.
For a working example, see my answer: fd leak, custom Shell
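The two-array scheme described above might look roughly like the sketch below. It is an assumption-laden illustration rather than the linked answer's code: run_rolling and the argvs layout (one NULL-terminated argument vector per command) are invented for the example, and only the names pipeinp and pipeout come from the description.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run n commands as a pipeline while keeping at most
 * two pipes open in the parent. Returns the last command's exit status,
 * or -1 on error. */
int run_rolling(int n, char *const *argvs[])
{
    int pipeinp[2] = { -1, -1 };   /* pipe feeding the current command */
    pid_t last_pid = -1;
    int launched = 0;
    int result = -1;

    for (int i = 0; i < n; i++) {
        int pipeout[2] = { -1, -1 };
        int is_last = (i == n - 1);
        pid_t pid;

        /* No output pipe for the last command: it keeps the shell's stdout. */
        if (!is_last && pipe(pipeout) == -1) {
            perror("pipe");
            break;
        }
        pid = fork();
        if (pid == -1) {
            perror("fork");
            if (!is_last) {
                close(pipeout[0]);
                close(pipeout[1]);
            }
            break;
        }
        if (pid == 0) {
            if (pipeinp[0] != -1) {            /* read end of previous pipe -> stdin */
                dup2(pipeinp[0], STDIN_FILENO);
                close(pipeinp[0]);
            }
            if (!is_last) {                    /* write end of new pipe -> stdout */
                dup2(pipeout[1], STDOUT_FILENO);
                close(pipeout[1]);
                close(pipeout[0]);
            }
            execvp(argvs[i][0], argvs[i]);
            perror(argvs[i][0]);               /* only reached if execvp failed */
            _exit(127);
        }
        launched++;
        if (is_last)
            last_pid = pid;

        /* Parent: drop the old read end, then "copy pipeout to pipeinp"
         * while closing the write end it no longer needs. */
        if (pipeinp[0] != -1)
            close(pipeinp[0]);
        if (!is_last)
            close(pipeout[1]);
        pipeinp[0] = pipeout[0];
        pipeinp[1] = -1;
    }
    if (pipeinp[0] != -1)                      /* left open only after an early break */
        close(pipeinp[0]);

    for (int i = 0; i < launched; i++) {
        int st;
        pid_t done = wait(&st);

        if (done == last_pid && WIFEXITED(st))
            result = WEXITSTATUS(st);
    }
    return result;
}
```

Because each child is forked while the parent holds only the descriptors that child needs (plus the read end of the new pipe), a child has at most three pipe descriptors to deal with, regardless of how long the pipeline is.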
I have a parent process that creates two child processes. The first child writes to a pipe and the second child reads from it. After 5 seconds the parent terminates the first child (so its write end should be closed automatically, shouldn't it?). I need the second child to terminate automatically as well, because it uses the pipe, and once the first child is terminated the pipe should be terminated too. My problem: how can I force child 2 to die immediately when child 1 is killed? (I don't need child 2 to print anything after child 1 is dead, even if there is still data in the pipe buffer to read.) Should I use pipe2 instead of plain pipe?
void do_close(int fd){
if(close(fd) != 0){
perror("close");
exit(2);
}
}
void signalhandler(int signum){
fprintf(stderr, "f1 terminated!\n");
exit(1);
}
int main(){
pid_t f1 = -1, f2 = -1;
int pipefd[2];
if(pipe(pipefd) == -1){
perror("pipe");
exit(2);
}
f1 = fork();
if(f1 > 0){
f2 = fork();
}
if(f1 > 0 && f2 > 0){
do_close(pipefd[0]);
do_close(pipefd[1]);
sleep(5);
kill(f1, SIGTERM);
waitpid(f1, NULL, 0);
waitpid(f2, NULL, 0);
}
else if(f1 == 0){
signal(SIGTERM, signalhandler);
do_close(pipefd[0]);
while(1){
fflush(stdout);
write(pipefd[1], "Hello world!\n", 13);
sleep(1);
}
}
else if(f2 == 0){
do_close(pipefd[1]);
char *buf = (char*)malloc(13);
while(read(pipefd[0], buf, 13) > 0){
for(int i = 0; i < 13; i++){
printf("%c", buf[i]);
}
sleep(3);
}
}
return 0;
}
"Terminated" isn't the usual terminology for a pipe, and it might be causing a slight misunderstanding. The termination of the writing process isn't immediately "felt" by the reader if there is still data in the buffer.
To summarize the program, you have one process that writes to a pipe at a rate of 1 line per second for 5 seconds, then dies. Another process reads from the pipe at a rate of 1 line every 3 seconds until EOF or error, then exits.
The lines are small and fixed-size so there's no chance of reading an incomplete line.
Nothing in the program should cause an error on the pipe, so the second child process will read until EOF.
EOF on a pipe occurs when 2 conditions are met: there are no writers, and the buffer is empty. The death of the first child process accomplishes the first condition. The second condition is not immediately true, because at the 5 second mark, there have been 5 lines written to the pipe, and only 2 of them have been read (one at the start, and one after 3 seconds).
The second child process keeps reading, pulling in the remaining lines, and eventually exits after about 5*3=15 seconds.
(The timing isn't infinitely precise, so you aren't guaranteed to get exactly 5 lines written to the pipe. When I ran it I got 6.)
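The two EOF conditions can be seen in a single process, without any timing involved. In this small sketch (lines_before_eof is a name made up for the example), all writes happen first, the write end is then closed so that no writers remain, and the reader still receives every buffered line before read() finally returns 0.

```c
#include <unistd.h>

/* Write `count` 13-byte lines into a fresh pipe, close the write end,
 * then read 13 bytes at a time until EOF. Returns the number of
 * successful reads, or -1 on error. `count` is kept small so the
 * writes cannot fill the pipe buffer and block. */
int lines_before_eof(int count)
{
    int pfd[2];
    char buf[13];
    int lines = 0;
    ssize_t got;

    if (count < 0 || count > 30 || pipe(pfd) == -1)
        return -1;

    for (int i = 0; i < count; i++) {
        if (write(pfd[1], "Hello world!\n", 13) != 13) {
            close(pfd[0]);
            close(pfd[1]);
            return -1;
        }
    }

    /* Condition 1: no writers remain (like the first child being killed). */
    close(pfd[1]);

    /* Condition 2 is not yet true: the buffer still holds `count` lines,
     * so read() keeps returning data and only then reports EOF (0). */
    while ((got = read(pfd[0], buf, sizeof buf)) > 0)
        lines++;

    close(pfd[0]);
    return (got == 0) ? lines : -1;
}
```

This matches the behaviour in the question: the reader is not cut off when the writer goes away; it sees EOF only after draining whatever was already written.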
So I have some code that I am working on where I'm going to fork N children to do something with N different files and gather some info. The parent is only reading, and the children are only writing. If one child finishes before the other, I'd like to start processing that data in the parent, while the others still run.
Here I am making N pipes
int pipefds[nFiles*2];
int k;
for(k = 0; k < nFiles; k++)
{
if(pipe(&pipefds[k*2]))
{
/* pipe failed */
}
}
I then fork N processes, and want them to do something with the file and send it to the parent
int i;
for(i = 0; i < nFiles; i++)
{
pid = fork();
if(pid < 0)
{
/* error */
}
else if(pid == 0)
{
/*child */
close(pipefds[i*2]); //I think I want to close the Read End of each child's pipe
getData(file[i]); // do something with the file this process is handling
write(pipefds[i*2 + 1], someData, sizeof(someData)); // write something to the write end of the child's pipe
exit(0);
}
else
{
/*parent*/
if(i == nFiles - 1) //do I have to make this condition so that I start once all processes have been started??
{
int j;
for(j = 0; j < nFiles; j++)
{
close(pipefds[j*2+1]); //close all the parents write ends
}
/*Here I want to pick the pipe that finished writing first and do something with it */
}
}
}
Do I have to wait for the last iteration of the for loop before doing the parent's work (because I want all the processes to start before I do anything)? Also, how do I find the pipe in pipefds that has finished writing so I can start handling it while the others run? Thanks!
Probably the simplest solution is to create all the children first. Then run a separate loop around poll or select. When you get a read hit on one of the pipe ends, read the data into a buffer. If that read gets an indication that the other end of the pipe has closed, you can begin processing the data from that child. (Don't forget to remove that pipe from the set you are polling or selecting on.) When all children's pipes are closed, you're done.
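A minimal sketch of that approach using poll() is shown below. The function collect_from_children, the MAX_CHILDREN limit, and the fixed message each child writes are stand-ins invented for the example; in the real program the child would do its file processing (getData in the question) before writing.

```c
#include <poll.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 8 /* arbitrary limit for this sketch */

/* Fork n children, each with its own pipe, then poll the read ends.
 * bytes_from[i] receives the number of bytes read from child i.
 * Returns how many pipes reached EOF, or -1 on bad arguments. */
int collect_from_children(int n, long bytes_from[])
{
    struct pollfd pfds[MAX_CHILDREN];
    int open_pipes = 0, finished = 0;

    if (n < 1 || n > MAX_CHILDREN)
        return -1;

    /* Phase 1: create every child before reading anything. */
    for (int i = 0; i < n; i++) {
        int p[2];
        pid_t pid;

        if (pipe(p) == -1) { perror("pipe"); n = i; break; }
        pid = fork();
        if (pid == -1) {
            perror("fork");
            close(p[0]);
            close(p[1]);
            n = i;
            break;
        }
        if (pid == 0) {
            char msg[32];
            int len;

            close(p[0]);                       /* child only writes */
            for (int k = 0; k < i; k++)
                close(pfds[k].fd);             /* read ends inherited from earlier pipes */
            len = snprintf(msg, sizeof msg, "result from child %d\n", i);
            if (write(p[1], msg, (size_t)len) != len)
                _exit(1);
            _exit(0);                          /* exiting closes the write end */
        }
        close(p[1]);                           /* parent only reads */
        pfds[i].fd = p[0];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
        bytes_from[i] = 0;
        open_pipes++;
    }

    /* Phase 2: service whichever pipe is ready until all have closed. */
    while (open_pipes > 0) {
        if (poll(pfds, (nfds_t)n, -1) == -1) { perror("poll"); break; }
        for (int i = 0; i < n; i++) {
            char buf[256];
            ssize_t got;

            if (!(pfds[i].revents & (POLLIN | POLLHUP | POLLERR)))
                continue;
            got = read(pfds[i].fd, buf, sizeof buf);
            if (got > 0) {
                bytes_from[i] += got;          /* real code would store the data */
                continue;
            }
            if (got == 0)
                finished++;                    /* EOF: child i is done, process its data */
            close(pfds[i].fd);
            pfds[i].fd = -1;                   /* poll() ignores negative descriptors */
            open_pipes--;
        }
    }

    for (int i = 0; i < n; i++)                /* only non-empty after a poll() error */
        if (pfds[i].fd != -1)
            close(pfds[i].fd);
    while (wait(NULL) > 0)
        ;                                      /* reap all children */
    return finished;
}
```

Setting pfds[i].fd to -1 is how a finished pipe is removed from the polled set here; select() would instead require clearing the descriptor from the fd_set before each call.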
A simple multi-process program in linux.
Input some numbers like ./findPrime 10 20 30.
The program will create 3 child processes to find out all primes between 2-10, 10-20, 20-30.
Once a child process find a prime, it will write "2 is prime" through a pipe and send to the parent. Parent will print it on the screen.
THE PROBLEM here is that I use a while loop to write messages into the pipe and another while loop on the parent side to receive them, but with the code below it only displays the first message. What's going on, and how can I keep reading from that pipe? Did I miss something? Thanks very much!
char readBuffer[100];
char outBuffer[15];
int pids[argc];
int fd[2];
pipe(fd);
for(i = 0; i < argc; i++)
{
if( i == 0)
bottom = 2;
else
bottom = args[i - 1];
top = args[i];
pids[i] = fork();
if(pids[i] == 0)
{
printf("Child %d: bottom=%d, top=%d\n", getpid(), bottom, top);
close(fd[0]);
j = bottom;
while(j <= top)
{
int res = findPrime(j);
if(res == 1)
{
sprintf(outBuffer, "%d is prime", j);
write(fd[1], outBuffer, (strlen(outBuffer)+1));
}
j++;
}
exit(0x47);
}
else if(pids[i] < 0)
{
fprintf(stderr, "fork failed! errno = %i\n", errno);
break;
}
else
{
close(fd[1]);
while((nbytes = read(fd[0], readBuffer, sizeof(readBuffer))) > 0 )
printf("%s\n", readBuffer);
int status = 0;
pid = waitpid(pids[i], &status, 0);
if(pid >= 0)
printf("Child %d exited cleanly\n", pid);
}
}
And these child process should run in the order that they were created, like when Process 1 is done, then Process 2 will run, and process 3 will after 2.
I also want the parent process display the message immediately when it receives one.
The parent and its children share file descriptors as they exist at the time of the fork. Your immediate problem is that you close fd[1] in the parent inside the loop. The first child is forked before that close, so it can write. Every later child is forked after it, so it inherits a descriptor table in which fd[1] is already closed, and its pipe writes fail.
So just don't close fd[1] in the parent.
But then you have other problems too. One that jumps out is that if one of your child processes doesn't find a prime it will never write to the pipe. The parent, however, will block forever waiting for something to read that is never going to arrive. By not closing fd[1] in the parent you won't see EOF - i.e. read() == 0 in the parent. So one solution is to pass a "done" message back via the pipe and have the parent parse that stuff out.
A better solution yet is to consider a redesign. Count the number of processes you are going to need by parsing the command line arguments right at the beginning of the program. Then dynamically allocate the space for the number of pipe descriptors you are going to need and give each process its own pair. That could avoid everything altogether and is a more standard way of doing things.
I'm a beginner with Linux; however, I've managed to write my own shell. It's time to add pipelines to it (that's what the homework says). Could anyone explain a little more about how to do that? I know that, in theory, it should work like this:
unsigned char* child_results; //buffer to save results from child processes
for (int i = 0; i < 2; i++) {
pid = fork();
if (pid == 0) {
//if it's the first process to launch, close the read end of the pipe
//otherwise read what the parent writes to the pipe and then close the
//read end of the pipe
//call execvp()
}
else {
//if you've launched the first process, close the write end of the pipe
//otherwise write the contents of child_result to the child process
//and then close the write end of the pipe
//read from the child's pipe what it processed
//save the results in the child_results buffer
wait(NULL); //wait for the child to finish
}
}
However, I can't get it working. I've been at it the whole day and still have nothing. I understand the idea, but I can't get it to work. Could someone help me? Here's the code for the pipeline part:
for (int i = 0; i <= pipeline_count; i++) {
int pdesc[2];
// creating pipe
pipe(pdesc);
int b = fork();
// child
if (b == 0) {
// 1st pipeline
if (i == 0) {
//<?>
}
// last pipeline
if (i == pipeline_count) {
//<?>
}
// inside pipeline
if (i > 0 && i < pipeline_count) {
//<?>
}
execvp(array[0], array);
}
else {
// parent
//<?>
wait(NULL);
}
}
and here's an example of a shell command
ls -al | tr a-z A-Z
Thanks
You must close the input stream on the child and duplicate with dup the pipe for that channel. The parent does the same with the other side of the pipe. Something like this:
b = fork();
if (b == 0) {
/* Child: dup2() closes stdin and duplicates the read end of the pipe over it */
dup2(pdesc[0], 0);
close(pdesc[0]);
close(pdesc[1]);
execlp(...);
}
else {
/* Parent: dup2() closes stdout and duplicates the write end of the pipe over it */
dup2(pdesc[1], 1);
close(pdesc[0]);
close(pdesc[1]);
execlp(...);
}
...
I've shown you how to do it on a two processes case, but you can get the general idea and adapt it to other situations.
Hope that helped!