I posted a question previously about using fork() and pipes in C. I changed the design a little so that it reads a regular text file and sorts the words in the file. So far, this is what I came up with:
for (i = 0; i < numberOfProcesses; ++i) {
// Create the pipe
if (pipe(fd[i]) < 0) {
perror("pipe error");
exit(1);
}
// fork the child
pids[i] = fork();
if (pids[i] < 0) {
perror("fork error");
} else if (pids[i] > 0) {
// Close reading end in parent
close(fd[i][0]);
} else {
// Close writing end in the child
close(fd[i][1]);
int k = 0;
char word[30];
// Read the word from the pipe
read(fd[i][0], word, sizeof(word));
printf("[%s]", word); // This is for debugging purposes
// TODO: Sort the lists
}
}
// Open the file, and feed the words to the processes
file_to_read = fopen(fileName, "r");
char read_word[30];
child = 0;
while( !feof(file_to_read) ){
// Read each word and send it to the child
fscanf(file_to_read," %s",read_word);
write(fd[child][1], read_word, strlen(read_word));
++child;
if(child >= numberOfProcesses){
child = 0;
}
}
where numberOfProcesses is a command-line argument. It reads each word in the file and sends it to a process. This, however, does not work: when I print the word in the child process, it doesn't give me the correct output. Am I writing/reading the words correctly to/from the pipes?
Are the words being printed in the wrong order or interleaved? The thing is that when you write a word to a pipe, you are expecting the process handling that pipe to be scheduled immediately and to print the word. Then you expect the main process to run again and the next word to be written to the next pipe etc.
But that is not guaranteed to happen. Your main loop might write all of the words to all of the pipes before any of the other processes is scheduled. Those processes might not be scheduled in the order you expect. And the printf calls might interfere with each other so that their output becomes interleaved.
If you really want to do what you set out to, then Posix threads would be better. If you just wanted to learn something about using multiple processes, then I guess you have :-)
In the parent, you write strlen() bytes, which might be less than 30 bytes. In the child, however, you always try to read 30 bytes. You also must NUL-terminate the word, otherwise you might see garbage or a runaway string in your printf() statement.
In the child, you must either parse and split the input at word boundaries or use stdio as @JonathanLeffler proposed. When you use stdio, you get all this buffering and word reading for free.
ssize_t n; /* read() returns ssize_t */
char word[31];
/* Read the word from the pipe */
n = read(fd[i][0], word, sizeof(word) - 1);
if (n == -1) {
    perror("read");
    /* do error handling */
} else {
    word[n] = 0;
    printf("[%s]", word);
}
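The stdio route mentioned above could look roughly like the sketch below. It assumes the writer separates words with a newline so the reader can find the boundaries; read_words, send_word and MAX_WORD are hypothetical names for illustration and do not come from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_WORD 30 /* assumed maximum word length, as in the question */

/* Reader side: wrap the pipe's read end in a stdio stream and read
 * whitespace-separated words until EOF. Stores up to max_words words in
 * out and returns how many were read, or -1 if fdopen() fails. */
int read_words(int read_fd, char out[][MAX_WORD + 1], int max_words)
{
    assert(max_words >= 0);
    FILE *in = fdopen(read_fd, "r");
    if (in == NULL) {
        perror("fdopen");
        return -1;
    }
    int count = 0;
    /* %30s stores at most 30 characters plus the terminating NUL. */
    while (count < max_words && fscanf(in, "%30s", out[count]) == 1)
        count++;
    fclose(in); /* this also closes read_fd */
    return count;
}

/* Writer side: send one word followed by a newline.
 * Returns 0 on success, -1 on a failed or short write. */
int send_word(int write_fd, const char *word)
{
    size_t len = strlen(word);
    if (write(write_fd, word, len) != (ssize_t)len)
        return -1;
    return write(write_fd, "\n", 1) == 1 ? 0 : -1;
}
```

The reader only sees EOF once every copy of the write end has been closed, so the parent still has to close its write ends when it has sent the last word.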
I'm working on a C shell and am having trouble with getting an arbitrary amount of pipes to work. When I run the shell, it hangs on any piping. For some reason, when I do ls -la | sort, it hangs on the sort until I enter stuff and hit Ctrl+D. I know it has something to do with a pipe not closing, but the print statements show that pipes 3,4,5 all get closed in both the parent and child. I've been at this for a few hours and don't know why this doesn't work. Any help would be much appreciated.
Original Code:
char *current_command;
current_command = strtok_r(cmdline_copy, "|", &cmdline_copy);
char *commands[100][MAX_ARGS]; //Max 100 piped commands with each having MAX_ARGS arguments
int i = 0;
while (current_command != NULL) { //Go through each command and add it to the array
char *copy = malloc(strlen(current_command) + 1); //Copy of current command (+1 for the terminating NUL)
strcpy(copy, current_command);
char *args_t[MAX_ARGS];
int nargs_t = get_args(copy, args_t);
memcpy(commands[i], args_t, sizeof(args_t)*nargs_t); //Copy the command and it's arguments to the 2d array
i++;
current_command = strtok_r(NULL, "|\n", &cmdline_copy); //Use reentrant version of strtok to prevent fighting with get_args function
}
int fd[2*(i-1)]; //Set up the pipes i.e fd[0,1] is first pipe, fd[1,2] second pipe, etc.
for (int j = 0; j < i*2; j+=2) {
pipe(fd+j);
}
//Here is where we do the commands
for (int j = 0; j < i; j++) {
pid = fork(); //Fork
if (pid == 0) { //Child process
if (j == 0) { //First process
printf("Child Closed %d\n", fd[0]);
close(fd[0]);
dup2(fd[1], fileno(stdout));
}
else if (j == i -1) { //Last process
dup2(fd[j], fileno(stdin));
printf("Child closed %d\n", fd[j]);
printf("Child closed %d\n", fd[j+1]);
close(fd[j+1]);
close(fd[j]);
}
else { //Middle processes
dup2(fd[j], fileno(stdin));
dup2(fd[j+1], fileno(stdout));
printf("Child closed %d\n", fd[j]);
close(fd[j]);
}
execvp(commands[j][0], commands[j]);
}
else if (pid > 0) { //Parent
printf("Parent closed %d\n", fd[j]);
close(fd[j]);
printf("Parent closed %d\n", fd[j+1]);
close(fd[j+1]);
waitpid(pid, NULL, 0); //Wait for the process
}
else {
perror("Error with fork");
exit(1);
}
}
Final Code:
char *current_command;
current_command = strtok_r(cmdline_copy, "|", &cmdline_copy);
char *commands[100][MAX_ARGS]; //Max 100 piped commands with each having MAX_ARGS arguments
int command_count = 0;
while (current_command != NULL) { //Go through each command and add it to the array
char *copy = malloc(strlen(current_command) + 1); //Copy of current command because get_args uses strtok (+1 for the terminating NUL)
strcpy(copy, current_command);
char *args_t[MAX_ARGS];
int nargs_t = get_args(copy, args_t);
memcpy(commands[command_count], args_t, sizeof(args_t)*nargs_t); //Copy the command and it's arguments to the 2d array
command_count++;
current_command = strtok_r(NULL, "|\n", &cmdline_copy); //Use reentrant version of strtok to prevent fighting with get_args function
}
int fd[command_count*2]; //Two descriptors per pipe; the loop below fills indexes 0..command_count*2-1
pid_t pids[command_count];
for (int j = 0; j < command_count*2; j+=2) { //Open up a pair of pipes for every command
pipe(fd+j);
}
for (int j = 0; j < command_count; j++) {
pids[j] = fork();
if (pids[j] == 0) { //Child process
if (j == 0) { //Duplicate only stdout pipe for first pipe
dup2(fd[1], fileno(stdout));
}
else if (j == (command_count-1)) { //Duplicate only stdin for last pipe
dup2(fd[2*(command_count-1)-2], fileno(stdin));
}
else { //Duplicate both stdin and stdout
dup2(fd[2*(j-1)], fileno(stdin));
dup2(fd[2*j+1], fileno(stdout));
}
for (int k = 0; k < command_count*2; k++) { //Close all fds
close(fd[k]);
}
execvp(commands[j][0], commands[j]); //Exec the command
}
else if (pids[j] < 0) {
perror("Error forking");
}
}
for (int k = 0; k < command_count*2; k++) { //Parent closes all fds
close(fd[k]);
}
waitpid(pids[command_count-1], NULL, 0); //Wait for only the last process;
You aren't closing enough file descriptors in the children (or, in this case, in the parent).
Rule of thumb: If you dup2() one end of a pipe to standard input or standard output, close both of the original file descriptors returned by pipe() as soon as possible. In particular, you should close them before using any of the exec*() family of functions. The rule also applies if you duplicate the descriptors with either dup() or fcntl() with F_DUPFD.
In your code, you create all the pipes before you fork any children; therefore, each child needs to close all the pipe file descriptors after duplicating the one or two that it is going to use for input or output.
The parent process must also close all the pipe descriptors.
Also, the parent should not wait for children to complete until after launching all the children. In general, children will block with full pipe buffers if you make them run sequentially. You also defeat the benefits of parallelism. Note, however, that the parent must keep the pipes open until it has launched all the children — it must not close them after it launches each child.
For your code, the outline operation should be:
Create N pipes.
For each of N (or N+1) children:
    1. Fork.
    2. Child duplicates standard input and output pipes.
    3. Child closes all of the pipe file descriptors.
    4. Child executes process (and reports error and exits if it fails).
    5. Parent records child PID.
    6. Parent goes on to next iteration; no waiting, no closing.
Parent now closes N pipes.
Parent now waits for the appropriate children to die.
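The outline above can be sketched roughly as follows. This is only one way to arrange it, under some assumptions: run_pipeline is a hypothetical helper name, each command is passed as a NULL-terminated argv array, and the parent waits for every child but reports only the last command's exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run cmds[0] | cmds[1] | ... | cmds[n-1].
 * Returns the exit status of the last command, or -1 if the pipeline
 * could not be set up or the last command did not exit normally. */
int run_pipeline(char **cmds[], int n)
{
    assert(n >= 1);
    int npipes = n - 1;
    int fd[2 * n];      /* only the first 2 * npipes entries are used */
    pid_t pids[n];
    int created = 0, launched = 0;

    /* Create all the pipes: fd[2*k] is the read end, fd[2*k+1] the write end. */
    while (created < npipes && pipe(fd + 2 * created) == 0)
        created++;
    if (created < npipes)
        perror("pipe");

    /* Fork every child before waiting for any of them. */
    while (created == npipes && launched < n) {
        int j = launched;
        pids[j] = fork();
        if (pids[j] < 0) {
            perror("fork");
            break;
        }
        if (pids[j] == 0) {
            if (j > 0)          /* not the first: read from the previous pipe */
                dup2(fd[2 * (j - 1)], STDIN_FILENO);
            if (j < n - 1)      /* not the last: write to this command's pipe */
                dup2(fd[2 * j + 1], STDOUT_FILENO);
            for (int k = 0; k < 2 * npipes; k++)
                close(fd[k]);   /* the child closes every original descriptor */
            execvp(cmds[j][0], cmds[j]);
            perror(cmds[j][0]); /* only reached if exec failed */
            _exit(127);
        }
        launched++;             /* parent: no waiting, no closing yet */
    }

    /* Only now does the parent close the pipe descriptors it created. */
    for (int k = 0; k < 2 * created; k++)
        close(fd[k]);

    /* Wait for all launched children; keep the last command's status. */
    int result = -1;
    for (int j = 0; j < launched; j++) {
        int status;
        if (waitpid(pids[j], &status, 0) == pids[j] &&
            launched == n && j == n - 1 && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }
    return result;
}
```

With char *ls[] = {"ls", "-la", NULL} and char *srt[] = {"sort", NULL}, calling run_pipeline((char **[]){ls, srt}, 2) would correspond to ls -la | sort from the question.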
There are other ways of organizing this, of greater or lesser complexity. The alternatives typically avoid opening all the pipes up front, which reduces the number of pipes to be closed.
'Appropriate children' means there are various ways of deciding when a pipeline (sequence of commands connected by pipes) is 'done'.
One option is to wait for the last command in the sequence to exit. This is the traditional way to do it, and it has advantages. One is that the parent process can launch the last child; that child can launch its predecessor in the pipeline, and so on back to the first process in the pipeline. In this scenario, the parent never creates a pipe, so it doesn't have to close any pipes. It also has only one child to wait for; the other processes in the pipeline are descendants of that one child.
Another option is to wait for all the processes to die (1). This is more or less what Bash does. This allows Bash to know the exit status of each element of the pipeline; the alternative does not permit that, which is relevant to set -o pipefail and the PIPESTATUS array.
Can you help me understand why the dup2 statement for the middle pipes is dup2(fd[(2*j)+1], fileno(stdout)) and dup2(fd[2*(j-1)], fileno(stdin))? I got it off Google and it works, but I'm unsure why.
fileno(stdout) is 1.
fileno(stdin) is 0.
The read end of a pipe is element 0 of the pair filled in by pipe() (analogous to standard input, which is file descriptor 0).
The write end of a pipe is element 1 of the pair (analogous to standard output, which is file descriptor 1).
You have an array int fd[2*N]; for some value of N > 1, and you get a pair of file descriptors for each pipe.
For an integer k, fd[k*2+0] is the read descriptor of a pipe, and fd[k*2+1] is the write descriptor.
When j is neither 0 nor (N-1), you want it to read from the previous pipe and to write to its pipe:
fd[(2*j)+1] is the write descriptor of pipe j — which gets connected to stdout.
fd[2*(j-1)] is the read descriptor of pipe j-1 — which gets connected to stdin.
So, the two dup2() calls connect the correct pipe file descriptors to standard input and standard output of process j in the pipeline.
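The index arithmetic can be checked in isolation with a small sketch like the one below. The helper names are hypothetical; they only compute which entries of the flat fd[] array process j of n would pass to dup2(), with -1 meaning that stream is left alone.

```c
#include <assert.h>

/* Index of the read end of pipe k in an array filled by pipe(fd + 2*k). */
static int pipe_read_index(int k)  { return 2 * k; }

/* Index of the write end of pipe k in the same array. */
static int pipe_write_index(int k) { return 2 * k + 1; }

/* For process j of an n-command pipeline, report which fd[] indexes
 * should become stdin and stdout. -1 means "keep the inherited stream". */
void pipeline_ends(int j, int n, int *stdin_index, int *stdout_index)
{
    assert(n >= 1 && j >= 0 && j < n);
    /* Every process except the first reads from the previous pipe. */
    *stdin_index = (j > 0) ? pipe_read_index(j - 1) : -1;
    /* Every process except the last writes to its own pipe. */
    *stdout_index = (j < n - 1) ? pipe_write_index(j) : -1;
}
```

For three commands, the middle process (j = 1) gets index 0 for stdin and index 3 for stdout, which matches fd[2*(j-1)] and fd[(2*j)+1].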
(1) There can be obscure scenarios where this leaves the parent hung indefinitely. I emphasize obscure; it requires something like a process that hangs around as a daemon without forking.
So I have some code that I am working on where I'm going to fork N children to do something with N different files and gather some info. The parent is only reading, and the children are only writing. If one child finishes before the other, I'd like to start processing that data in the parent, while the others still run.
Here I am making N pipes
int pipefds[nFiles*2];
int k;
for(k = 0; k < nFiles; k++)
{
if(pipe(&pipefds[k*2])) // pipe() needs a pointer to two ints
{
/* pipe failed */
}
}
I then fork N processes, and want them to do something with the file and send it to the parent
int i;
for(i = 0; i < nFiles; i++)
{
pid = fork();
if(pid < 0)
{
/* error */
}
else if(pid == 0)
{
/*child */
close(pipefds[i*2]); //I think I want to close the Read End of each child's pipe
getData(file[i]); // do something with the file this process is handling
write(pipefds[i*2 + 1], someData, sizeof(someData)); // write something to the write end of the child's pipe
exit(0);
}
else
{
/*parent*/
if(i == nFiles - 1) //do I have to make this condition so that I start once all processes have been started??
{
int j;
for(j = 0; j < nFiles; j++)
{
close(pipefds[j*2+1]); //close all the parents write ends
}
/*Here I want to pick the pipe that finished writing first and do something with it */
}
}
}
Do I have to wait for the last iteration of the for loop before starting the parent's work (because I want all the processes to start before I do anything)? Also, how do I find the pipe in pipefds that has finished writing so I can start handling it while the others run? Thanks!
Probably the simplest solution is to create all the children first. Then run a separate loop around poll or select. When you get a read hit on one of the pipe ends, read the data into a buffer. If that read gets an indication that the other end of the pipe has closed, you can begin processing the data from that child. (Don't forget to remove that pipe from the set you are polling or selecting on.) When all children's pipes are closed, you're done.
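A minimal sketch of such a poll loop is shown below, assuming the parent has already forked all children and closed its own write ends. struct child_pipe, next_finished and CHILD_BUF are hypothetical names; the fixed-size buffer silently drops anything beyond CHILD_BUF bytes, and a read error is treated the same as EOF to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CHILD_BUF 4096 /* assumed upper bound on one child's output */

struct child_pipe {
    int fd;               /* read end of the child's pipe; -1 once closed */
    char data[CHILD_BUF]; /* bytes received from the child so far */
    size_t len;
};

/* Block until one child's pipe reaches EOF, buffering data from every
 * child in the meantime. Returns the index of the finished child, or -1
 * when no pipes are left open (or poll() fails). */
int next_finished(struct child_pipe *c, int n)
{
    assert(n > 0);
    struct pollfd pfd[n];

    for (;;) {
        int open_count = 0;
        for (int i = 0; i < n; i++) {
            pfd[i].fd = c[i].fd; /* poll() ignores negative descriptors */
            pfd[i].events = POLLIN;
            pfd[i].revents = 0;
            if (c[i].fd >= 0)
                open_count++;
        }
        if (open_count == 0)
            return -1;
        if (poll(pfd, (nfds_t)n, -1) < 0) {
            if (errno == EINTR)
                continue;
            perror("poll");
            return -1;
        }
        for (int i = 0; i < n; i++) {
            if (pfd[i].fd < 0 || pfd[i].revents == 0)
                continue;
            char tmp[512];
            ssize_t got = read(c[i].fd, tmp, sizeof tmp);
            if (got > 0) {
                size_t room = CHILD_BUF - c[i].len;
                size_t keep = (size_t)got < room ? (size_t)got : room;
                memcpy(c[i].data + c[i].len, tmp, keep);
                c[i].len += keep;
                continue;
            }
            /* EOF (or error): stop polling this pipe and report the child. */
            close(c[i].fd);
            c[i].fd = -1;
            return i;
        }
    }
}
```

The parent would call next_finished() in a loop and process c[i].data each time it returns an index i, while the remaining children keep running.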
A simple multi-process program in linux.
Input some numbers like ./findPrime 10 20 30.
The program will create 3 child processes to find out all primes between 2-10, 10-20, 20-30.
Once a child process finds a prime, it will write a message such as "2 is prime" to a pipe, sending it to the parent. The parent will print it on the screen.
THE PROBLEM here is that I use a while loop to write messages into the pipe and another while loop on the parent side to receive them, but with the code below it only displays the first message. I am wondering what's going on: how can I keep reading from that pipe? Did I miss something? Thanks very much!
char readBuffer[100];
char outBuffer[15];
int pids[argc];
int fd[2];
pipe(fd);
for(i = 0; i < argc; i++)
{
if( i == 0)
bottom = 2;
else
bottom = args[i - 1];
top = args[i];
pids[i] = fork();
if(pids[i] == 0)
{
printf("Child %d: bottom=%d, top=%d\n", getpid(), bottom, top);
close(fd[0]);
j = bottom;
while(j <= top)
{
int res = findPrime(j);
if(res == 1)
{
sprintf(outBuffer, "%d is prime", j);
write(fd[1], outBuffer, (strlen(outBuffer)+1));
}
j++;
}
exit(0x47);
}
else if(pids[i] < 0)
{
fprintf(stderr, "fork failed! errno = %i\n", errno);
break;
}
else
{
close(fd[1]);
while((nbytes = read(fd[0], readBuffer, sizeof(readBuffer))) > 0 )
printf("%s\n", readBuffer);
int status = 0;
pid = waitpid(pids[i], &status, 0);
if(pid >= 0)
printf("Child %d exited cleanly\n", pid);
}
}
These child processes should run in the order in which they were created: when process 1 is done, process 2 will run, and process 3 will run after process 2.
I also want the parent process display the message immediately when it receives one.
Parent and children share their file descriptors (as they are at that moment) at the time of the fork. Your immediate problem is that you close fd[1] in the parent. When the first child exits, its copy of fd[1] is closed automatically, so the pipe no longer has any open write end. Every child forked after that inherits a descriptor table in which fd[1] is already closed, so the pipe writes fail in all subsequent children.
So just don't close fd[1] in the parent.
But then you have other problems too. One that jumps out is that if one of your child processes doesn't find a prime it will never write to the pipe. The parent, however, will block forever waiting for something to read that is never going to arrive. By not closing fd[1] in the parent you won't see EOF - i.e. read() == 0 in the parent. So one solution is to pass a "done" message back via the pipe and have the parent parse that stuff out.
A better solution yet is to consider a redesign. Count the number of processes you are going to need by parsing the command line arguments right at the beginning of the program. Then dynamically allocate the space for the number of pipe descriptors you are going to need and give each process its own pair. That could avoid everything altogether and is a more standard way of doing things.
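One possible shape for the redesign, as a sketch only: each child gets its own pipe, so the parent can close its write end right after the fork and still see EOF when that child finishes. collect_primes, write_primes and is_prime are hypothetical names, and the output goes into a caller-supplied buffer here instead of being printed, purely to keep the example easy to check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int is_prime(int v)
{
    if (v < 2)
        return 0;
    for (int d = 2; d * d <= v; d++)
        if (v % d == 0)
            return 0;
    return 1;
}

/* Child body: write one line per prime in [bottom, top] to wfd. */
static void write_primes(int wfd, int bottom, int top)
{
    for (int v = bottom; v <= top; v++)
        if (is_prime(v))
            dprintf(wfd, "%d is prime\n", v);
}

/* Fork one child per range (2..tops[0], tops[0]..tops[1], ...), each with
 * its own pipe, and copy the children's lines into out in creation order.
 * Returns the number of bytes stored (out is NUL-terminated), or -1. */
ssize_t collect_primes(const int *tops, int n, char *out, size_t cap)
{
    assert(cap > 0);
    size_t used = 0;
    int bottom = 2;

    for (int i = 0; i < n; i++) {
        int fd[2];
        if (pipe(fd) < 0)
            return -1;
        pid_t pid = fork();
        if (pid < 0) {
            close(fd[0]);
            close(fd[1]);
            return -1;
        }
        if (pid == 0) {
            close(fd[0]); /* the child only writes */
            write_primes(fd[1], bottom, tops[i]);
            _exit(0);     /* exiting closes the child's write end */
        }
        close(fd[1]);     /* parent drops its write end so read() can hit EOF */
        ssize_t got;
        while (used < cap - 1 &&
               (got = read(fd[0], out + used, cap - 1 - used)) > 0)
            used += (size_t)got;
        close(fd[0]);
        waitpid(pid, NULL, 0);
        bottom = tops[i]; /* next range starts at the previous top, as in the question */
    }
    out[used] = '\0';
    return (ssize_t)used;
}
```

Because the read loop ends at EOF rather than on a special message, a child that finds no primes at all no longer blocks the parent.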
I want to preface this with the fact that I have no formal education in the use of pipes, so this is my first venture. Not to mention that I couldn't find any similar questions to my situation.
Note: This IS part of a larger project for a school assignment, so I am NOT asking for anyone to do this for me. I would just like some direction/helpful code segments. (I have tried to make this as generic as possible to avoid "cheater" remarks.)
I am trying to run a for-loop over int k elements in which a parent process spawns off k children with fork() and execl(), and then use a pipe() to send the output back to the parent. Here is some generic code that I am trying to use and the error/problem in which I encounter:
Note: helloworld= an executable compiled with GCC that produces printf("hello world\n");
int k = 10; //The number of children to make
int fd[2]; //Used for each pipe
int readFromMe[k]; //Holds the file IDs of each pipe fd[0]
int writeToMe[k]; //Holds the file IDs of each pipe fd[1]
int processID[k]; //Holds the list of child process IDs
//Create children
int i;
for(i = 0; i < k; i++)
{
if(pipe(fd) == -1)
{
printf("Error - Pipe error.\n");
exit(EXIT_FAILURE);
}
//Store the pipe ID
readFromMe[i] = fd[0];
writeToMe[i] = fd[1];
if((processID[i] = fork()) == -1)
{
fprintf(stderr, "fork failure");
exit(EXIT_FAILURE);
}
//If it is a child, change the STDOUT to the pipe-write file descriptor, and exec
if(processID[i] == 0)
{
dup2 (writeToMe[i], STDOUT_FILENO);
close(readFromMe[i]);
execl("./helloworld", (char *)0);
}
//If it is the parent, just close the unnecessary pipe-write descriptor and continue itterating
else
{
close(writeToMe[i]);
}
}
//Buffer for output
char output[100000];
//Read from each pipe and print out the result
for(i = 0; i < k; i++)
{
int r = read(readFromMe[i], &output, (sizeof(char) * 100000));
if(r > 0)
{
printf("result = %s\n", output);
}
close(readFromMe[i]);
}
I get no output from my program at all, so I am trying to figure out why this issue is occurring.
Probably unrelated, but you call execl incorrectly. The extra arguments after the program path become the argv array of the other program's main function. As you know, argv always has at least one entry, the program name. So you need to call it like this:
execl("./helloworld", "helloworld", NULL);
More related to your problem: you should also check for errors, since the call might actually fail.
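A sketch of what checking for errors could look like for a single child is below. It is an illustration under assumptions, not the asker's code: capture_stdout is a hypothetical helper, and it uses execvp with an argv array instead of execl so that any program can be passed in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] and capture what it writes to stdout into out (at most
 * cap - 1 bytes, NUL-terminated). If exit_status is not NULL it receives
 * the child's exit status, or -1 if the child did not exit normally.
 * Returns the number of bytes captured, or -1 on a pipe/fork error. */
ssize_t capture_stdout(char *const argv[], char *out, size_t cap,
                       int *exit_status)
{
    assert(cap > 0);
    int fd[2];
    if (pipe(fd) < 0) {
        perror("pipe");
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]);
        close(fd[1]);     /* the duplicate on stdout is all the child needs */
        execvp(argv[0], argv);
        perror("execvp"); /* only reached if exec failed */
        _exit(127);
    }
    close(fd[1]);         /* otherwise read() below never sees EOF */
    size_t used = 0;
    ssize_t got;
    while (used < cap - 1 &&
           (got = read(fd[0], out + used, cap - 1 - used)) > 0)
        used += (size_t)got;
    out[used] = '\0';     /* read() does not NUL-terminate */
    close(fd[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) == pid && exit_status != NULL)
        *exit_status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return (ssize_t)used;
}
```

An exit status of 127 with zero bytes captured would suggest that the exec itself failed, which is exactly the case that goes unnoticed when the return of execl is never checked.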
Try printing the value of 'r' in your printout loop. I suspect the read is returning an error (perhaps EPIPE) that you're not seeing. Also, your example code is trying to printf 'c', not output, as it looks like you meant.
I'm a beginner with Linux, but I've managed to write my own shell. It's time to add pipelines to it (that's what the homework says). Could anyone explain a little more about how to do that? I know that, in theory, it should work like this:
unsigned char* child_results; //buffer to save results from child processes
for (int i = 0; i < 2; i++) {
pid = fork();
if (pid == 0) {
//if it's the first process to launch, close the read end of the pipe
//otherwise read what the parent writes to the pipe and then close the
//read end of the pipe
//call execvp()
}
else {
//if you've launched the first process, close the write end of the pipe
//otherwise write the contents of child_result to the child process
//and then close the write end of the pipe
//read from the child's pipe what it processed
//save the results in the child_results buffer
wait(NULL); //wait for the child to finish
}
}
However, I can't get it working. I've been at it all day and still have nothing. I understand the idea, but I can't make it work. Could someone help me? Here's the code of my pipeline part:
for (int i = 0; i <= pipeline_count; i++) {
int pdesc[2];
// creating pipe
pipe(pdesc);
int b = fork();
// child
if (b == 0) {
// 1st pipeline
if (i == 0) {
//<?>
}
// last pipeline
if (i == pipeline_count) {
//<?>
}
// inside pipeline
if (i > 0 && i < pipeline_count) {
//<?>
}
execvp(array[0], array);
}
else {
// parent
//<?>
wait(NULL);
}
}
and here's an example of a shell command
ls -al | tr a-z A-Z
Thanks
You must close the input stream in the child and duplicate the pipe end for that channel over it with dup2. The parent does the same with the other side of the pipe. Something like this:
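b = fork();
if (b == 0) {
    /* Close stdin, and duplicate the input side of the pipe over stdin.
       Note the argument order: dup2(oldfd, newfd). */
    dup2(pdesc[0], 0);
    /* The original descriptors are no longer needed */
    close(pdesc[0]);
    close(pdesc[1]);
    execlp(...);
}
else {
    /* Close stdout, and duplicate the output side of the pipe over stdout */
    dup2(pdesc[1], 1);
    close(pdesc[0]);
    close(pdesc[1]);
    execlp(...);
}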
...
I've shown you how to do it on a two processes case, but you can get the general idea and adapt it to other situations.
Hope that helped!