Linux operating system - input-output piping from parent, child, parent processes - C

I have some C code called a c-shell that does the following. The parent c-shell reads in a Linux command line and forks a child process to perform the command. The child does not exec the command until it receives a signal from the parent that it is ready for it to execute. It can handle input files for giving arguments to commands, or it can just read them from the command line. It can handle sending output to output files rather than just printing the executed command's output to stdout. The way that it sends the output to the output file is by the child redirecting its stdout to a pipe, and the parent reads from this pipe once it receives the SIGCHLD signal that the child process finished running. It can handle multiple commands (where you put a semicolon between commands). It can handle piping output from the first command to a second command in the command line.

However - and this is my question - it cannot handle a command where you pipe the output of one command to be the input of the second command, and then send the output of the second command to an output file. I'm baffled, given that all the above cases work perfectly. I can redirect output from an executed child process to the parent when it finishes so it can complete it. I can redirect the output of the first command to be the input of the second command. But I cannot do this if I try to send the output of the second command to an output file. If this question does not make sense, I will post more specifics.
For example, if I enter the following command line into my c-shell: ls -l | grep lsOut (meaning, I do a detailed directory listing, and within that listing there are some files whose names contain the characters "lsOut" (output files from the ls command), so the grep command should filter out all the other files in the listing that do not contain those characters). That works just fine when it prints to stdout. When I run a command such as ps > psOut, the output of the ps command writes to the psOut file with no problem.

However, if I run the command ls -l | grep lsOut > lsOutFile, what happens is baffling. It prints the first command, ls -l, to stdout, and although I can see from print statements that the second command, grep lsOut, is being run and should be receiving the output from ls -l as its input, that appears to have no effect. The only output is the entire ls -l directory listing with no grep filtering, and although it says it writes it to the output file, it never gets there. If you want me to post a link to the code, I can do that. Thank you very much! I have spent hours trying to debug this problem.

The way that it sends the output to the output file is by the child
redirecting its stdout to a pipe, and the parent reads from this pipe
once it receives the SIGCHLD signal that the child process finished
running.
Hold it right here. As the saying goes: Do not pass "Go". Do not collect $200.
This part is already not quite right. If the child process starts spewing a sufficient amount of output, you'll end up with both a hung parent and a hung child process.
Pipe buffers are not unlimited in size. Pipe buffers have a fixed, maximum internal size. On modern Linux the default pipe capacity is 65,536 bytes, but the exact figure doesn't matter. Whatever the pipe buffer size is, once the buffer fills up, the process that's writing to the pipe is put to sleep until the reading process starts emptying the pipe by reading from it.

As long as the reader and the writer processes work independently, one reading, one writing, everything runs smoothly. If the writer is writing faster than the reader is reading, then once the number of unread characters reaches the pipe's maximum size, the kernel quietly puts the writer process to sleep, inside write(), until the reader catches up.
If your parent process waits for the child process to exit before it starts reading from the stdout pipe, and the child process writes more data than the pipe can hold, the child process will be paused inside its write() call until the pipe is read from. And since the parent process isn't going to read from the pipe until the child process terminates, both processes will wait for each other, forever.
So, we already know that your application isn't handling this situation correctly. Although you've described a slightly different problem with your application, given that the application is not handling inter-process pipe semantics correctly, it's fairly likely that your actual problem, if not this, is closely related.
You must completely re-engineer how your application implements inter-process piping.
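As a rough sketch of the pattern the parent needs instead, here is a minimal standalone example (the ls -l command and the buffer size are just placeholders, and error handling is kept to a minimum): the parent closes its copy of the write end, drains the pipe until EOF while the child is still running, and only then reaps the child.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fds[2];
    char buf[4096];
    ssize_t n;
    int status;

    if (pipe(fds) == -1) { perror("pipe"); exit(1); }

    pid_t pid = fork();
    if (pid == -1) { perror("fork"); exit(1); }

    if (pid == 0) {                       /* child: send stdout into the pipe */
        close(fds[0]);                    /* child never reads                */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");                 /* only reached if exec fails       */
        _exit(127);
    }

    /* parent: read while the child is still running, not after it exits */
    close(fds[1]);                        /* otherwise read() never sees EOF  */
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);   /* or write to your output file  */
    close(fds[0]);

    waitpid(pid, &status, 0);             /* reap the child only after EOF    */
    return 0;
}

Only after read() returns 0 (EOF) is it safe to call waitpid(); reversing those two steps is exactly what produces the deadlock described above.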

Related

C redirect terminal descriptor

Is it possible to redirect everything that is written to the terminal to a process?
For example, after I start the process, if I type "command" in the terminal, it should be redirected to a pipe in my process, or something like that.
Yes, it should be practical to redirect all terminal output from your program (and all of its child processes) after your program has started. Unix programs usually write to the terminal by writing to standard output (stdout). Standard output is always file descriptor number 1 (the C constant is STDOUT_FILENO), for all processes. You can use the dup2() system call to replace any file descriptor number with another file descriptor.
So you can e.g. create a pipe using int fds[2]; pipe(fds);. Then fds[1] will be a file descriptor number that you can use to write to the pipe. If you do dup2(fds[1], STDOUT_FILENO); then standard output will also write to the pipe. (You can close(fds[1]); afterwards since you probably don't need it, now that you can use stdout instead.)
You can also open a file for writing with fd = open("filename", O_WRONLY); and then dup2(fd, STDOUT_FILENO); so everything written to stdout goes into your file.
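As a minimal sketch of that open()/dup2() redirection (the filename is just a placeholder, and O_CREAT | O_TRUNC plus the mode are added here only so the example runs against a file that may not exist yet):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* open the destination file and make fd 1 (stdout) point at it */
    int fd = open("filename", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return 1; }
    dup2(fd, STDOUT_FILENO);
    close(fd);                 /* the extra descriptor is no longer needed */

    printf("this line ends up in the file, not on the terminal\n");
    return 0;
}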
Note that you need to redirect stdout at the very beginning of your program before doing anything that might write to stdout.
The above trick will make standard output go to your pipe instead of the terminal. If you want the output to go to the terminal, and also get a copy of the output in a pipe or a file, that's more difficult, but it can also be done. You need to create an internal pipe, then dup2(that_pipe, STDOUT_FILENO); so stdout writes to that pipe. Then you need to read from that pipe (probably using poll() then read()) and write everything you get to both 1) the terminal and 2) another pipe or file that is going outside your program. So you need two pipes if you want to copy output.
The tee command does this (copy stdout to files) from the shell.
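Here is a rough sketch of that idea, assuming a single internal pipe and a plain file for the outgoing copy (copy.log and the message are placeholders; a blocking read() loop stands in for the poll()-based loop, and most error checking is omitted):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fds[2];
    pipe(fds);

    int term = dup(STDOUT_FILENO);            /* keep a handle on the real terminal */
    int file = open("copy.log", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    pid_t reader = fork();
    if (reader == 0) {                        /* reader: pipe -> terminal + file */
        char buf[4096];
        ssize_t n;
        close(fds[1]);
        while ((n = read(fds[0], buf, sizeof buf)) > 0) {
            write(term, buf, (size_t)n);
            write(file, buf, (size_t)n);
        }
        _exit(0);
    }

    /* writer side: from here on, anything written to stdout lands in the pipe */
    close(fds[0]);
    dup2(fds[1], STDOUT_FILENO);
    close(fds[1]);

    printf("hello, copied world\n");

    fclose(stdout);                           /* flush and close: the reader sees EOF */
    waitpid(reader, NULL, 0);
    return 0;
}

A real implementation would also handle partial write()s and check the return values of pipe(), open() and fork().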
This dup2() approach is not bulletproof because a Unix terminal (even when using a GUI terminal emulator instead of a hardware console) is a device in /dev. You can type tty in a shell or use ttyname(STDOUT_FILENO) in C to see which file in /dev corresponds to the terminal that stdout is writing to. In principle, any program (under the same user account) could open the terminal device using that filename and write to it without asking for permission from any other program. You can easily try this from the shell using the write program:
echo hello world | write $(whoami) /dev/ttys123
where /dev/ttys123 is whatever you got by typing tty in some other terminal window (the name looks a bit different on different operating systems, e.g. Linux and MacOS). You should see hello world appear in that other window.
From a child process, no. You must set this up in the parent process, and have it propagate downwards to children (barring some kind of crazy hack).
From the shell, you can redirect.
exec >file
This will redirect standard output to file, and it will apply to all future commands run in the shell. You can make this into a function, if you like.

Fork and dup2 - Child process is not terminating - Issues with file descriptors?

I am writing my own shell for a homework assignment, and am running into issues.
My shell program gets the input cat scores | grep 100 from the console and prints the output as expected, but the grep command doesn't terminate, and I can see it running indefinitely with the ps command.
EDIT - There was an error while closing fds. Now the grep command is not executing and the console output is:
grep: (standard input): Bad file descriptor
I am reading the number of commands from the console and creating necessary pipes and storing them in a two dimensional int array fd[][] before forking the first process.
fd[0][0] will contain read end of 1st pipe and fd[0][1] will contain write end of 1st pipe. fd[1][0] will contain read end of 2nd pipe and fd[1][1] will contain write end of 2nd pipe and so on.
Each new process duplicates its stdin with the read end of its pipe with the previous process and duplicates its stdout with the write end of its pipe with the next process.
Below is my function:
void run_cmds(char **args, int count, int pos)
{
    int pid, status;
    pid = fork();
    if (pid == 0)
    {
        if (pos != 0) dup2(fd[pos-1][0], 0);   // not changing stdin for 1st process
        if (pos != count) dup2(fd[pos][1], 1); // not changing stdout for last process
        close_fds(pos);
        execvp(*args, args);
    }
    else
    {
        waitpid(pid, &status, 0);
        count--;
        pos++;
        // getting next command and storing it in args
        if (count > 0)
            run_cmds(args, count, pos);
    }
}
args will contain the arguments for the command.
count is the number of commands I need to create.
pos is the position of the command in the input
I am not able to figure out the problem. I used this same approach for hard coded values before this and it was working.
What am I missing with my understanding/implementation of dup2/fork and why is the command waiting infinitely?
Any inputs would be greatly helpful. Stuck with this for the past couple of days!
EDIT: The close_fds() function is as below.
For any process, I am closing both of the pipes linked to the process.
void close_fds(int pos)
{
    if (pos != 0)
    {
        close(fd[pos-1][0]);
        close(fd[pos-1][1]);
    }
    if (pos != count)
    {
        close(fd[pos][0]);
        close(fd[pos][1]);
    }
}
First diagnosis
You say:
Each new process duplicates its stdin with the read end of its pipe with the previous process and duplicates its stdout with the write end of its pipe with the next process.
You don't mention the magic word close().
You need to ensure that you close both the read and the write end of each pipe when you use dup() or dup2() to connect it to standard input. That means with 2 pipes you have 4 calls to close().
If you don't close the pipes correctly, the process that is reading won't get EOF (because there's a process, possibly itself, that could write to the pipe). It is crucial to have enough (not too few, not too many) calls to close().
I am calling close_fds() after dup2 calls. The function will go through the fd[][2] array and do a close() call for each fd in the array.
OK. That is important. It means my primary diagnosis probably wasn't spot on.
Second diagnosis
Several other items:
You should have code after the execvp() that reports an error and exits if the execvp() returns (which means it fails).
You should not immediately call waitpid(). All the processes in a pipeline should be allowed to run concurrently. You need to launch all the processes, then wait for the last one to exit, cleaning up any others as they die (but not necessarily worrying about everything in the pipeline exiting before continuing).
If you do force the first command to execute in its entirety before launching the second, and if the first command generates more output than will fit into the pipe, you will have a deadlock — the first process can't exit because it is blocked writing, and the second process can't be started because the first hasn't exited. Interrupts and reboots and the end of the universe will all solve the problem somewhat crudely.
You decrement count as well as incrementing pos before you recurse. That might be bad. I think you should just increment pos.
Third diagnosis
After update showing close_fds() function.
I'm back to "there are problems with closing pipes" (though the waiting and error reporting problems are still problems). If you have 6 processes in a pipeline and all 5 connecting pipes are created before any processes are run, each process has to close all 10 pipe file descriptors.
Also, don't forget that if the pipes are created in the parent shell, rather than in a subshell that executes one of the commands in the pipeline, then the parent must close all the pipe descriptors before it waits for the commands to complete.
Please manufacture an MCVE (How to create a Minimal, Complete, and Verifiable Example?) or
SSCCE (Short, Self-Contained, Correct Example) — two names and links for the same basic idea.
You should create a program that manufactures the data structures that you're passing to the code that invokes run_cmds(). That is, you should create whatever data structures your parsing code creates, and show the code that creates the pipe or pipes for the 'cat scores | grep 100' command.
I am no longer clear how the recursion works, or whether it is invoked in your example. I think it is, in fact, unused in your example, which is probably just as well, since you would otherwise end up with the same command being executed multiple times, AFAICS.
Most probable reasons why grep doesn't terminate:
You don't call waitpid with the proper PID (even though there is such a call in your code, it may not get executed for some reason), so grep becomes a zombie process. Maybe your parent shell process is waiting for another process first (infinitely, because the other one never terminates), and it doesn't call waitpid with the PID of grep. You can find Z in the output of ps if grep is a zombie.
grep doesn't receive an EOF on its stdin (fd 0), some process is keeping the write end of its pipe open. Have you closed all file descriptors in the fd array in the parent shell process? If not closed everywhere, grep will never receive an EOF, and it will never terminate, because it will be blocked (forever) waiting for more data on its stdin.
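To make the shape of the fix concrete, here is a stripped-down sketch of a launch loop for a hard-coded cat scores | grep 100 pipeline. It is not the poster's code, just an illustration of the points above: every child closes every pipe descriptor, the parent closes its copies too, and nobody calls waitpid() until all the commands have been forked.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define NCMDS 2

int main(void)
{
    char *cmds[NCMDS][3] = {
        { "cat", "scores", NULL },      /* placeholder commands */
        { "grep", "100", NULL },
    };
    int fd[NCMDS - 1][2];

    for (int i = 0; i < NCMDS - 1; i++)
        pipe(fd[i]);

    for (int pos = 0; pos < NCMDS; pos++) {
        pid_t pid = fork();
        if (pid == 0) {
            if (pos != 0)
                dup2(fd[pos - 1][0], STDIN_FILENO);
            if (pos != NCMDS - 1)
                dup2(fd[pos][1], STDOUT_FILENO);
            for (int i = 0; i < NCMDS - 1; i++) {   /* every child closes every pipe fd */
                close(fd[i][0]);
                close(fd[i][1]);
            }
            execvp(cmds[pos][0], cmds[pos]);
            perror("execvp");                       /* report exec failure */
            _exit(127);
        }
        /* parent: do NOT waitpid() here; keep launching the next command */
    }

    for (int i = 0; i < NCMDS - 1; i++) {           /* parent closes its copies too */
        close(fd[i][0]);
        close(fd[i][1]);
    }

    while (wait(NULL) > 0)                          /* reap every child */
        ;
    return 0;
}

With all copies of the write end closed, grep sees EOF on its stdin as soon as cat exits, so it terminates instead of hanging.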

Pipes and Forks, does not get displayed on stdout

I'm working on my homework which is to replicate the unix command shell in C.
I've implemented till single command execution with background running (&).
Now I'm at the stage of implementing pipes, and I face this issue: for pipelines with more than one pipe, the piped child commands complete, but the final output doesn't get displayed on stdout (the last command's stdin is replaced with the read end of the last pipe):
dup2(pipes[lst_cmd], 0);
I tried fflush(STDIN_FILENO) at the parent too.
The exit for my program is CONTROL-D, and when I press that, the output gets displayed (and the program exits, since my action on CONTROL-D is to exit(0)).
I think the output of the pipe is in the stdout buffer but doesn't get displayed. Is there any other means than fflush to get the stuff in the buffer to stdout?
Having seen the code (unfair advantage), the primary problem was the process structure combined with not closing pipes thoroughly.
The process structure for a pipeline ps | sort was:
main shell
  - coordinator sub-shell
      - ps
      - sort
The main shell was creating N pipes (N = 1 for ps | sort). The coordinator shell was then created; it would start the N+1 children. It did not, however, wait for them to terminate, nor did it close its copy of the pipes. Nor did the main shell close its copy of the pipes.
The more normal process structure would probably do without the coordinator sub-shell. There are two mechanisms for generating the children. Classically, the main shell would fork one sub-process; it would do the coordination for the first N processes in the pipeline (including creating the pipes in the first place), and then would exec the last process in the pipeline. The main shell waits for the one child to finish, and the exit status of the pipeline is the exit status of the child (aka the last process in the pipeline).
More recently, bash provides a mechanism whereby the main shell gets the status of each child in the pipeline; it does the coordination.
The primary fixes (apart from some mostly minor compilation warnings) were:
main shell closes all pipes after forking coordinator.
main shell waits for coordinator to complete.
coordinator closes all pipes after forking pipeline.
coordinator waits for all processes in pipeline to complete.
coordinator exits (instead of returning to provide duelling dual prompts).
A better fix would eliminate the coordinator sub-shell (it would behave like the classical system described).
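For comparison, a bare-bones sketch of that classical structure for ps | sort might look like the following (error handling omitted; the earlier commands in the pipeline are simply left to be reaped by the system when they die):

#include <unistd.h>
#include <sys/wait.h>

/* Classical structure: the shell forks ONE child; that child builds the
 * pipe, forks ps, and then execs sort itself, so the shell only has one
 * process to wait for. */
int main(void)
{
    int status;
    pid_t child = fork();

    if (child == 0) {                    /* coordinator that becomes sort */
        int fd[2];
        pipe(fd);

        if (fork() == 0) {               /* grandchild: ps writes into the pipe */
            dup2(fd[1], STDOUT_FILENO);
            close(fd[0]);
            close(fd[1]);
            execlp("ps", "ps", (char *)NULL);
            _exit(127);
        }

        dup2(fd[0], STDIN_FILENO);       /* this process becomes sort */
        close(fd[0]);
        close(fd[1]);
        execlp("sort", "sort", (char *)NULL);
        _exit(127);
    }

    waitpid(child, &status, 0);          /* pipeline status = status of sort */
    return 0;
}

The main shell waits for just the one child, and that child's exit status (the status of sort, the last command) becomes the status of the whole pipeline, as described above.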

Fork and wait - how to wait for all grandchildren to finish

I am working on an assignment to build a simple shell, and I'm trying to add a few features that aren't required yet, but I'm running into an issue with pipes.
Once my command is parsed, I fork a process to execute it. This process runs a subroutine that will execute the command if there is only one left; otherwise it will fork again. The parent will execute the first command, and the child will process the rest. Pipes are set up and work correctly.
My main process then calls wait(), and then outputs the prompt. When I execute a command like ls -la | cat, the prompt is printed before the output from cat.
I tried calling wait() once for each command that should be executed, but the first call works and all successive calls return ECHILD.
How can I force my main thread to wait until all children, including children of children, exit?
You can't. Either make your child process wait for its children and not exit until they've all been waited for, or fork all the children from the same process.
See this answer how to wait() for child processes: How to wait until all child processes called by fork() complete?
There is no way to wait for a grandchild; you need to implement the wait logic in each process. That way, each child will only exit after all its children have exited (and that will then include all grandchildren, recursively).
Since you are talking about grandchildren, you are obviously spawning the children in a cascading manner. That's a possible way to implement a pipe.
But keep in mind that the returned value from your pipe (the one you get when doing echo $? in your terminal) is the one returned from the right-most command.
This means that you need to spawn children from right to left in this cascading implementation. You don't want to lose that returned value.
Now, assuming we are only talking about builtin commands for the sake of simplicity (no extra calls to fork() and execve() are made), an interesting fact is that in some shells, like "zsh", the right-most command is not even forked. We can see that with a simple piped command like:
export stack=OVERFLOW | export overflow=STACK
Using the command env afterwards, we can see the persistence of overflow=STACK in the environment variables. It shows that the right-most command was not executed in a subshell, whereas export stack=OVERFLOW was.
Note: This is not the case in a shell like "sh".
Now let's use a basic piped command to give a possible logic for this cascading implementation.
cat /dev/random | head
Note: Even though cat /dev/random is, in principle, a never-ending command, it will stop as soon as the head command is done reading the first lines output by cat /dev/random. This is because head's stdin (the read end of the pipe) is closed when head is done, and cat /dev/random aborts because it's writing to a broken pipe.
LOGIC:
The parent process (your shell) sees that there is a pipe to execute. It then forks. The parent remains your shell; it will wait for the child to return, and store the returned value.
In the context of the first-generation child (trying to execute the right-most command of the pipe):
It sees that its command is not the only command in the pipe, so it will fork() again (what I call the "cascading implementation").
Now that the fork is done, the parent process first executes its own task (head), then closes its stdin and stdout, and then wait()s for its child. It is really important to close stdin and stdout first, and only then call wait(). Closing stdout sends EOF to the parent, if it is reading on its stdin. Closing stdin makes sure that any grandchild still trying to write into the pipe aborts with a "broken pipe" error.
In the context of the grandchild:
It sees that it is the last remaining command of the pipe, so it just executes the command and returns its value (closing its stdin and stdout).
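A compressed sketch of that cascading logic for a two-command pipeline, with two small functions standing in for the builtin commands so that no execve() is needed (left_cmd and right_cmd are purely illustrative stand-ins for cat /dev/random and head):

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

static int left_cmd(void)                 /* stands in for cat /dev/random */
{
    for (;;) {                            /* write until the pipe breaks   */
        if (printf("endless data\n") < 0 || fflush(stdout) != 0)
            return 1;                     /* or SIGPIPE terminates us first */
    }
}

static int right_cmd(void)                /* stands in for head            */
{
    char line[256];
    if (fgets(line, sizeof line, stdin))  /* read just one line            */
        fputs(line, stdout);
    return 0;
}

int main(void)
{
    int status;
    pid_t child = fork();

    if (child == 0) {                     /* first generation: right-most command */
        int fd[2];
        pipe(fd);

        if (fork() == 0) {                /* grandchild: the command to the left  */
            dup2(fd[1], STDOUT_FILENO);
            close(fd[0]);
            close(fd[1]);
            _exit(left_cmd());
        }

        dup2(fd[0], STDIN_FILENO);        /* right-most command reads the pipe    */
        close(fd[0]);
        close(fd[1]);

        int ret = right_cmd();            /* 1) run our own task first            */
        fclose(stdin);                    /* 2) close stdin and stdout, so the    */
        fclose(stdout);                   /*    writer hits a broken pipe         */
        wait(NULL);                       /* 3) only then wait for the child      */
        _exit(ret);
    }

    waitpid(child, &status, 0);           /* the pipeline's status: right-most cmd */
    return 0;
}

Because the shell only waits for the first-generation child, the value it collects is the status of the right-most command, matching the echo $? behaviour described above.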

Using streams to pipe input/output between *nix processes

I'm working on a fairly simple application in C. The end goal is to pipe the output from one process to the input of another in a *nix environment (yes, I am aware of the pipe() call and dup/dup2, but I'm trying to find a way around using them). I was wondering if there is any way to connect the streams rather than using file descriptors (the systems aren't guaranteed to be POSIX-compliant).
So basically I want to do something like this (pseudo-code)
pid = fork()
if pid == 0
// assign this process's stdin to the parents stdout.
stdin = parent.stdout;
exec() // launch new process that receives the parents stdout as stdin
// child stuff....
else
// parent stuff....
I know that it probably won't be as simple as just doing an assignment as above, but is there any way to do this using only streams? I tried looking around, but couldn't find anything.
Thanks!
Sorry if I'm missing the point here, but the whole philosophy of *nix is one program, one job. If you need a program to dump the contents of a file to the screen, then you have the cat command. If the file's too big and you need page breaks, you pipe the output of cat to the more command:
cat myfile.txt | more
If you need to pipe between two terminal applications then you're meant to use the command line to do so:
myprog1 | myprog2
Obviously that's the philosophical approach, so if that doesn't help, can you clarify what you're trying to pipe and why you're trying to do it in-process?
