The program report launches two accessed processes. report feeds accessed a list of filenames, and accessed prints whether each file has been accessed in the last x days.
However, my implementation causes accessed to freeze somehow. After running report, nothing gets printed. When I run ps, I can see the two accessed processes hanging around, not dying.
At first I thought the method of reading from stdin was wrong in accessed, but I manually piped some filenames to it with cat filenames.txt | ./accessed and it works. So the report program must be wrong.
I attached gdb to the frozen accessed processes, and they appear to be stuck in the getline while loop. When I changed the while loop to a single getline call, it suddenly worked. However, I need to read stdin until EOF. Any help on the possible sources of error is very much appreciated; this is causing me quite a headache.
Schematics:
 ________
|        |--------> Access1 ---> print stuff out
| report |
|________|--------> Access2 ---> print stuff out
Each process closes its own input pipe (after dup2'ing the read end).
However, each leaves the other process's input pipe open. Because of that, neither will ever see EOF, even after the parent closes its end of the pipe: as far as the kernel is concerned, the other child still holds a write end and might send data.
The children should close each other's pipes (or be started without inheriting the other process's pipes).
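As an illustration, here is a minimal sketch (not the poster's actual code) of a report-like parent that gives each child its own pipe and has every child close both ends of the other child's pipe before exec'ing ./accessed; the filenames are placeholders:
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int pipes[2][2];                      /* pipes[i][0] = read end, pipes[i][1] = write end */
    for (int i = 0; i < 2; i++)
        if (pipe(pipes[i]) == -1) { perror("pipe"); exit(1); }

    for (int i = 0; i < 2; i++) {
        pid_t pid = fork();
        if (pid == -1) { perror("fork"); exit(1); }
        if (pid == 0) {                   /* child i */
            int other = 1 - i;
            close(pipes[other][0]);       /* crucial: drop the other child's pipe entirely */
            close(pipes[other][1]);
            close(pipes[i][1]);           /* this child only reads its own pipe */
            dup2(pipes[i][0], STDIN_FILENO);
            close(pipes[i][0]);
            execl("./accessed", "accessed", (char *)NULL);
            perror("execl");
            _exit(127);
        }
    }

    close(pipes[0][0]);                   /* parent only writes, so drop the read ends */
    close(pipes[1][0]);

    dprintf(pipes[0][1], "a.txt\nb.txt\n");   /* placeholder filenames */
    dprintf(pipes[1][1], "c.txt\nd.txt\n");
    close(pipes[0][1]);                   /* now each accessed sees EOF on its stdin */
    close(pipes[1][1]);

    while (wait(NULL) > 0)
        ;
    return 0;
}
With the two close(pipes[other][...]) lines removed, each accessed keeps a write end of the other child's pipe open, never sees EOF, and blocks in getline() forever, which is exactly the behaviour described above.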
Related
I'm writing a library that should execute a program in a child process, capture the output, and make the output available in a line by line (string vector) way. There is one vector for STDOUT, one for STDERR, and one for "STDCOMBINED", i.e. all output in the order it was printed by the program. The child process is connected via two pipes to a parent process. One pipe for STDOUT and one for STDERR. In the parent process I read from the read-ends of the pipes, in the child process I dup2()'ed STDOUT/STDERR to the write ends of the pipes.
My problem:
I'd like to capture STDOUT, STDERR, and "STDCOMBINED" (= both in the order they appeared). But the order in the combined vector is different from the original order.
My approach:
I iterate until both pipes show EOF and the child process has exited. At each iteration I read exactly one line (or EOF) from STDOUT and exactly one line (or EOF) from STDERR. This works so far. But when I capture the lines as they arrive in the parent process, the order of STDOUT and STDERR is not the same as when I execute the program in a shell and look at the output.
Why is this so and how can I fix this? Is this possible at all? I know in the child process I could redirect STDOUT and STDERR both to a single pipe but I need STDOUT and STDERR separately, and "STDCOMBINED".
PS: I'm familiar with libc/unix system calls, like dup2(), pipe(), etc. Therefore I didn't post code. My question is about the general approach and not a coding problem in a specific language. I'm doing it in Rust against the raw libc bindings.
PPS: I made a simple test program that emits an interleaved mix of 5 stdout and 5 stderr messages. That's enough to reproduce the problem.
At each iteration I read exactly one line (or EOF) from STDOUT and exactly one line (or EOF) from STDERR.
This is the problem. This will only capture the correct order if that was exactly the order of output in the child process.
You need to capture the asynchronous nature of the beast: make your pipe endpoints nonblocking, select* on the pipes, and read whatever data is present, as soon as select returns. Then you'll capture the correct order of the output. Of course now you can't be reading "exactly one line": you'll have to read whatever data is available and no more, so that you won't block, and maintain a per-pipe buffer where you append new data, extract any lines that are present, shove the unprocessed output to the beginning, and repeat. You could also use a circular buffer to save a little bit of memcpy-ing, but that's probably not very important.
Since you're doing this in Rust, I presume there's already a good asynchronous reaction pattern that you could leverage (I'm spoiled with go, I guess, and project the hopes on the unsuspecting).
*Always prefer platform-specific higher-performance primitives like epoll on Linux, /dev/poll on Solaris, pollset &c. on AIX
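A minimal sketch of that read loop in C (the question is in Rust, but the shape is the same); out_fd and err_fd are assumed names for the read ends of the two pipes, the buffers are fixed-size with no bounds checking, and lines are simply printed in arrival order to stand in for the combined vector:
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void drain(int fd, char *buf, size_t *len, const char *tag) {
    char tmp[4096];
    ssize_t n = read(fd, tmp, sizeof tmp);   /* nonblocking: returns whatever is there */
    if (n <= 0) return;
    memcpy(buf + *len, tmp, n);              /* append to this pipe's buffer (no bounds check) */
    *len += n;
    char *nl;
    while ((nl = memchr(buf, '\n', *len)) != NULL) {
        printf("[%s] %.*s\n", tag, (int)(nl - buf), buf);   /* a complete line, in arrival order */
        size_t rest = *len - (size_t)(nl - buf) - 1;
        memmove(buf, nl + 1, rest);          /* shove the unprocessed tail to the front */
        *len = rest;
    }
}

void capture(int out_fd, int err_fd) {
    static char obuf[65536], ebuf[65536];
    size_t olen = 0, elen = 0;
    fcntl(out_fd, F_SETFL, O_NONBLOCK);
    fcntl(err_fd, F_SETFL, O_NONBLOCK);
    struct pollfd pfd[2] = { { out_fd, POLLIN, 0 }, { err_fd, POLLIN, 0 } };

    while (pfd[0].fd >= 0 || pfd[1].fd >= 0) {
        if (poll(pfd, 2, -1) < 0) break;
        for (int i = 0; i < 2; i++) {
            if (pfd[i].revents & POLLIN)
                drain(pfd[i].fd, i ? ebuf : obuf, i ? &elen : &olen, i ? "err" : "out");
            if ((pfd[i].revents & POLLHUP) && !(pfd[i].revents & POLLIN))
                pfd[i].fd = -1;              /* pipe drained and writer gone: stop polling it */
        }
    }
}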
Another possibility is to launch the target process with LD_PRELOAD and a dedicated library that takes over glibc's write(), detects writes to the pipes, and encapsulates such writes (and only those) in a packet by prepending a header containing an (atomically updated) process-wide incrementing counter as well as the size of the write. Such headers can easily be decoded on the other end of the pipe to reorder the writes with a higher chance of success.
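A very rough sketch of such an interposer, assuming Linux/glibc; the header layout and the choice to tag only fds 1 and 2 are assumptions, and note that with stdio buffering in the child you still get one tagged write() per flush, not per line, and in a multithreaded child the header and payload writes could still interleave. It would be built with something like gcc -shared -fPIC -o libtag.so tag.c -ldl and run via LD_PRELOAD=./libtag.so ./child:
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdatomic.h>
#include <stdint.h>
#include <unistd.h>

static atomic_uint_fast64_t seq;             /* process-wide, atomically incremented counter */

ssize_t write(int fd, const void *buf, size_t count) {
    static ssize_t (*real_write)(int, const void *, size_t);
    if (!real_write)
        real_write = (ssize_t (*)(int, const void *, size_t))dlsym(RTLD_NEXT, "write");

    if (fd == 1 || fd == 2) {                /* only tag stdout/stderr */
        uint64_t hdr[2] = { atomic_fetch_add(&seq, 1), count };
        real_write(fd, hdr, sizeof hdr);     /* header first: sequence number + payload size */
    }
    return real_write(fd, buf, count);       /* then the payload itself */
}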
I think it's not possible to strictly do what you want to do.
If you think about how it's done when running a command in an interactive shell, what happens is that both stdout and stderr point to the same file descriptor (the TTY), so the total ordering is correct by means of synchronization against the same file.
To illustrate, imagine what happens if the child process has 2 completely independent threads, one only writing to stderr, and to other only writing to stdout. The total ordering would depend on however the scheduler decided to schedule these threads, and if you wanted to capture that, you'd need to synchronize those threads against something.
And of course, something can write thousands of lines to stdout before writing anything to stderr.
There are 2 ways to relax your requirements into something workable:
Have the user pass a flag waiving separate stdout and stderr streams in favor of a correct stdcombined, and then redirect both to a single file descriptor (see the sketch after this list). You might need to change the buffering settings (as stdbuf does) before you exec the process.
Assume that stdout and stderr are "reasonably interleaved", an assumption pointed out by @Nate Eldredge, in which case you can use @Unslander Monica's answer.
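A sketch of option 1 above: in the child, point stderr at the same pipe as stdout, so the kernel serializes the writes and one total order exists by construction; pipe_wr is an assumed name for the write end of the single capture pipe:
#include <unistd.h>

void child_setup(int pipe_wr) {
    dup2(pipe_wr, STDOUT_FILENO);   /* stdout -> the capture pipe */
    dup2(pipe_wr, STDERR_FILENO);   /* stderr -> the same pipe: one combined stream */
    close(pipe_wr);
    /* exec the target program here; something like stdbuf -oL -eL (or setvbuf
       via a preloaded library) may be needed so buffering doesn't reorder lines */
}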
I tried to see what happens if I read something from the keyboard while I have multiple processes created with fork() (in my case, two children and a parent), and I discovered the following problem: I need to tell the parent to wait for the children's processes, otherwise the program behaves strangely.
I did some research and found that the problem is with the parent: it needs to wait for the children's processes to end, because if the parent's process ends first it somehow closes STDIN. Am I right? But I also found that every process has its own copy of STDIN, so my question is:
Why does it work this way, and why does only the parent have this problem with STDIN and not the children? In other words, why does the child's process ending first not affect STDIN, while the parent's process ending first does?
Here are my tests:
I ran the program without wait() and after I typed a number the program stopped, but then I pressed enter two more times and the other two messages from printf() appeared.
When I ran the program with wait() everything worked fine, every process called scanf() separately and read a different number.
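The question doesn't include the code, but a hypothetical reconstruction of the described test looks like this: the parent forks two children and all three call scanf() on the shared terminal; the commented-out wait() calls give the "working" variant:
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    for (int i = 0; i < 2; i++) {
        if (fork() == 0) {                /* child i */
            int n;
            if (scanf("%d", &n) == 1)
                printf("child %d read: %d\n", i, n);
            return 0;
        }
    }
    int n;
    if (scanf("%d", &n) == 1)
        printf("parent read: %d\n", n);
    /* wait(NULL); wait(NULL);             uncomment so the parent exits last and
                                           the shell doesn't grab the terminal early */
    return 0;
}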
Well, a lot of stuff is going on here. I will try to explain it step by step.
When you start your terminal, the terminal creates a special file having path /dev/pts/<some number>. Then it starts your shell (which is bash in this case) and links the STDIN, STDOUT and STDERR of the bash process to this special file. This file is called a special file because it doesn't actually exist on your hard disk. Instead, whatever you write to this file, it goes directly to the terminal and the terminal renders it on the screen. (Similarly, whenever you try to read from this file, the read blocks until someone types something at the terminal).
Now when you launch your program by typing ./main, bash calls fork in order to create a new process. The child process execs your executable, while the parent (bash) waits for it to terminate. Your program then calls fork twice, and we have three processes trying to read their STDINs, i.e. the same file /dev/pts/something. (Remember that fork duplicates the file descriptors and exec preserves them.)
The three processes are in a race condition. When you enter something at the terminal, one of the three will receive it (99 times out of 100 it will be the parent, since the children have to do more work before reaching their scanf statement).
So the parent process prints the number and exits first. The bash process that was waiting for the parent to finish resumes, puts STDIN into so-called "non-canonical" mode, and calls read in order to read the next command. Now, again, three processes (Child1, Child2 and bash) are trying to read STDIN.
Since the children have been trying to read STDIN for longer, the next thing you type will be received by one of the children rather than bash. So you decide to type, say, 23. But oops! Just after you press the 2 key, you get Your number is: 2. You didn't even press the Enter key! That happens because of this "non-canonical" mode. I won't go into the what and why of that here. But for now, to make things easier, you can run your program under sh instead of bash, since sh doesn't put STDIN into non-canonical mode. That will make the picture clearer.
TL;DR
No, parent process closing its STDIN doesn't mean that its children or other process won't be able to use it.
The strange behavior you are seeing is because when the parent exits, bash puts the pty (pseudo terminal) into non-canonical mode. If you use sh instead, you won't see that behavior. Read up on pseudo terminals and line discipline if you want a clear understanding.
The shell process will resume as soon as the parent exits.
If you use wait to ensure that the parent exits last, you won't have any problem, since the shell won't be able to run alongside your program.
Normally, bash makes sure that no two foreground processes read from STDIN simultaneously, so you don't see this strange behavior. It does this by either piping STDOUT of one program to another, or by making one process a background process.
Trivia: When a background process tries to read from its STDIN, it is sent a signal SIGTTIN, which stops the process. Though, that's not really relevant to this scenario.
There are several issues that can happen when multiple processes try to do I/O to the same TTY. Without code, we can't tell which may be happening.
Trying to do I/O from a background process group may deliver a signal: SIGTTIN for input (usually enabled), or SIGTTOU for output (usually disabled)
Buffering: if you do any I/O before the fork, any data that has been buffered will be there for both processes. Under some conditions, using fflush may help, but it's better to avoid buffering entirely. Remember that, unlike output buffering, it is impossible to buffer input on a line-by-line basis (although you can only buffer what is available, so it may appear to be line-buffered at first).
Race conditions: if more than one process is trying to read the same pipe-like file, it is undefined which one will "win" and actually get the input each time it is available.
The program should fork, then the parent should read user input, send it to the child; the child should deal with it, then send a result to the parent, who prints it (it is required to work this way).
I've done a part of it, but the program locks up after reading from the FIFO the first time.
I suspect the problem is somewhere between lines 122–199. Making the pipe nonblocking makes the program skip the scanf at line 185 and loop indefinitely. Closing and reopening the pipe before writing and after reading leads to the same effect.
Here is the source: link.
Later edit (clarification):
The parent blocks before the printf at 184, when it reads the second command (the first time it seems to work just fine).
I haven't implemented the "child sends stuff back to the parent" part. At the moment I just want to make the child output the data it receives through the pipe from the parent, and then give control back to the parent so it can read another command.
The child lives in a paused state (pause()) while the parent reads input and sends it through the pipe, then it wakes up the child and goes in a paused state itself. The child reads data from the pipe and outputs it, then wakes up the parent and goes to sleep.
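The linked source isn't shown here, but a stripped-down sketch of that ping-pong might look like the following; the FIFO path and SIGUSR1 are assumptions, and sigsuspend() is used instead of pause() because a wake-up signal that arrives between kill() and pause() is otherwise lost, which is a classic cause of exactly this kind of hang:
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static volatile sig_atomic_t woken;
static void on_usr1(int sig) { (void)sig; woken = 1; }

static void wait_for_wakeup(void) {
    sigset_t empty;
    sigemptyset(&empty);
    while (!woken)
        sigsuspend(&empty);                 /* atomically unblock SIGUSR1 and sleep */
    woken = 0;
}

int main(void) {
    struct sigaction sa;
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);

    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, NULL);   /* keep SIGUSR1 pending until sigsuspend */

    mkfifo("/tmp/cmdfifo", 0600);           /* assumed FIFO path */
    pid_t child = fork();

    if (child == 0) {                       /* child: sleep, read, print, wake the parent */
        int fd = open("/tmp/cmdfifo", O_RDONLY);
        char buf[256];
        for (;;) {
            wait_for_wakeup();
            ssize_t n = read(fd, buf, sizeof buf - 1);
            if (n <= 0) break;              /* 0 = parent closed its end: we're done */
            buf[n] = '\0';
            printf("child got: %s", buf);
            fflush(stdout);
            kill(getppid(), SIGUSR1);
        }
        return 0;
    }

    int fd = open("/tmp/cmdfifo", O_WRONLY);
    char cmd[256];
    while (fgets(cmd, sizeof cmd, stdin)) { /* parent: read a command, send it, sleep */
        write(fd, cmd, strlen(cmd));
        kill(child, SIGUSR1);
        wait_for_wakeup();
    }
    close(fd);
    kill(child, SIGUSR1);                   /* let the child see EOF and exit */
    unlink("/tmp/cmdfifo");
    return 0;
}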
Did you use some multiplexing system call like select or poll, which can test whether some of a set of file descriptors are ready (for input or for output)?
Learn more about poll or about select and friends.
You should clarify in your question which part (parent or child) it is that locks up.
Make sure all output is line feed terminated. It seems (from a quick reading) that you use puts(), which should take care of that.
Try calling fflush() on the output after the child is done writing, to make sure the output gets written to the pipe, if the child lives on while the parent reads its output. I didn't read your code closely enough to track down the lifetime handling.
Is it possible for a caller program in C to know how many bytes it has printed to a file stream such as stdout without actually counting and adding up the return values of printf?
I am trying to implement control of the quantity of output of a C program which uses libraries to print, but the libraries don't report the amount of data they have printed out.
I am interested in either a general solution or a Unix-specific one.
POSIX-specific: redirect stdout to a file, flush after all writing is done, then stat the file and look at st_size (or use the ls command).
Update: You say you're trying to control the quantity of output of a program. The POSIX head command will do that. If that's not satisfactory, then state your requirements clearly.
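From inside the program, the same idea looks roughly like this, assuming stdout has been redirected to a regular file (e.g. ./prog > out.log); note that st_size is the size of the whole file, so it only equals the byte count if the file started out empty:
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

long long bytes_written_so_far(void) {
    struct stat st;
    fflush(stdout);                           /* push stdio's buffer out to the fd */
    if (fstat(STDOUT_FILENO, &st) == -1 || !S_ISREG(st.st_mode))
        return -1;                            /* only meaningful when stdout is a regular file */
    return (long long)st.st_size;
}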
It's rather a heavyweight solution, but the following will work:
Create a pipe by calling pipe()
Spawn a child process
In the parent: redirect stdout to the write-side of the pipe, and close the read side (and the old stdout)
In the child: keep reading from the read side of the pipe, and copying the data to the inherited stdout (which is the original stdout) - counting it as it goes past
In the parent, keep writing to stdout (which is now the pipe) as usual
Use some form of IPC to communicate the result of the count from the child to the parent.
Basically the idea is to spawn a child process, and pipe all output through it, and have the child process count all the data as it goes through.
The precise form of IPC to use may vary - for example, shared memory (with atomic reads/writes on each side) would work well for fast transfer of data, but other methods (such as sockets, more pipes etc) are possible, and offer better scope for synchronisation.
The trickiest part is the synchronisation, i.e. ensuring that, at the time the child tells the parent how much data has been written, it has already processed all the data that the parent said (and there is none left in the pipe, for example). How important this is will depend on exactly what your aim is - if an approximate indication is all that's required, then you may be able to get away with using shared memory for IPC and not performing any explicit synchronisation; if the total is only required at the end then you can close stdout from the parent, and have the child indicate in the shared memory when it has received the eof notification.
If you require more frequent readouts, which must be exact, then something more complex will be required, but this can be achieved by designing some sort of protocol using sockets, pipes, or even condvars/semaphores/etc in the shared memory.
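A minimal sketch of this scheme, using a second pipe as the IPC channel and reporting the total only at the end; error handling and the finer synchronisation discussed above are omitted, and the function names are made up:
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static int out_pipe[2], count_pipe[2];

void start_counter(void) {
    pipe(out_pipe);
    pipe(count_pipe);
    if (fork() == 0) {                        /* child: copy everything to the real stdout, counting it */
        close(out_pipe[1]);
        close(count_pipe[0]);
        long long total = 0;
        char buf[4096];
        ssize_t n;
        while ((n = read(out_pipe[0], buf, sizeof buf)) > 0) {
            write(STDOUT_FILENO, buf, n);     /* child's fd 1 is still the original stdout */
            total += n;
        }
        write(count_pipe[1], &total, sizeof total);
        _exit(0);
    }
    close(out_pipe[0]);
    close(count_pipe[1]);
    dup2(out_pipe[1], STDOUT_FILENO);         /* parent's stdout now feeds the counting child */
    close(out_pipe[1]);
}

long long finish_counter(void) {
    long long total = -1;
    fflush(stdout);
    close(STDOUT_FILENO);                     /* EOF for the child */
    read(count_pipe[0], &total, sizeof total);
    wait(NULL);                               /* reap the counter */
    return total;
}
Usage would be start_counter() early in main(), ordinary printf() calls in between, and long long n = finish_counter() once the program has finished writing.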
printf returns the number of bytes written.
Add them up.
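If all of the program's own printing can be funneled through one helper, the adding-up is painless; counted_printf is a made-up name, and it only counts output that actually goes through it (library code calling printf directly is not covered, which is the questioner's problem in the first place):
#include <stdarg.h>
#include <stdio.h>

static long long total_printed;

int counted_printf(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vprintf(fmt, ap);      /* same return value printf would give */
    va_end(ap);
    if (n > 0)
        total_printed += n;        /* accumulate the byte count */
    return n;
}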
No idea how reliable this is, but you can use ftell on stdout:
long int start = ftell(stdout);
printf("abcdef\n");
printf("%ld\n", ftell(stdout) - start); // >> 7
EDIT
Checked this on Ubuntu Precise: it does not work if the output goes to the console, but does work if it is redirected to a file.
$ ./a.out
abcdef
0
$ ./a.out >tt
$ cat tt
abcdef
7
$ echo `./a.out`
abcdef 0
$ echo `cat tt`
abcdef 7
I'm trying to find out if a child process is waiting for user input (without parsing its output). Is it possible, in C on Unix, to determine if a pipe's read end currently has a read() call blocking?
The thing is, I have no control over the programs exec'd in the child processes. They print all kinds of verbose garbage which I would usually want to redirect to /dev/null. Occasionally though one will prompt the user for something. (With the prompt having no reliable format.) So my idea was:
In a loop:
Drain child's stdout, append it to a temporary buffer.
Check (no idea how) if the child is asking for user input, in which case the buffer is printed to stdout.
When the child exits, throw away the buffer.
You have these options:
if you know that the child will need certain input (such as shell that will read a command), just write to a pipe
if you assume the child usually won't read anything, but may do so sometimes, you probably need something like job control in the shell (use a terminal for communication with the child, and use process groups and the TIOCSPGRP ioctl on the terminal to put the child in the background; the child will get SIGTTIN when it tries to read from the terminal, and you can wait() for that; a rough sketch follows this list). This is how bash handles things like "(sleep 10; read a;)&"
if you don't know what to write, or you have more possibilities, you will have to parse the output
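A rough sketch of the job-control option from the list above; ./child is a placeholder, error handling is omitted, and the question of when to hand the terminal back is glossed over:
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    signal(SIGTTOU, SIG_IGN);               /* lets us call tcsetpgrp after losing the foreground */
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);                      /* own process group: now a "background" job */
        execl("./child", "child", (char *)NULL);
        _exit(127);
    }
    setpgid(pid, pid);                      /* also done in the parent, to avoid a race */

    int status;
    for (;;) {
        if (waitpid(pid, &status, WUNTRACED) < 0) break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                          /* child is gone */
        if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTTIN) {
            fprintf(stderr, "child is waiting for terminal input\n");
            tcsetpgrp(STDIN_FILENO, pid);   /* hand it the terminal ... */
            kill(pid, SIGCONT);             /* ... and let the read proceed */
        }
    }
    tcsetpgrp(STDIN_FILENO, getpgrp());     /* take the terminal back when the child is done */
    return 0;
}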
That sounds as if you were trying to supervise dpkg where occasionally some post-inst script queries the admin whether it may override some config file.
Anyway, you may want to look at how strace works:
strace -f -etrace=read your.program
Of course you need to keep track of which fds are the pipes you write about, but you probably need only stdin, anyway.
I don't think that's true: for example, right before the reader side calls read(), the pipe has a reader that isn't actually reading yet.
You would typically just write to the pipe, or use select or poll. If you need a handshake mechanism you can do that out of band in various ways, or come up with an in-band protocol.
I don't know if there is a built-in way to know if a reader on the other end is blocking. Why do you need to know this?
If I recall correctly, you cannot have a pipe with no reader, which means that you have either a read(2) or a select(2) syscall pending at all times.