I need to send 3 char buffers to a child process and I want to treat them as 3 separate chunks of data. I thought of using the read() and write() system calls, but after reading the man pages I can't see a way to separate the data: if I understand correctly, if I write the 3 buffers one by one in the parent process, a single call to read() can read all the data at once. Of course I could put separators like '\0' in the input buffers and split the data in the child, but I'm looking for a more elegant way to do this. So, is there some kind of system call that lets me pass the data as separate chunks?
One possibility is to use what stdio.h already gives you: fdopen() the respective ends of the pipes and use fgets()/fputs() with the FILE pointers. This assumes your data doesn't contain newlines.
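For example, a minimal sketch of that approach (assuming a pipe created with pipe(fds), where fds[1] is the parent's write end and fds[0] the child's read end; the function names are just for the sketch):

    #include <stdio.h>
    #include <unistd.h>

    /* Parent side: wrap the write end in a FILE* and send one chunk per line. */
    void parent_send(int write_fd) {
        FILE *out = fdopen(write_fd, "w");
        fputs("first chunk\n", out);    /* '\n' marks the end of a chunk */
        fputs("second chunk\n", out);
        fputs("third chunk\n", out);
        fclose(out);                    /* flushes and closes the descriptor */
    }

    /* Child side: each fgets() returns exactly one '\n'-terminated chunk. */
    void child_receive(int read_fd) {
        FILE *in = fdopen(read_fd, "r");
        char line[256];
        while (fgets(line, sizeof line, in) != NULL) {
            /* process one chunk here */
        }
        fclose(in);
    }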
Some alternatives could be to use fixed sizes with read()/write() or to use some other delimiter and parse the received data with strtok(). You could also send the size first so the child knows how many bytes to read in the next read() call. There are really lots of options.
If your kernel supports it (on Linux it's available since 3.4), you can open the pipe with O_DIRECT to get a "packet-oriented" pipe, but there are limitations of course.
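A sketch of the packet-mode variant, assuming a Linux kernel new enough to support O_DIRECT pipes; each write() up to PIPE_BUF becomes its own packet and a read() returns at most one packet:

    #define _GNU_SOURCE            /* for pipe2() and O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (pipe2(fds, O_DIRECT) == -1) {   /* packet-oriented pipe */
            perror("pipe2");
            return 1;
        }
        write(fds[1], "first", 5);          /* one packet */
        write(fds[1], "second", 6);         /* another packet */

        char buf[64];
        ssize_t n = read(fds[0], buf, sizeof buf);
        printf("got %zd bytes\n", n);       /* prints 5, not 11 */
        return 0;
    }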
In general for a text-based streamed protocol having separators is cleaner in my opinion.
You have two choices:
Put delimiters in the data (as you mentioned in the question).
Provide feedback from the child. In other words, after writing a chunk of data to the pipe, the parent waits for a response from the child, e.g. on a second pipe, or using a semaphore.
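A sketch of the second option, assuming a second pipe running from child to parent for the acknowledgements (the fd names are just for this sketch):

    #include <unistd.h>

    /* Parent: write one chunk, then block until the child confirms it read it.
       Because the next chunk isn't written until the ack arrives, the child's
       read() can never see two chunks glued together. */
    void send_chunk(int data_fd, int ack_fd, const char *buf, size_t len) {
        char ack;
        write(data_fd, buf, len);
        read(ack_fd, &ack, 1);
    }

    /* Child: read a chunk, process it, then release the parent. */
    ssize_t recv_chunk(int data_fd, int ack_fd, char *buf, size_t cap) {
        ssize_t n = read(data_fd, buf, cap);
        if (n > 0)
            write(ack_fd, "A", 1);
        return n;
    }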
You could precede each chunk of data with a header, including a length field if chunks can be variable length. The reader can read the header and then the chunk contents.
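A sketch of that framing with a fixed 4-byte length prefix (assuming parent and child run on the same machine, so byte order is not a concern):

    #include <stdint.h>
    #include <unistd.h>

    /* Send one chunk: length header first, then the payload. */
    int write_chunk(int fd, const void *buf, uint32_t len) {
        if (write(fd, &len, sizeof len) != (ssize_t)sizeof len) return -1;
        if (write(fd, buf, len) != (ssize_t)len) return -1;
        return 0;
    }

    /* Read exactly len bytes, looping over short reads. */
    static int read_full(int fd, void *buf, size_t len) {
        char *p = buf;
        while (len > 0) {
            ssize_t n = read(fd, p, len);
            if (n <= 0) return -1;          /* EOF or error */
            p += n;
            len -= (size_t)n;
        }
        return 0;
    }

    /* Receive one chunk: header first, then exactly that many payload bytes. */
    int read_chunk(int fd, void *buf, uint32_t cap, uint32_t *out_len) {
        uint32_t len;
        if (read_full(fd, &len, sizeof len) == -1) return -1;
        if (len > cap) return -1;           /* receiver's buffer too small */
        if (read_full(fd, buf, len) == -1) return -1;
        *out_len = len;
        return 0;
    }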
Related
I'm writing a library that should execute a program in a child process, capture the output, and make the output available line by line (as string vectors). There is one vector for STDOUT, one for STDERR, and one for "STDCOMBINED", i.e. all output in the order it was printed by the program. The child process is connected to the parent process via two pipes, one for STDOUT and one for STDERR. In the parent process I read from the read ends of the pipes; in the child process I dup2()'ed STDOUT/STDERR to the write ends of the pipes.
My problem:
I'd like to capture STDOUT, STDERR, and "STDCOMBINED" (= both in the order they appeared). But the order in the combined vector is different from the original order.
My approach:
I iterate until both pipes report EOF and the child process has exited. At each iteration I read exactly one line (or EOF) from STDOUT and exactly one line (or EOF) from STDERR. This works so far. But when I capture the lines as they arrive in the parent process, the order of STDOUT and STDERR is not the same as when I execute the program in a shell and look at the output.
Why is this so and how can I fix this? Is this possible at all? I know in the child process I could redirect STDOUT and STDERR both to a single pipe but I need STDOUT and STDERR separately, and "STDCOMBINED".
PS: I'm familiar with libc/unix system calls, like dup2(), pipe(), etc. Therefore I didn't post code. My question is about the general approach and not a coding problem in a specific language. I'm doing it in Rust against the raw libc bindings.
PPS: I made a simple test program that interleaves 5 stdout and 5 stderr messages. That's enough to reproduce the problem.
At each iteration I read exactly one line (or EOF) from STDOUT and exactly one line (or EOF) from STDERR.
This is the problem. This will only capture the correct order if that was exactly the order of output in the child process.
You need to capture the asynchronous nature of the beast: make your pipe endpoints nonblocking, select* on the pipes, and read whatever data is present as soon as select returns. Then you'll capture the correct order of the output. Of course now you can't be reading "exactly one line": you'll have to read whatever data is available and no more, so that you won't block, and maintain a per-pipe buffer where you append new data, extract any complete lines, move the unprocessed remainder back to the beginning, and repeat. You could also use a circular buffer to save a bit of memcpy-ing, but that's probably not very important.
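A rough sketch of that loop in C (the structure carries over to Rust with poll/mio/tokio); out_fd and err_fd are assumed to be the read ends of the two pipes, and the line-splitting into the per-pipe buffers is left out to keep it short:

    #include <errno.h>
    #include <fcntl.h>
    #include <sys/select.h>
    #include <unistd.h>

    void capture(int out_fd, int err_fd) {
        fcntl(out_fd, F_SETFL, O_NONBLOCK);
        fcntl(err_fd, F_SETFL, O_NONBLOCK);

        while (out_fd >= 0 || err_fd >= 0) {
            fd_set rfds;
            FD_ZERO(&rfds);
            if (out_fd >= 0) FD_SET(out_fd, &rfds);
            if (err_fd >= 0) FD_SET(err_fd, &rfds);
            int maxfd = out_fd > err_fd ? out_fd : err_fd;

            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) == -1)
                break;

            char buf[4096];
            if (out_fd >= 0 && FD_ISSET(out_fd, &rfds)) {
                ssize_t n = read(out_fd, buf, sizeof buf);
                if (n > 0) {
                    /* append buf[0..n) to the stdout buffer and to "combined" */
                } else if (n == 0 || errno != EAGAIN) {
                    close(out_fd); out_fd = -1;   /* EOF or hard error */
                }
            }
            if (err_fd >= 0 && FD_ISSET(err_fd, &rfds)) {
                ssize_t n = read(err_fd, buf, sizeof buf);
                if (n > 0) {
                    /* append buf[0..n) to the stderr buffer and to "combined" */
                } else if (n == 0 || errno != EAGAIN) {
                    close(err_fd); err_fd = -1;
                }
            }
        }
    }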
Since you're doing this in Rust, I presume there's already a good asynchronous reactor pattern you could leverage (I'm spoiled by Go, I guess, and project my hopes onto the unsuspecting).
*Always prefer platform-specific higher-performance primitives like epoll on Linux, /dev/poll on Solaris, pollset &c. on AIX
Another possibility is to launch the target process with LD_PRELOAD pointing at a dedicated library that takes over glibc's POSIX write(), detects writes to the pipes, and encapsulates such writes (and only those) in a packet by prepending a header containing an atomically updated, process-wide incrementing counter as well as the size of the write. Such headers can easily be decoded on the other end of the pipe to reorder the writes with a higher chance of success.
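A very rough sketch of such an interposer; the header layout, the choice to wrap only fds 1 and 2, and whether the target's libc actually routes its output through the exported write symbol are all assumptions made for illustration:

    /* shim.c -- build: gcc -shared -fPIC shim.c -o shim.so -ldl
       run the target with: LD_PRELOAD=./shim.so ./target            */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <unistd.h>

    static atomic_uint_fast64_t seq;        /* process-wide write counter */

    struct frame_hdr {                      /* made-up framing header */
        uint64_t seq;
        uint64_t len;
    };

    ssize_t write(int fd, const void *buf, size_t count) {
        static ssize_t (*real_write)(int, const void *, size_t);
        if (!real_write)
            real_write = dlsym(RTLD_NEXT, "write");

        if (fd == 1 || fd == 2) {           /* only frame stdout/stderr */
            struct frame_hdr h = {
                .seq = atomic_fetch_add(&seq, 1),
                .len = count,
            };
            /* note: header and payload are two separate writes, so they are
               not atomic with respect to each other; good enough for a sketch */
            real_write(fd, &h, sizeof h);
        }
        return real_write(fd, buf, count);
    }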
I think it's not possible to do exactly what you want.
If you think about how it's done when running a command in an interactive shell, what happens is that both stdout and stderr point to the same file descriptor (the TTY), so the total ordering is correct by means of synchronization against the same file.
To illustrate, imagine what happens if the child process has 2 completely independent threads, one writing only to stderr and the other writing only to stdout. The total ordering would depend on however the scheduler decided to schedule these threads, and if you wanted to capture that, you'd need to synchronize those threads against something.
And of course, something can write thousands of lines to stdout before writing anything to stderr.
There are 2 ways to relax your requirements into something workable:
Have the user pass a flag waiving separate stdout and stderr streams in favor of a correct stdcombined, and then redirect both to a single file descriptor (see the sketch after this list). You might need to change the buffering settings (like stdbuf does) before you execute the process.
Assume that stdout and stderr are "reasonably interleaved", an assumption pointed out by @Nate Eldredge, in which case you can use @Unslander Monica's answer.
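A sketch of the first option: in the child, between fork() and exec, point both standard streams at the write end of a single pipe (combined_fd is just this sketch's name for it):

    #include <unistd.h>

    /* Child side, after fork(): route stdout and stderr into one pipe. */
    void setup_combined(int combined_fd) {
        dup2(combined_fd, STDOUT_FILENO);
        dup2(combined_fd, STDERR_FILENO);
        close(combined_fd);         /* the duplicates keep the pipe open */
        /* exec the target program here; interleaving is now decided by the
           child's own writes -- modulo stdio buffering, which is why the
           stdbuf trick mentioned above may still be needed */
    }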
Let's say there is an existing program that listens on stdin for its input. I want to create a pthread within the same program that is now the one to listen to stdin and, depending on what comes through, lets it go through to the original program.
For this, I would create a pipe(), have the pthread write to the pipe's write end, and have the original program read from its read end. Is this a correct way to do this? I understand piping between processes, but is it possible to pipe like this within a single process?
Sure, you can use pipe(), but the data has to pass through the kernel even though both endpoints are within the same process.
If you have the source code for this (which I assume you do), you don't mind making non-trivial changes, and performance is a priority, I would suggest using shared memory to send the data to the original program. It will be much faster than using pipe().
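For the plain pipe() variant, here is a minimal single-process sketch under the assumptions in the question: the new thread owns the real stdin and forwards whatever it lets through, while the original code keeps reading "stdin" unchanged because the pipe's read end has been dup2()'ed onto it. All names are illustrative.

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int forward_fd;                    /* write end of the internal pipe */

    /* New thread: read the real stdin, decide what to pass along. */
    static void *stdin_filter(void *arg) {
        FILE *in = fdopen(*(int *)arg, "r");
        char line[512];
        while (fgets(line, sizeof line, in)) {
            /* filtering logic goes here */
            write(forward_fd, line, strlen(line));
        }
        close(forward_fd);                    /* original code sees EOF */
        return NULL;
    }

    int main(void) {
        int fds[2];
        pipe(fds);
        forward_fd = fds[1];

        int real_stdin = dup(STDIN_FILENO);   /* keep the real stdin for the thread */
        dup2(fds[0], STDIN_FILENO);           /* original code now reads the pipe */
        close(fds[0]);

        pthread_t tid;
        pthread_create(&tid, NULL, stdin_filter, &real_stdin);

        /* ... the original program's stdin-reading loop, unchanged ... */
        char buf[512];
        while (fgets(buf, sizeof buf, stdin))
            fputs(buf, stdout);

        pthread_join(tid, NULL);
        return 0;
    }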
I have code that creates two pipes for writing and reading the data of 2 (to 4) child processes that call another program. That program simply does two printfs, one printing "-" and another printing "Done", both to stdout, which is connected to the read end of a pipe in the parent process.
In the parent process I read the data using
read(pipes[i][1][0], buffer, sizeof(buffer)-1);
The problem is that if I set the size of the buffer to 4 (for example), the read() call reads "-Do", which is not what I want, because I will call read() again afterwards.
If the size is 2 everything works fine because I know the size of what I'm going to read, but in the rest of the code I don't have that information.
I tried fflush(stdout) after each printf() in the child process but it doesn't work. I think this should be easy to solve but I cannot figure it out; is there a way to read the prints made by the child process one by one?
A sane way might be to use newline '\n' characters as separators.
Setting the buffer size to your exact message size is a brittle hack, in that it will break as soon as you add a new message with a different length.
In general anything you expect to send over a stream-oriented connection (pipes, streams or TCP sockets) needs either a message header with length, or a delimiter, to be reasonably easy to parse.
If you desperately want to treat each write as a discrete message, you could alternatively use a datagram socket, which actually behaves like this. You'd be looking for an AF_UNIX/SOCK_DGRAM socketpair in that case.
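A sketch of the datagram variant; with SOCK_DGRAM the kernel preserves the boundary of every write, so each read returns exactly one message:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1) {
            perror("socketpair");
            return 1;
        }

        write(sv[0], "-", 1);       /* one message */
        write(sv[0], "Done", 4);    /* another message */

        char buf[16];
        ssize_t n = read(sv[1], buf, sizeof buf);
        printf("first read:  %zd bytes\n", n);    /* 1 ("-")    */
        n = read(sv[1], buf, sizeof buf);
        printf("second read: %zd bytes\n", n);    /* 4 ("Done") */
        return 0;
    }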
You can set the size of the buffer to 5 and, in the first read(), read only the "-" by limiting the count to 1:
read(pipes[i][1][0], buffer, 1);
Then, in the second read(), read the 4 characters of "Done":
read(pipes[i][1][0], buffer, 4);
The size of the buffer is set to 5 so there is room for a terminating '\0' at the end, in case you want to do string operations on what you read.
I think maybe this is an obvious question, but I just want to be sure by asking you guys.
I'm working with parent-child process communication, using the pipe system call to create an unnamed pipe.
My child process needs to gather some information and send it to its parent. My questions are:
I can only send and receive strings using the write and read functions, right? Do I have to forget about sending structures?
If the answer to my previous question is "yes", is the right way to transfer all the information to the parent process to call write and read several times?
Thank you very much!
You can write() and read() structs just fine; use a pointer to the struct as the buf parameter. It's when you want to do this between processes not running on the same machine that you run into problems and need to do marshaling/unmarshaling to portable representations to ensure the values are understood the same way everywhere. This includes recognizing the start and end of data "packets", since a pipe doesn't really have the concept of packets: if all you're doing is writing a series of identical structs, then you can just write() them and the reader can rely on read() returning 0 to indicate the end of the series; but if you need to send other information as well, you'll need a framing protocol to say "what follows is such-and-such struct", "what follows is a string", etc.
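For instance, a sketch of pushing a struct through a pipe between fork()ed processes on the same machine (the struct's fields are made up for the example):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    struct report {            /* illustrative struct */
        int  id;
        long value;
        char name[32];
    };

    int main(void) {
        int fds[2];
        pipe(fds);

        if (fork() == 0) {                      /* child: gather and send */
            struct report r = { .id = 1, .value = 42 };
            strcpy(r.name, "child");
            write(fds[1], &r, sizeof r);        /* the struct is just bytes */
            _exit(0);
        }

        struct report r;                        /* parent: receive */
        if (read(fds[0], &r, sizeof r) == (ssize_t)sizeof r)
            printf("id=%d value=%ld name=%s\n", r.id, r.value, r.name);
        return 0;
    }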
I have a new file opened for read/write. One thread receives data from the network and appends binary data to that file; the other thread reads from the same file to process the binary data. But read() always returns 0, so I can't read the data. However, if I use cat on the command line to append data, then the program can read and process it. I don't know why it doesn't notice the new data coming from the network. I'm using open(), read(), and write() in this program.
Use a pipe instead of a file on disk. Depending on your system (which you didn't tell us), there are only minor modifications to your code (which you didn't give us) needed to do that.
File operations are buffered. Try flushing the stream?
Assuming that your read() and write() functions are the POSIX ones, they share the file position, even if they are used in different threads. So your read-after-write was trying to read past the position at which write had just written. Don't use file I/O to communicate between threads. In most contexts I wouldn't even use pipes or sockets for that (one context where I would is when the reading thread is already using poll/select with other file descriptors), but simple shared memory and a mutex.
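A minimal sketch of that in-process alternative: a one-slot buffer guarded by a mutex and condition variable instead of a file (all names are illustrative):

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
    static char   slot[256];
    static size_t slot_len;             /* 0 means "empty" */

    /* Stand-in for the network thread: fill the slot and wake the consumer. */
    static void *producer(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock);
        slot_len = strlen(strcpy(slot, "data from the network thread"));
        pthread_cond_signal(&ready);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t tid;
        pthread_create(&tid, NULL, producer, NULL);

        pthread_mutex_lock(&lock);
        while (slot_len == 0)           /* wait until the slot is filled */
            pthread_cond_wait(&ready, &lock);
        printf("consumer got: %s\n", slot);
        pthread_mutex_unlock(&lock);

        pthread_join(tid, NULL);
        return 0;
    }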