What is the difference between stream and pipe in C - c

In inter-process communication (IPC), processes need a pipe provided by the OS in order to communicate with each other. And to transfer data from an input device to a program, or from a program to an output device, a stream provided by the OS is needed.
Here are my questions.
Is there a difference between a pipe and a stream?
If they are different, since their functions are so similar, wouldn't it be simpler to use only pipes or only streams to transfer data?

A pipe is a communication channel between two processes. It has a writing end and a reading end. When you open one of these two ends, you get a (writing or reading) stream. So, to a first approximation, there is a stream at each end of a pipe.
So to set up an IPC, you should
create a pipe using the function pipe. This returns two ints (file descriptors) identifying the two ends of the pipe;
usually fork to get two processes;
open each end of the pipe (usually in a different process after forking), for example with fdopen, and get two corresponding streams.
See http://www.gnu.org/software/libc/manual/html_node/Creating-a-Pipe.html

Related

Can single pipe be connected and read by multiple processes

From my understanding, C pipes are like a special kind of file, where internally the kernel keeps track of the opens and closes from each process in a table (see the post here).
So in that sense:
Is it possible for 1 single pipe to be connected by multiple processes?
If it is possible, can multiple processes read the same data?
If 2 is possible, will they be reading the same data, or does reading the data "empty" the data?
For example: process 1 writes into pipe, can process 2,3,4 read the data that process 1 wrote?
Yes, multiple processes can read from (or write to) a pipe.
But data isn't duplicated for the processes. Once data has been read from the pipe by one process, it's lost and available only to the process that actually read it.
Conversely, if you have multiple processes writing to a single pipe, the reader has no way to tell which process a given piece of data came from.
1. Is it possible for 1 single pipe to be connected by multiple processes?
Yes.
2. If it is possible, can multiple processes read the same data?
No!
Unix fifos (pipes) can not be used in "single producer, multiple consumer" (spmc) manner; this also holds for Unix Domain Sockets (for most implementations UDS and fifos are implemented by the very same code, with just a few configuration bits differing on creation). Each byte written into a pipe / SOCK_STREAM UDS (or datagram written into a SOCK_DGRAM unix domain socket) can be read from only one single reading end.
However, what's perfectly possible is having a "multiple producer, single consumer" fifo or UDS: the consumer has one reading end open (and also keeps the writing end open, but does not use it¹), and multiple producers send data to that single consumer. For stream-oriented pipes there are no message boundaries, so bytes from different producers can end up interleaved; only writes of at most PIPE_BUF bytes are guaranteed not to be split. For SOCK_DGRAM UDS socketpairs, message boundaries are preserved.
¹: There's a particular pitfall: if the consumer does not keep its own instance of the writing end open, it will read EOF as soon as the last producer closes its writing end, even if other producers are about to connect.

Running multiple forked processes and constantly reading their standard out, while printing to their standard in

I'm looking to run X number of processes that I can iterate through, in order to run programs where there's a master and 'slaves' that take the master's orders and return a string.
I'm writing in C. I'm wondering how I'd be able to set up pipes and forking between these processes to read from standard in and out. I'm currently able to have them work one at a time until they are killed, but I would like to simply read one line and then move on to the next process. Any help?
Generally, the common strategy for this sort of programming is to set up an event loop.
You would set up pipes and connect them to stdin and stdout of your program.
You don't specify what language you're using.
In C, you would create two pipes, one for reading, and one for writing.
Then you would fork. After the fork, in the child, you use the dup2 system call to copy the appropriate pipe ends onto stdin and stdout (dup2 closes the old stdin and stdout for you), and then you close the original pipe file descriptors.
In the parent, you connect each process to an event loop, which lets you know when one of your FDs is ready for reading or writing.
Take a look at these class notes for discussion of using pipes and dup2.
Here's an introduction to libevent, one of the common event loops for C.
For other languages you'll do something similar. For example for Python, take a look at the asyncio support for subprocesses.

Breaking down shell scripts; What happens under the hood?

So, I was given this one line script:
echo test | cat | grep test
Could you please explain to me how exactly that would work given the following system calls: pipe(), fork(), exec() and dup2()?
I am looking for an general overview here and mainly the sequence of operations.
What I know so far is that the shell will fork using fork() and the script's code will replace the shell's one by using the exec(). But what about pipe and dup2? How do they fall in place?
Thanks in advance.
First consider a simpler example, such as:
echo test | cat
What we want is to execute echo in a separate process, arranging for its standard output to be diverted into the standard input of the process executing cat. Ideally this diversion, once set up, would require no further intervention by the shell: the shell would just calmly wait for both processes to exit.
The mechanism to achieve that is called the "pipe". It is an interprocess communication device implemented in the kernel and exported to user space. Once created by a Unix program, a pipe has the appearance of a pair of file descriptors with the peculiar property that, if you write into one of them, you can read the same data from the other. This is not very useful within the same process, but keep in mind that file descriptors, including but not limited to pipes, are inherited across fork() and even across exec(). This makes a pipe an easy to set up and reasonably efficient IPC mechanism.
The shell creates the pipe, and now owns a pair of file descriptors belonging to the pipe, one for reading and one for writing. These file descriptors are inherited by both forked subprocesses. If only echo were writing to the pipe's write-end descriptor instead of to its actual standard output, and if cat were reading from the pipe's read-end descriptor instead of from its standard input, everything would work. But they don't, and this is where dup2 comes into play.
dup2(oldfd, newfd) duplicates a file descriptor as another file descriptor, automatically closing the new descriptor beforehand if it is open. For example, dup2(15, 1) will close file descriptor 1 (by convention used for the standard output) and reopen it as a copy of file descriptor 15, meaning that writing to the standard output will in fact be equivalent to writing to file descriptor 15. The same applies to reading: dup2(8, 0) will make reading from file descriptor 0 (the standard input) equivalent to reading from file descriptor 8. If we proceed to close the original file descriptor, the open file (or a pipe) will have been effectively moved from the original descriptor to the new one, much like sci-fi teleports that work by first duplicating a piece of matter at a remote location and then disintegrating the original.
If you're still following the theory, the order of operations performed by the shell should now be clear:
The shell creates a pipe and then forks two processes, both of which will inherit the pipe file descriptors, r and w.
In the subprocess about to execute echo, the shell calls dup2(w, 1); close(w) before exec in order to redirect the standard output to the write end of the pipe. It also closes r, which echo does not need.
In the subprocess about to execute cat, the shell calls dup2(r, 0); close(r) in order to redirect the standard input to the read end of the pipe. It also closes its inherited copy of w, since otherwise cat itself would hold the write end open and never see EOF.
After forking, the main shell process must itself close both ends of the pipe. One reason is to free up resources associated with the pipe once subprocesses exit. The other is to allow cat to actually terminate: a pipe's reader will receive EOF only after all copies of the write end of the pipe are closed. In the steps above, we did close the child's redundant copy of the write end, the file descriptor 15, right after its duplication to 1. But the file descriptor 15 must also exist in the parent, because it was inherited under that number, and can only be closed by the parent. Failing to do that leaves cat's standard input never reporting EOF, and the cat process hanging as a consequence.
This mechanism is easily generalized to three or more processes connected by pipes. In the case of three processes, the pipes must be arranged so that echo's output goes to cat's input, and cat's output goes to grep's input. This requires two calls to pipe(), three calls to fork(), four calls to dup2() and close() (one for echo and grep and two for cat), three calls to exec(), and four additional calls to close() in the shell (two for each pipe). Each child should also close the pipe ends it does not use.

Should data be written to the pipe all at once?

I have a scenario where two pipes are used for IPC between a child and a parent. The child process uses execvp to execute a remote program. The parent process takes care of writing data to the pipe. The remote program's stdin is duplicated from the read end of one pipe, and the parent writes data to the write end of that same pipe. The remote program has a simple getchar() in one of its functions, which is called twice from the remote program's main function.
The parent writes data in the following sequence.
It writes data to the pipe, then closes all the required handles (say it wrote 1).
After some time it writes data to the pipe again, then closes the handle (say it wrote 2).
The getchar in the remote program reads "1" properly. But the problem comes while reading "2": getchar returns garbage values.
I have debugged using GDB and the program exits normally. No "signals" are raised while debugging.
I have used the fork(), dup2() and pipe() functions and need to stick to it.

How can a child process return two values to the parent when using pipe()?

I have my child process counting the frequency of words from a text file. I am using pipe() for IPC. How can the child process return both the word name and the word frequency to the parent process? My source code is in C and I am executing it in a UNIX environment.
Write the two values to one end of the pipe in the child, separated by some delimiter. In the parent, read from the other end of the pipe, and separate the content using the delimiter.
Writes to a pipe of up to PIPE_BUF bytes (PIPE_BUF is defined in limits.h) are atomic, so you can pack your information into a struct and write that to the pipe in your child process for the parent process to read. For instance, you could set up your struct to look like:
struct message
{
    int word_freq;
    char word[256];
};
Then simply do a read from your pipe with a buffer whose size is sizeof(struct message). That being said, keep in mind that it is best to have either a single writer and a single reader, or multiple writers (because writes of up to PIPE_BUF bytes are atomic) and still only a single reader. While multiple readers can be managed with pipes, the fact that reads are not atomic means that you could end up with scenarios where messages either get missed due to the non-deterministic nature of process scheduling, or you get garbled messages because a process doesn't complete a read and leaves part of a message in the pipe.
