Unix named pipe, multiple writers or multiple pipes - c

I am currently measuring the performance of named pipes to compare with another library.
I need to simulate a clients (n) / server (1) situation where the server reads messages and performs a simple action for every message written. So the clients are the writers.
My code works now, but if I add a 2nd writer, the reader (server) never sees the data and never receives anything. At the end the pipe is still filled with the unread data, and the read method returns 0.
Is it OK for a single named pipe to be written to by multiple processes? Do I need to initialize it with a special flag for multiple processes?
I am not sure I can/should use multiple writers on a single pipe. But I am also not sure it would be a good design to create 1 pipe for each client.
Would it be a more standard design to use 1 named pipe per client connection?
I know about Unix domain sockets and they will be used later. For now I need to make the named pipes work.

Related

Can single pipe be connected and read by multiple processes

From my understanding, C pipes are like a special kind of file, where internally the kernel keeps track of the opens and closes from each process in a table. See the post here
So in that sense:
1. Is it possible for 1 single pipe to be connected by multiple processes?
2. If it is possible, can multiple processes read the same data?
3. If 2 is possible, will they be reading the same data, or does reading the data "empty" the data?
For example: process 1 writes into pipe, can process 2,3,4 read the data that process 1 wrote?
Yes, multiple processes can read from (or write to) a pipe.
But data isn't duplicated for the processes. Once data has been read from the pipe by one process, it's lost and available only to the process that actually read it.
Conversely, if you have multiple processes writing to a single pipe, the reader has no way to tell which process a given piece of data came from.
1. Is it possible for 1 single pipe to be connected by multiple processes?
Yes.
2. If it is possible, can multiple processes read the same data?
No!
Unix fifos (pipes) can not be used in "single producer, multiple consumer" (spmc) manner; this also holds for Unix Domain Sockets (for most implementations UDS and fifos are implemented by the very same code, with just a few configuration bits differing on creation). Each byte written into a pipe / SOCK_STREAM UDS (or datagram written into a SOCK_DGRAM unix domain socket) can be read from only one single reading end.
However, what's perfectly possible is a "multiple producer, single consumer" fifo or UDS: the consumer holds one reading end open (and also keeps the writing end open, without using it¹), and multiple producers send data to that single consumer. For stream-oriented pipes there is no ordering between producers, so bytes from different writers can interleave; POSIX only guarantees that a single write() of at most PIPE_BUF bytes to a pipe or FIFO is not interleaved with other writes. SOCK_DGRAM Unix domain sockets, by contrast, preserve message boundaries.
¹: There's a particular pitfall: if the consumer does not keep its own instance of the writing end open, then as soon as the last remaining producer closes its writing end, the reader sees end-of-file (read() returns 0), even if other producers were going to open the pipe later.
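A minimal sketch of the "multiple producer, single consumer" setup described above, assuming fixed-size records that are much smaller than PIPE_BUF so each write() stays contiguous. The names run_fifo_demo, run_writer and MSG_SIZE, and whatever FIFO path the caller passes in, are made up for illustration and do not come from the question's code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_SIZE 32 /* fixed-size record, well below PIPE_BUF */

/* Child body: open the FIFO for writing and send `count` records. */
static void run_writer(const char *path, int id, int count)
{
    int wfd = open(path, O_WRONLY);
    if (wfd == -1)
        _exit(1);
    for (int i = 0; i < count; i++) {
        char msg[MSG_SIZE] = {0};
        snprintf(msg, sizeof msg, "writer %d msg %d", id, i);
        /* One write() per record keeps each record contiguous. */
        if (write(wfd, msg, sizeof msg) != (ssize_t)sizeof msg)
            _exit(1);
    }
    close(wfd);
    _exit(0);
}

/* Returns the number of records the reader received, or -1 on setup error. */
int run_fifo_demo(const char *path, int writers, int per_writer)
{
    unlink(path);
    if (mkfifo(path, 0600) == -1)
        return -1;

    /* O_NONBLOCK lets the read-side open() return before any writer exists. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd == -1) {
        unlink(path);
        return -1;
    }
    /* Our own, unused write end: the reader never sees EOF while it is open. */
    int keep = open(path, O_WRONLY);
    if (keep == -1) {
        close(rfd);
        unlink(path);
        return -1;
    }
    /* Back to blocking reads for the receive loop. */
    fcntl(rfd, F_SETFL, fcntl(rfd, F_GETFL) & ~O_NONBLOCK);

    int started = 0;
    for (int w = 0; w < writers; w++) {
        pid_t pid = fork();
        if (pid == 0)
            run_writer(path, w, per_writer); /* never returns */
        if (pid > 0)
            started++;
    }

    int received = 0;
    char msg[MSG_SIZE];
    while (received < started * per_writer) {
        size_t got = 0;
        while (got < sizeof msg) { /* a read() may return a partial record */
            ssize_t n = read(rfd, msg + got, sizeof msg - got);
            if (n <= 0)
                goto done;
            got += (size_t)n;
        }
        received++;
    }
done:
    for (int w = 0; w < started; w++)
        wait(NULL);
    close(keep);
    close(rfd);
    unlink(path);
    return received;
}
```

Calling run_fifo_demo with 3 writers and 50 records each should count 150 records on the reader side. If the reader needs to know which client sent a record, the record itself has to carry the sender's id, as the text in each message does here.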

Can Select() be used to detect and identify multiple stream input

I have just come across the select() function for Linux (or is it Unix?) OSes, and it looks like it can achieve what I need to do.
I have a Linux process (on Debian) that has IPC (Inter-Process Communication) with 3 other processes. 2 of them are serial port streams and the other is a named pipe.
My process needs to read data from each of these streams and react accordingly (it's a proxy between these 3 processes). There's no order to the data coming in from each process (one may talk, then another may stay silent for a while).
So I am thinking of having a main loop that simply uses select() to listen on all streams (with a timeout of never). That way select can notify me when/if a stream writes to my process and which stream is talking, and then I can react accordingly.
Is this how select works? Is this design OK, and is it how you would handle 3 streams whose behaviour is dynamic and not predictable (in terms of when they will write data to a stream)?
Yes, that's exactly what select is designed to do: multiplex multiple input streams and detect which have data ready to be read from.
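A rough sketch of such a main loop, assuming the streams are already open as file descriptors below FD_SETSIZE. The names proxy_loop and MAX_STREAMS are hypothetical, and counting bytes stands in for whatever the real proxy would do with the data.

```c
#include <sys/select.h>
#include <unistd.h>

#define MAX_STREAMS 8

/* Reads from every descriptor in fds[] until each one reaches end-of-file.
 * bytes_seen[i] receives the number of bytes read from fds[i].
 * Returns 0 on success, -1 on error. */
int proxy_loop(const int *fds, int count, long *bytes_seen)
{
    int active[MAX_STREAMS];
    int remaining = count;

    if (count < 1 || count > MAX_STREAMS)
        return -1;
    for (int i = 0; i < count; i++) {
        active[i] = 1;
        bytes_seen[i] = 0;
    }

    while (remaining > 0) {
        /* select() overwrites the set, so rebuild it on every pass. */
        fd_set readset;
        int maxfd = -1;
        FD_ZERO(&readset);
        for (int i = 0; i < count; i++) {
            if (!active[i])
                continue;
            FD_SET(fds[i], &readset);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }

        /* NULL timeout: block until at least one stream is readable. */
        if (select(maxfd + 1, &readset, NULL, NULL, NULL) == -1)
            return -1;

        for (int i = 0; i < count; i++) {
            char buf[256];
            if (!active[i] || !FD_ISSET(fds[i], &readset))
                continue;
            ssize_t n = read(fds[i], buf, sizeof buf);
            if (n < 0)
                return -1;
            if (n == 0) {          /* end-of-file: stop watching this one */
                active[i] = 0;
                remaining--;
            } else {
                bytes_seen[i] += n; /* a real proxy would act on buf here */
            }
        }
    }
    return 0;
}
```

The index i tells you which stream is talking, so the reaction can be chosen per stream inside the inner loop.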

Select function in socket programming

Can anyone tell me the use and application of the select function in socket programming in C?
The select() function allows you to implement an event driven design pattern, when you have to deal with multiple event sources.
Let's say you want to write a program that responds to events coming from several event sources e.g. network (via sockets), user input (via stdin), other programs (via pipes), or any other event source that can be represented by an fd. You could start separate threads to handle each event source, but you would have to manage the threads and deal with concurrency issues. The other option would be to use a mechanism where you can aggregate all the fd into a single entity fdset, and then just call a function to wait on the fdset. This function would return whenever an event occurs on any of the fd. You could check which fd the event occurred on, read that fd, process the event, and respond to it. After you have done that, you would go back and sit in that wait function - till another event on some fd arrives.
The select facility is such a mechanism, and the select() function is the wait function. You can find the details on how to use it in any number of books and online resources.
The select function allows you to check on several different sockets or pipes (or any file descriptors at all if you are not on Windows), and do something based on whichever one is ready first. More specifically, the arguments for the select function are split up into three groups:
Reading: When any of the file descriptors in this category are ready for reading, select will return them to you.
Writing: When any of the file descriptors in this category are ready for writing, select will return them to you.
Exceptional: When any of the file descriptors in this category have an exceptional condition pending, select will return them to you. On POSIX systems this mostly means out-of-band data on a socket; a closed or broken connection is normally reported through the reading set instead (the next read returns 0 or an error).
The power of select is that individual file/socket/pipe functions are often blocking. Select allows you to monitor the activity of several different file descriptors without having to dedicate a thread of your program to each blocking call.
In order for you to get a more specific answer, you will probably have to mention what language you are programming in. I have tried to give as general an answer as possible on the conceptual level.
select() is the low-tech way of polling sockets for new data to read or for an open TCP window to write. Unless there's some compelling reason not to, you're probably better off using poll(), or epoll_wait() if your platform has it, for better performance.
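For comparison, a small sketch of the same idea with poll(), assuming only readability matters. The names read_any and MAX_FDS are invented for this example.

```c
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 16

/* Waits up to timeout_ms for any descriptor in fds[] to become readable,
 * then reads from the first ready one into buf.
 * Returns the byte count from read() (0 means end-of-file) and stores the
 * index of the descriptor in *which; returns -1 on error or timeout. */
ssize_t read_any(const int *fds, int count, int timeout_ms,
                 char *buf, size_t len, int *which)
{
    struct pollfd pfds[MAX_FDS];

    if (count < 1 || count > MAX_FDS)
        return -1;
    for (int i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    /* poll() returns 0 on timeout and -1 on error. */
    if (poll(pfds, (nfds_t)count, timeout_ms) <= 0)
        return -1;

    for (int i = 0; i < count; i++) {
        /* POLLHUP is included so a closed peer is seen as end-of-file. */
        if (pfds[i].revents & (POLLIN | POLLHUP)) {
            *which = i;
            return read(fds[i], buf, len);
        }
    }
    return -1;
}
```

Unlike select(), the pollfd array is not limited by FD_SETSIZE and the requested events are kept separate from the returned ones, so the array could be reused across calls; it is rebuilt here only to keep the sketch short.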
I like the description at gnu.org:
Sometimes a program needs to accept input on multiple input channels whenever input arrives. For example, some workstations may have devices such as a digitizing tablet, function button box, or dial box that are connected via normal asynchronous serial interfaces; good user interface style requires responding immediately to input on any device. [...]
You cannot normally use read for this purpose, because this blocks the program until input is available on one particular file descriptor; input on other channels won’t wake it up. You could set nonblocking mode and poll each file descriptor in turn, but this is very inefficient.
A better solution is to use the select function. This blocks the program until input or output is ready on a specified set of file descriptors, or until a timer expires, whichever comes first.
Per the Linux man pages and the MSDN documentation for Windows:
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
As a simple explanation: an application often needs to do multiple things at once. For example, you may access multiple sites in a web browser, or a web server may want to serve multiple clients simultaneously. One needs a mechanism to monitor each socket so that the application is not stuck waiting for one communication to complete.
An example: imagine downloading a large Facebook page on your smartphone whilst traveling on a train. Your connection is intermittent and slow, so the web server should be able to process other clients while waiting for your communication to finish.
select(2) - Linux man page
select Function - Winsock Functions

Writing to multiple file descriptors with a single function call

I had a use case for a group chat server where the server had to write a common string to all clients' sockets. I addressed this by looping through the list of file descriptors and writing the string to each of them.
Now I am looking for a better solution to the problem. Is it possible to do this with a single function call from the server, using the tee system call in Linux? I want the output of one tee to go to the next tee as well as to a client's socket. I am wondering if I can dup the file descriptor of one end of the tee to the client's socket and get the desired effect.
Please suggest any other implementation for the use case that you know of.
Thanks
The tee(2) system call requires both file descriptors to be pipes - so sockets do not count. The splice(2) and vmsplice(2) system calls also do not seem to meet your requirements, and I don't see how to utilize sendfile(2) either.
I've not come across such a system call. Calls for collecting diverse data and writing it all at once (or the converse for reading) - yes. But for writing to multiple outputs at once - no.
So, your current 'loop around the descriptors' is about as good as it gets, AFAICT.
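For reference, the 'loop around the descriptors' approach might look roughly like this, assuming blocking descriptors. The names broadcast and write_all are hypothetical helpers, not part of any library.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Writes all len bytes to fd, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends the same message to every descriptor in fds[].
 * Returns how many descriptors received the complete message. */
int broadcast(const int *fds, int count, const char *msg, size_t len)
{
    int delivered = 0;
    for (int i = 0; i < count; i++) {
        /* A failing client is skipped so it cannot block the others. */
        if (write_all(fds[i], msg, len) == 0)
            delivered++;
    }
    return delivered;
}
```

One thing to keep in mind with sockets: writing to a peer that has already closed raises SIGPIPE by default, so a server using a loop like this usually ignores that signal and handles the resulting write error instead.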

Sockets & Data Persistence

This is potentially a newbie question, but if I open a socket and write some data to it, then exit the subroutine so the socket goes out of scope, and then try to read the data from another program at a later time, will the data still be there, or does it die when the original declarations go out of scope?
Thanks,
N.
Further information :
I am trying to rewrite 2 programs that use files as the interface to communicate. The general flow is :
Main Process : Write Data.
Main Process : Spawn secondary process(es) onto other nodes in a cluster
Main Process : Wait until Secondary Process finished.
Secondary Process : Read Data (written by main)
Secondary Process : Write Data
Secondary Process : exit
Main Process : Read data.
So I essentially want to replace the Write/Read/Write/Read of files with sockets (which should be much faster!)
For TCP sockets you need a bi-directional connection opened before sending data, so the question is irrelevant if you don't have a receiving side.
For UDP, if no one is listening on the socket at the time you're sending data, no one will receive it unless you manage to open a listening program fast enough for the data to be still traveling inside the networking drivers. But don't count on it, because the 'localhost loopback' inside the driver shouldn't take more than a few microseconds to deliver the data.
P.S. Perhaps you can get a more suitable answer if you describe your exact situation in more detail. What are you trying to achieve?
Regarding your "further information": you can't do this with sockets by simply replacing the files with sockets and keeping the current scheme. However, you can try to change the scheme by first spawning the child processes and only then sending them the data via sockets. When the children finish, they return an answer to the parent via a socket, and exit.
There's an inefficiency here in a sense, because you have to send the same data to each child separately (unless you can use multicasting).
I'm not sure sockets will be much faster than files for you, but they will certainly be safer for a more complex scheme and will also allow distribution among machines that don't share a file-system.
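To illustrate the changed scheme (spawn first, then send the data, then collect the answer), here is a single-machine sketch. It assumes a socketpair() between parent and child as a local stand-in; across cluster nodes the same flow would run over a TCP connection instead. The function name sum_in_child and the "sum the numbers" job are invented for the example.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reads exactly len bytes. Returns 0 on success, -1 on EOF or error. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Spawns a child, sends it `count` ints over a socket, and stores the sum
 * the child sends back in *result. Returns 0 on success, -1 on error. */
int sum_in_child(const int *values, int count, long *result)
{
    int sv[2];
    if (count < 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Secondary process: read data, compute, write the answer, exit. */
        long sum = 0;
        int v;
        close(sv[0]);
        while (read_full(sv[1], &v, sizeof v) == 0)
            sum += v;
        if (write(sv[1], &sum, sizeof sum) != (ssize_t)sizeof sum)
            _exit(1);
        _exit(0);
    }

    /* Main process: the child already exists, so the data has a receiver. */
    close(sv[1]);
    size_t bytes = (size_t)count * sizeof *values;
    int ok = write(sv[0], values, bytes) == (ssize_t)bytes;
    shutdown(sv[0], SHUT_WR); /* signals "no more data" to the child */
    if (ok && read_full(sv[0], result, sizeof *result) != 0)
        ok = 0;
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return ok ? 0 : -1;
}
```

The point of the ordering is that nothing is written until a receiver exists, which is the difference from the file-based flow, where the data could sit on disk until a reader came along.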
When using a plain socket, if there isn't another endpoint available (connected) at the time that you write the data, the data will be lost. The only way that you could actually write the data without first having connected to the other endpoint would be to use UDP, in which case the data would simply be discarded by the receiving system if no matching endpoint is available.
If you want to have asynchronous delivery you will need to use a message passing system that allows delayed delivery. In this case, the receiver of the message is actually a system process that stores the message until a client requests it. The actual communication takes place between a client on one system and the system process on the other, with the client on the other system obtaining the data locally. You can read more about message passing and its variants at http://en.wikipedia.org/wiki/Message_passing.

Resources