pthread and kinda "broadcast stream" - c

What is the best way to have a shared stream of data that all threads can read?
I have one thread per user connection, and every user's input must be available to all threads. Imagine a simple chat where everyone sees everyone's messages.
So I thought I could use some kind of "shared stream" and select() between this stream and the user's input socket, writing to it when I get input and reading from it when something new is available. I thought about using a shared socket, but that won't work, because once the first thread reads the data from the socket, it is no longer available to the other threads.
So what is the best and most idiomatic way to achieve this?

I think you might be taking the engineering here a bit too far.
What you are looking for is an SMP like this or this of some sort, not a stream. The "stream" concept could be handled by a different process (a manager process, if you will) that handles the stream of incoming information. In the chat scenario you described, that's not necessary, because each thread can add to the SMP whatever it receives on its own input stream.
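To make the idea concrete, here is a minimal sketch of a shared in-memory message log that every connection thread can append to and read from. All names (chat_log, chat_post, chat_next, the capacity constants) are made up for illustration, and it assumes a fixed-size ring where slow readers simply skip messages that were overwritten.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define LOG_CAPACITY 64   /* messages kept before old ones are overwritten */
#define MSG_MAX      256  /* maximum message length, including the NUL */

/* Hypothetical shared message log: one instance, visible to every thread. */
struct chat_log {
    pthread_mutex_t lock;
    pthread_cond_t  changed;                     /* broadcast on every new message */
    char            msgs[LOG_CAPACITY][MSG_MAX];
    size_t          count;                       /* total messages ever posted */
};

void chat_log_init(struct chat_log *log)
{
    pthread_mutex_init(&log->lock, NULL);
    pthread_cond_init(&log->changed, NULL);
    log->count = 0;
}

/* Called by whichever thread received input from its user. */
void chat_post(struct chat_log *log, const char *text)
{
    pthread_mutex_lock(&log->lock);
    char *slot = log->msgs[log->count % LOG_CAPACITY];
    strncpy(slot, text, MSG_MAX - 1);
    slot[MSG_MAX - 1] = '\0';
    log->count++;
    pthread_cond_broadcast(&log->changed);       /* wake every waiting reader */
    pthread_mutex_unlock(&log->lock);
}

/* Blocks until there is a message this reader has not seen yet, then copies
 * it into out (at least MSG_MAX bytes). Each reader keeps its own cursor,
 * so reading never removes the message for the other threads. */
void chat_next(struct chat_log *log, size_t *cursor, char *out)
{
    pthread_mutex_lock(&log->lock);
    while (*cursor == log->count)
        pthread_cond_wait(&log->changed, &log->lock);
    if (log->count - *cursor > LOG_CAPACITY)
        *cursor = log->count - LOG_CAPACITY;     /* fell behind: skip overwritten entries */
    strcpy(out, log->msgs[*cursor % LOG_CAPACITY]);
    (*cursor)++;
    pthread_mutex_unlock(&log->lock);
}
```

Each thread starts with its own cursor set to 0 (or to the current count, if it should not see history). Note that a thread blocked in select() on its socket cannot wait on a condition variable at the same time, so in practice you would probably pair this with a separate sender thread per connection or a per-thread wake-up pipe.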

Related

How do I implement event driven POSIX threads?

I'm coding for a linux platform using C. Let's say I have 2 threads. A and B.
A is an infinite loop that constantly checks whether there is data on the socket localhost:8080, whereas B is a thread that spends most of its time blocked until A calls the mutex unlock function on a mutex that B uses to block itself. A unlocks B when it receives the appropriate data on the socket.
So you see here is a problem. B is "event driven" largely whereas A is in a constant running state. My target platform isn't resource rich so I wish A could be "activated" and enter running state only when it received data on socket, instead of constantly looping.
So how can I do that? If it matters - I wish to do this for both UDP and TCP sockets.
There are multiple ways of doing what you want in a clean way. One approach, which you are kind of using already, is an event system. A real event system would be overkill for the kind of problem you are dealing with, but can be found here. This is a (random) better implementation, capable of listening for multiple file descriptors and time-based events, all in a single thread.
If you want to build one yourself, you should take a look at the select or poll function.
But I agree with @Jeremy Friesner: you should definitely use the functions made for socket programming, they are perfect for your kind of problem. Only use the event system approach if you really need it (with multiple sockets or timed events).
You simply call recv (or recvfrom, recvmsg, etc) and it doesn't return until some data has been received. There's no need to "constantly try to find out if there is data" - that's silly.
If you set the socket to non-blocking mode then recv will return even if there's no data. If that's what you're doing, then the solution is simple: don't set the socket to non-blocking mode.
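For illustration, a minimal sketch of what thread A's wait could look like with a plain blocking socket. The helper name recv_blocking is made up here; the only real API used is recv(), and the retry on EINTR is an assumption about how you would want to treat signals.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sleeps inside recv() until data arrives; the thread uses no CPU while waiting.
 * Returns the number of bytes received, 0 if a stream peer closed the
 * connection, or -1 on error (errno is set by recv()). */
ssize_t recv_blocking(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);   /* retry if a signal interrupted the call */
    return n;
}
```

Thread A would call this in its loop and wake B only when the call returns with the data it is looking for. The same function works for a UDP socket, where each call returns one datagram.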

How to return from a select with SIGINT

I need your help to solve this problem.
I have to create a multi-threaded client-server program on Unix, based on AF_UNIX sockets, that must handle up to some thousands of simultaneous connections and must also do different things based on the type of signal received, like shutting down when the server receives a SIGINT.
I thought of doing this by initially blocking SIGINT and the other signals in the main thread's signal mask, then starting a dispatching thread that keeps waiting on select() for I/O requests (I know this is really inefficient), accepts new connections and then reads exactly sizeof(request) bytes, where request is a well-known structure. I would also create a thread that handles the received signals using sigwait(), the only one that re-enables them, and finally start the other server threads to do the real work.
I have these questions:
1. I would like to make select() return even when the dispatcher thread is stuck in it. I've read about the self-pipe trick for this, but I think I got it wrong, because even when the signal-handling thread writes to the pipe that is in select's read set, select() doesn't return. How can I make select() return?
2. I've read something about epoll(), which is supposed to handle many simultaneous connections efficiently. Should I use it, and if so, how? I can't figure it out just from reading man epoll, and my textbook doesn't even mention it.
3. Are there good practices I could use for handling system failures? I check almost every system call's return value so that I can handle the error, free memory and so on, but my code keeps growing, mostly with the same operations repeated many times. How could I write a cleanup function that frees memory before returning with abort()?
Anyway, thanks a lot in advance for your help. This platform is really amazing, and when I become more expert I'll pay the community back by helping others!
(Sorry for my English, but it's not my mother language)
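For reference, the self-pipe trick the question mentions can be sketched roughly like this. The function names are made up, and it assumes the pipe was created with pipe() before the threads were started and that the signal thread calls request_wakeup() after sigwait() returns.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Called by the signal-handling thread to wake the dispatcher. */
int request_wakeup(int wake_write_fd)
{
    char byte = 'x';
    return write(wake_write_fd, &byte, 1) == 1 ? 0 : -1;
}

/* Blocks until listen_fd or the pipe's read end becomes readable.
 * Returns 1 if woken through the pipe, 0 if listen_fd is ready, -1 on error. */
int wait_for_io_or_wakeup(int listen_fd, int wake_read_fd)
{
    int maxfd = listen_fd > wake_read_fd ? listen_fd : wake_read_fd;

    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);                /* select() overwrites the set: rebuild it every time */
        FD_SET(listen_fd, &readfds);
        FD_SET(wake_read_fd, &readfds);

        /* First argument is the highest descriptor plus one, not a count. */
        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (FD_ISSET(wake_read_fd, &readfds)) {
            char byte;
            if (read(wake_read_fd, &byte, 1) < 0)   /* drain the wake-up byte */
                return -1;
            return 1;
        }
        return 0;
    }
}
```

Since the original code is not shown, this is only a guess, but two things worth checking when select() never returns are whether the pipe's read end is really covered by the first argument (highest descriptor plus one) and whether the set is rebuilt before every call.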

The most efficient way to manage multiple sockets (maximum 50) in a single process?

I'm trying to implement a BitTorrent client. In order to receive pieces from different peers, the client has to manage multiple sockets.
The well-known solutions that I know of are:
1. Each thread has one socket.
2. Using select() call, non-blocking I/O.
3. a mix of 1 and 2.
The first solution requires too many threads. The second solution wastes CPU time since it continually checks up to 50 sockets. Also, if I choose the third solution, I don't know how many threads a single process should use.
Which solution is the best one for receiving a fairly large file?
Is there any web page that gives a good solution?
Any advice would be awesome.
Some High Level Ideas from my side. : )
Have a main thread in which you will be doing the "select" / "poll" call for all the connections.
Have a thread pool of worker threads
If for a particular connection, select indicates that there is data to read, then pass the socket + additional information to one of the free worker threads for receiving / sending data on that connection.
Upon completion of the work, the worker thread returns to the free worker thread queue, which can be used again for another connection.
Hope this helps
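The hand-off between the main thread and the workers could look roughly like the sketch below. The fd_queue type and its functions are made up for illustration, and it assumes a small fixed capacity is enough (50 sockets at most in this case).

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 64

/* Hypothetical queue of ready descriptors shared by the main thread and the pool. */
struct fd_queue {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    int             fds[QUEUE_CAP];
    size_t          head;    /* index of the oldest entry */
    size_t          count;   /* number of queued descriptors */
};

void fd_queue_init(struct fd_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    q->head = 0;
    q->count = 0;
}

/* Main thread: hand a ready descriptor to the pool. Returns -1 if the queue is full. */
int fd_queue_push(struct fd_queue *q, int fd)
{
    int rc = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {
        q->fds[(q->head + q->count) % QUEUE_CAP] = fd;
        q->count++;
        pthread_cond_signal(&q->not_empty);   /* wake one idle worker */
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Worker thread: sleep until a descriptor is available, then take it. */
int fd_queue_pop(struct fd_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int fd = q->fds[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return fd;
}
```

Each worker would loop on fd_queue_pop(), do its recv/send on the descriptor and go back to waiting. One thing to watch for: the main thread should probably stop watching a descriptor while a worker owns it, otherwise select will keep reporting it as readable.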
You're right, the first solution is the worst.
The second one, with select(), can do the job, but there's a problem: select() scans the whole descriptor set on every call, so its cost grows linearly, O(n), with the number of descriptors. You should use /dev/poll, epoll(), kqueue() or whatever your platform offers instead of select().
Don't use one thread per socket! You will lose a lot of time to context switches.
You should have:
A Listener thread: it just does all the accepts and hands each new socket to a Worker thread.
Multiple Worker threads: they do all the other stuff. Each one checks whether data is available and handles it. A Worker thread manages many sockets.
Take a look at Kegel's C10K page if you want more information.
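As a rough idea of what the epoll() variant looks like on Linux, here is a minimal sketch. The two helper names are made up; the calls themselves (epoll_ctl, epoll_wait) are the real Linux API, and the epoll descriptor is assumed to come from epoll_create1(0).

```c
#include <sys/epoll.h>

/* Registers fd for read-readiness events on the epoll instance epfd.
 * Returns 0 on success, -1 on error. */
int watch_for_read(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = fd;                 /* handed back to us by epoll_wait() */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Waits up to timeout_ms (-1 means forever) and returns one ready
 * descriptor, or -1 on timeout or error. */
int next_ready_fd(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n == 1 ? ev.data.fd : -1;
}
```

Unlike select(), the set of watched descriptors lives in the kernel, so you register each socket once and each wait only reports the descriptors that are actually ready. A real worker would pass a larger event array to epoll_wait() and handle several ready sockets per call.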
Check out some open-source BitTorrent clients and read the code to get some ideas; it is the best thing you could do.
I recommend looking at BitTorrent in C or Hadouken in C#, for example:
https://github.com/bittorrent
https://github.com/hadouken/hdkn

Select function in socket programming

Can anyone tell me the use and application of select function in socket programming in c?
The select() function allows you to implement an event driven design pattern, when you have to deal with multiple event sources.
Let's say you want to write a program that responds to events coming from several event sources e.g. network (via sockets), user input (via stdin), other programs (via pipes), or any other event source that can be represented by an fd. You could start separate threads to handle each event source, but you would have to manage the threads and deal with concurrency issues. The other option would be to use a mechanism where you can aggregate all the fd into a single entity fdset, and then just call a function to wait on the fdset. This function would return whenever an event occurs on any of the fd. You could check which fd the event occurred on, read that fd, process the event, and respond to it. After you have done that, you would go back and sit in that wait function - till another event on some fd arrives.
The select facility is such a mechanism, and the select() function is the wait function. You can find the details on how to use it in any number of books and online resources.
The select function allows you to check on several different sockets or pipes (or any file descriptors at all if you are not on Windows), and do something based on whichever one is ready first. More specifically, the arguments for the select function are split up into three groups:
Reading: When any of the file descriptors in this category are ready for reading, select will return them to you.
Writing: When any of the file descriptors in this category are ready for writing, select will return them to you.
Exceptional: When any of the file descriptors in this category have an exceptional case -- that is, they close uncleanly, a connection breaks or they have some other error -- select will return them to you.
The power of select is that individual file/socket/pipe functions are often blocking. Select allows you to monitor the activity of several different file descriptors without having to have a dedicated thread of your program to each function call.
In order for you to get a more specific answer, you will probably have to mention what language you are programming in. I have tried to give as general an answer as possible on the conceptual level.
select() is the low-tech way of polling sockets for new data to read or for an open TCP window to write. Unless there's some compelling reason not to, you're probably better off using poll(), or epoll_wait() if your platform has it, for better performance.
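A minimal poll() sketch, for comparison with select(). The helper name wait_readable is made up; it watches a single descriptor, but the same array can hold as many struct pollfd entries as you need.

```c
#include <poll.h>

/* Waits up to timeout_ms (-1 means forever) for fd to become readable.
 * Returns 1 if data can be read, -1 on error, and 0 otherwise
 * (timeout, or only an error/hang-up condition was reported). */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;     /* what we are interested in */
    pfd.revents = 0;         /* filled in by poll() */

    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Compared with select(), there is no fd_set to rebuild on every call and no FD_SETSIZE limit on descriptor values.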
I like the description at gnu.org:
Sometimes a program needs to accept input on multiple input channels whenever input arrives. For example, some workstations may have devices such as a digitizing tablet, function button box, or dial box that are connected via normal asynchronous serial interfaces; good user interface style requires responding immediately to input on any device. [...]
You cannot normally use read for this purpose, because this blocks the program until input is available on one particular file descriptor; input on other channels won’t wake it up. You could set nonblocking mode and poll each file descriptor in turn, but this is very inefficient.
A better solution is to use the select function. This blocks the program until input or output is ready on a specified set of file descriptors, or until a timer expires, whichever comes first.
Per the documentation for Linux manpages and MSDN for Windows,
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
For simple explanation: often it is required for an application to do multiple things at once. For example you may access multiple sites in a web browser, a web server may want to serve multiple clients simultaneously. One needs a mechanism to monitor each socket so that the application is not busy waiting for one communication to complete.
An example: imagine downloading a large Facebook page on your smart phone whilst traveling on a train. Your connection is intermittent and slow, the web server should be able to process other clients when waiting for your communication to finish.
select(2) - Linux man page
select Function - Winsock Functions

Nonblocking sockets with Select

I do not understand what the difference is between calling recv() on a non-blocking socket vs a blocking socket after waiting to call recv() after select returns that it is ready for reading. It would seem to me like a blocking socket will never block in this situation anyway.
Also, I have heard that one model for using non-blocking sockets is to try making calls (recv/send/etc.) on them after some amount of time has passed, instead of using something like select. This technique seems slow and wasteful compared to using something like select (but then I don't get the purpose of non-blocking sockets at all, as described above). Is this common in network programming today?
There's a great overview of all of the different options for doing high-volume I/O called The C10K Problem. It has a fairly complete survey of a lot of the different options, at least as of 2006.
Quoting from it, on the topic of using select on non-blocking sockets:
Note: it's particularly important to remember that readiness notification from the kernel is only a hint; the file descriptor might not be ready anymore when you try to read from it. That's why it's important to use nonblocking mode when using readiness notification.
And yes, you could use non-blocking sockets and then have a loop that waits if nothing is ready, but that is fairly wasteful compared to using something like select or one of the more modern replacements (epoll, kqueue, etc). I can't think of a reason why anyone would actually want to do this; all of the select like options have the ability to set a timeout, so you can be woken up after a certain amount of time to perform some regular action. I suppose if you were doing something fairly CPU intensive, like running a video game, you may want to never sleep but instead keep computing, while periodically checking for I/O using non-blocking sockets.
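To illustrate the point from the quote, here is a minimal sketch of reading from a non-blocking socket after a readiness notification. The helper names and the -2 return convention are made up for this example; fcntl(), O_NONBLOCK, EAGAIN and EWOULDBLOCK are the real POSIX pieces.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Switches fd to non-blocking mode, keeping its other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Call after select()/poll() reported fd as readable.
 * Returns the bytes read, 0 on orderly shutdown, -1 on a real error,
 * or -2 when the readiness hint was stale and nothing is available. */
ssize_t recv_after_ready(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;           /* not an error: go back to waiting */
    return n;
}
```

With a blocking socket, the stale-hint case would leave the thread stuck inside recv() while the other descriptors go unserved; with a non-blocking one it just returns to the select loop.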
The select, poll, epoll, kqueue, etc. facilities target multiple socket/file descriptor handling scenarios. Imagine a heavy loaded web-server with hundreds of simultaneously connected sockets. How would you know when to read and from what socket without blocking everything?
If you call read on a non-blocking socket, it will return immediately if no data has been received since the last call to read. If you only had read, and you wanted to wait until there was data available, you would have to busy wait. This wastes CPU.
poll and select (and friends) allow you to sleep until there's data to read (or write, or a signal has been received, etc.).
If the only thing you're doing is sending and receiving on that socket, you might as well just use a blocking socket. Being asynchronous is important when you have other things to do in the meantime, such as updating a GUI or handling other sockets.
For your first question, there's no difference in that scenario. The only difference is what they do when there is nothing to be read. Since you're checking that before calling recv() you'll see no difference.
For the second question, the way I see it done in all the libraries is to use select, poll, epoll, kqueue for testing if data is available. The select method is the oldest, and least desirable from a performance standpoint (particularly for managing large numbers of connections).
