Reliably broadcasting the same data to multiple sockets in C - c

I'm receiving a large amount of data over a TCP socket which I'm looking to send to a number of additional sockets (essentially echoing the contents of the first socket). My requirements are as follows:
Any data received over the read socket must be sent reliably (in order of arrival) to the write sockets
Writes must not block reads from the primary socket under any circumstances
Disconnections/broken pipes should never interrupt reading from the inbound socket
Currently I'm using a multi-threaded system that maintains a linked list of data read from the inbound socket, and have threads for each outbound socket. When messages arrive via the inbound socket I signal the outbound socket threads to create a copy of the linked list from the main thread and broadcast the data without interrupting operations. My linked list structure maintains an fd_set of file descriptors on the main thread, which it uses to determine when a message has been successfully copied and broadcast by each outbound socket (thus removing it from the list).
This is obviously a lot of moving parts, and I wanted to see if I was missing something fundamental that could be used in a situation such as this.

What you want to do is similar to what I've been doing for the past two decades.
Don't lock and copy the entire list. Instead, have each sending thread read-lock just the element it's currently sending. Have the receiving thread write-lock each new element and perform element deletion. Allow for a long list to accommodate transient bursts.
You can make the list persistent by memory-mapping it to a file.
Good luck. The task is non-trivial.
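To make the idea of "copy one element, not the whole list" concrete, here is a minimal sketch. It is not the answerer's implementation: it swaps the per-element read/write locks for a fixed-size ring guarded by a single mutex, and every name in it (`struct ring`, `ring_push`, `ring_next`, `RING_CAP`, `RING_MSG`) is hypothetical. Each sending thread keeps its own sequence number, so the receiving thread never waits on a slow socket; a sender that falls more than `RING_CAP` messages behind gets an error instead of stalling everyone else.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define RING_CAP 256      /* assumed burst allowance, in messages */
#define RING_MSG 1500     /* assumed maximum message size, in bytes */

struct ring {
    pthread_mutex_t mu;
    pthread_cond_t more;
    unsigned long head;               /* sequence number of the next write */
    size_t len[RING_CAP];
    char data[RING_CAP][RING_MSG];
};

void ring_init(struct ring *r) {
    pthread_mutex_init(&r->mu, NULL);
    pthread_cond_init(&r->more, NULL);
    r->head = 0;
}

/* Receiving thread: store one message. The lock is only ever held for a
 * memcpy, never across a socket write, so this does not wait on slow peers. */
int ring_push(struct ring *r, const void *buf, size_t len) {
    if (len > RING_MSG) return -1;
    pthread_mutex_lock(&r->mu);
    size_t slot = r->head % RING_CAP;
    memcpy(r->data[slot], buf, len);
    r->len[slot] = len;
    r->head++;
    pthread_cond_broadcast(&r->more);
    pthread_mutex_unlock(&r->mu);
    return 0;
}

/* Sending thread: copy out message number *seq and advance *seq.
 * Blocks while this sender is caught up. Returns -1 if the sender fell
 * more than RING_CAP messages behind and its slot was overwritten. */
int ring_next(struct ring *r, unsigned long *seq, void *out, size_t *len) {
    int rc = 0;
    pthread_mutex_lock(&r->mu);
    while (*seq == r->head)
        pthread_cond_wait(&r->more, &r->mu);
    if (r->head - *seq > RING_CAP) {
        rc = -1;
    } else {
        size_t slot = *seq % RING_CAP;
        memcpy(out, r->data[slot], r->len[slot]);
        *len = r->len[slot];
        (*seq)++;
    }
    pthread_mutex_unlock(&r->mu);
    return rc;
}
```

Each outbound thread would call `ring_next` in a loop and write the copied bytes to its own socket outside the lock; on a -1 return or a broken pipe it can close its socket and exit without affecting the reader.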

Related

How to check if a socket is in use in c, to perform multiple simultaneous writes on single socket

I'm trying to write and read through a socket using a TCP connection. Many workers write to the same socket, but they can't write at the same time, so one of them writes while the rest wait. How can I know if something is being written to the socket at the moment? Is there a system call for this? Thanks in advance!
If you want high performance, what you should do instead is to have a single thread which manages the socket, and use in-memory queues to publish data from the workers to the socket manager. This can be done lock-free without too much trouble, and it may improve throughput if your workers can chew on other tasks instead of waiting for their turn to use the socket.
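As a rough illustration of the single socket-manager idea, the sketch below uses a mutex and condition variable rather than a lock-free queue, purely to keep it short; that substitution and all the names (`struct outq`, `outq_push`, `outq_close`, `outq_writer`) are assumptions, not something from the answer. Workers call `outq_push` and return immediately; only the writer thread ever touches the file descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct msg { struct msg *next; size_t len; char data[]; };

struct outq {
    pthread_mutex_t mu;
    pthread_cond_t ready;
    struct msg *head, *tail;
    int closed;                 /* set once no more messages will be pushed */
    int fd;                     /* the one socket, owned by the writer thread */
};

void outq_init(struct outq *q, int fd) {
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->ready, NULL);
    q->head = q->tail = NULL;
    q->closed = 0;
    q->fd = fd;
}

/* Called by any worker: copy the message and append it to the queue. */
int outq_push(struct outq *q, const void *buf, size_t len) {
    struct msg *m = malloc(sizeof *m + len);
    if (!m) return -1;
    m->next = NULL;
    m->len = len;
    memcpy(m->data, buf, len);
    pthread_mutex_lock(&q->mu);
    if (q->tail) q->tail->next = m; else q->head = m;
    q->tail = m;
    pthread_cond_signal(&q->ready);
    pthread_mutex_unlock(&q->mu);
    return 0;
}

void outq_close(struct outq *q) {
    pthread_mutex_lock(&q->mu);
    q->closed = 1;
    pthread_cond_broadcast(&q->ready);
    pthread_mutex_unlock(&q->mu);
}

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *p, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Thread function: drain the queue in FIFO order until it is closed and
 * empty. On a write error it returns early and leaves the rest queued. */
void *outq_writer(void *arg) {
    struct outq *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->mu);
        while (!q->head && !q->closed)
            pthread_cond_wait(&q->ready, &q->mu);
        struct msg *m = q->head;
        if (m) {
            q->head = m->next;
            if (!q->head) q->tail = NULL;
        }
        pthread_mutex_unlock(&q->mu);
        if (!m) return NULL;
        int rc = write_all(q->fd, m->data, m->len);
        free(m);
        if (rc < 0) return NULL;
    }
}
```

The writer would normally be started once with `pthread_create(&tid, NULL, outq_writer, &q)` and left running for the life of the connection.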
A potentially interesting alternative is to use UDP instead of TCP. Then the senders probably do not need to synchronize at all, since it's message-oriented (i.e. messages won't be partially sent if they are short enough). You can even use multiple sockets to write to the same UDP address. But attempting this would require a lot of care and consideration of what's reading on the other end.
If you can send everything in a single send() or sendmsg() what you're asking for isn't even necessary. These calls (a) are atomic and (b) don't return until all the data has been transferred from the application to the kernel, in blocking mode.
TCP/UDP sockets can be used simultaneously (in parallel threads) for read and write operations, that is, a read in one thread and a write in another. But performing the same operation (read or write) on one socket from several threads at once is not safe: multiple threads writing to (or reading from) a single socket need a lock on that socket.
So if you want multiple workers to write to a single socket, create a thread for each worker and lock the socket before each write operation. This requires n threads for n workers.
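A minimal sketch of the lock-before-write approach follows. The names `struct shared_fd`, `shared_fd_init` and `shared_write` are made up for illustration. The lock is held across the whole retry loop so that a short write from one worker cannot be interleaved with another worker's message.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

struct shared_fd {
    pthread_mutex_t mu;   /* serializes writers on fd */
    int fd;
};

void shared_fd_init(struct shared_fd *s, int fd) {
    pthread_mutex_init(&s->mu, NULL);
    s->fd = fd;
}

/* Write one complete message while holding the lock.
 * Returns 0 on success, -1 on a write error other than EINTR. */
int shared_write(struct shared_fd *s, const void *buf, size_t len) {
    const char *p = buf;
    int rc = 0;
    pthread_mutex_lock(&s->mu);
    while (len > 0) {
        ssize_t n = write(s->fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            rc = -1;
            break;
        }
        p += n;
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&s->mu);
    return rc;
}
```

The cost of this design is that every worker blocks for as long as the slowest write in front of it, which is the waiting the queue-based answer tries to avoid.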

Multithreaded udp server

I'm new to threads, so please advise me if you can. I have a server that broadcasts messages
to clients. The clients then send replies back to the server. I want to handle every reply in
a separate thread. Every reply will have a message id and a thread id. How can I fill some structure
with this information from all the threads and then read it?
Also, looking at my code, is it correct to create the thread inside the while loop, or is there some way
to create a thread only when I get a reply from a client?
Did I start with the correct understanding?
int main() {
    while (1) {
        /* sendto() on the broadcast IP */
        pthread_create(&cln_thread, NULL, msg_processing, (void *)&arg);
    }
}
void *msg_processing(void *arg) {
    /* recvfrom() the client message with the packet id and thread id */
    /* how do I store this information and then read it in main when the program finishes? */
}
Thank you
Err.. no, just create ONE thread, once only, for receiving datagrams on one socket. In the thread function, receive the datagrams in a while(true) loop. Don't let this receive thread terminate and don't create any more receive threads. Continually creating/terminating/destroying threads is inefficient, dangerous, unnecessary, error-prone, difficult-to-debug and you should try very hard to not do it, ever.
Edit:
One receive thread only - but you don't have to do the processing there. Malloc a 64K buffer, receive data into it, push the buffer pointer onto a producer-consumer queue to a pool of threads that will do the processing, loop back and so malloc again to reseat the pointer and create another buffer for the next datagram. Free the *buffers in the pool threads after the processing is completed. The receive thread will be back waiting for datagrams quickly while the buffer processing runs concurrently.
If you find that the datagrams can arrive so rapidly that the processing cannot keep up, memory use will grow unchecked as more and more *buffers pile up in the queue. There are a couple ways round that. You can use a bounded queue that blocks if its capacity is reached. You could create x buffers at startup and store them on another producer-consumer 'pool queue' that the receive thread pops from, (instead of the malloc) - the processing pool threads can then push the 'used' *buffers back on to the pool queue for re-use. If the pool runs out, the receive thread will block on the pool until *buffers are returned.
I prefer the pooled-buffer approach since it caps memory-use across the whole system, avoids continual malloc/free with its fragmentation issues etc, avoids the more complex bounded queues, is easier to tweak, (the pool level is easy to change at runtime), and is easier to monitor/debug - I usually dump the pool level, (ie. queue count), to the display once a second with a timer.
In either case, datagrams may be lost but, if your system is overloaded such that data regularly arrives faster than it can possibly be processed, that's going to be the case anyway no matter how you design it.
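The pooled-buffer approach described above might look roughly like this. It is a sketch under assumptions: the names (`struct pool`, `pool_get`, `pool_put`, `pool_level`) and the pool size are invented here, and a small LIFO stack stands in for the answer's 'pool queue'. The receive thread calls `pool_get` instead of malloc, and blocks there when every buffer is in flight; the processing threads call `pool_put` when they are done.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define POOL_BUFS 8        /* assumed pool depth */
#define BUF_SIZE  65536    /* 64K, large enough for any UDP datagram */

struct pool {
    pthread_mutex_t mu;
    pthread_cond_t avail;
    void *free_bufs[POOL_BUFS];
    int count;             /* number of buffers currently in the pool */
};

/* Allocate every buffer up front so memory use is capped from the start. */
int pool_init(struct pool *p) {
    pthread_mutex_init(&p->mu, NULL);
    pthread_cond_init(&p->avail, NULL);
    p->count = 0;
    for (int i = 0; i < POOL_BUFS; i++) {
        void *b = malloc(BUF_SIZE);
        if (!b) {
            while (p->count > 0) free(p->free_bufs[--p->count]);
            return -1;
        }
        p->free_bufs[p->count++] = b;
    }
    return 0;
}

/* Receive thread: take a buffer, blocking while the pool is empty. */
void *pool_get(struct pool *p) {
    pthread_mutex_lock(&p->mu);
    while (p->count == 0)
        pthread_cond_wait(&p->avail, &p->mu);
    void *b = p->free_bufs[--p->count];
    pthread_mutex_unlock(&p->mu);
    return b;
}

/* Processing thread: hand a used buffer back for re-use. */
void pool_put(struct pool *p, void *buf) {
    pthread_mutex_lock(&p->mu);
    p->free_bufs[p->count++] = buf;
    pthread_cond_signal(&p->avail);
    pthread_mutex_unlock(&p->mu);
}

/* Current pool level, e.g. for a once-a-second monitoring dump. */
int pool_level(struct pool *p) {
    pthread_mutex_lock(&p->mu);
    int n = p->count;
    pthread_mutex_unlock(&p->mu);
    return n;
}
```

A pool level that sits near zero suggests the processing threads cannot keep up, which is the situation in which datagrams start being dropped by the kernel.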
One socket is fine, so why complicate matters? :)
You can find a good example of a multithreaded UDP server written in Go here:
https://gist.github.com/jtblin/18df559cf14438223f93
The main idea is to make full use of multiple cores, so that each thread works on its own CPU core and reads UDP data into a single buffer for processing.

UDP Multiple sockets Receive data and process efficiently - C & Linux

I have to receive data from 15 different clients, each of them sending on 5 different ports: 15 * 5 sockets in total.
For each client the port numbers are defined and fixed, for example client 1 uses ports 3001 to 3005, client 2 uses ports 3051 to 3055, etc. They have one thing in common: the first port (3001, 3051) is used to send commands, and the other ports send data.
After receiving the data I have to verify the checksum, keep track of received packets, re-request a packet if it is lost, and also write to files on the hard disk.
Restriction: I cannot change the above design and I cannot change from UDP to TCP.
The two methods I'm aware of after reading are:
Asynchronous multiplexing using select().
Thread per socket.
I tried the first one and I'm stuck at the point where I get the data. I'm able to receive data. I have some processing to do, so I want to start a thread for each socket, or for a group of sockets (say all first ports, all second ports, etc., i.e. 3001, 3051, etc.).
But if a client sends any data then FD_ISSET becomes true, so if I start a thread there, it becomes a thread for every message.
Question:
How do I add the thread code here? If I put pthread_create inside if(FD_ISSET ...), then I create a thread for every message I receive, but I want a thread per socket.
while(1)
{
    int nready = 0;
    read_set = active_set;   /* select() modifies the set, so copy it each time */
    if((nready = select(fdmax+1, &read_set, NULL, NULL, NULL)) == -1)
    {
        printf("Select Error\n");
        perror("select");
        exit(EXIT_FAILURE);
    }
    printf("number of ready desc=%d\n", nready);
    for(index = 1; index <= 15*5; index++)
    {
        if(FD_ISSET(sock_fd[index], &read_set))   /* test the set that select() filled in */
        {
            rc = recvfrom(sock_fd[index], clientmsgInfo, MSG_SIZE, 0,
                          (struct sockaddr *)&client_sockaddr_in,
                          &sockaddr_in_length);
            if(rc < 0)
            {
                printf("socket %d down\n", sock_fd[index]);
                continue;   /* nothing was received, so skip the print below */
            }
            printf("Received packet from %s: %d\nData: %s\n\n", inet_ntoa(client_sockaddr_in.sin_addr), ntohs(client_sockaddr_in.sin_port), recv_client_message);
        }
    } //for
} //while
Create the threads at program startup and divide the work among them (data, commands, etc.).
How?
1. Let's say you create 2 threads, one for data and another for commands.
2. Make them sleep in the thread handler, or let them wait on a lock that the main thread has acquired; the main thread holds two locks, one for each of them.
3. When client data or a command arrives in recvfrom on the main thread, copy the buffer, depending on its type (data or command), into the data shared between the main thread and the worker threads, then unlock the mutex.
4. In the worker threads, lock the mutex so that the main thread won't corrupt the data; once processing is done, unlock and go back to sleep.
A better approach would be a queue that is filled by the main thread and can be accessed element by element by the other threads.
I assume that each client context is independent of the others, ie. one client socket group can be managed on its own, and the data pulled from the sockets can be processed alone.
You express two possibilities of handling the problem:
Asynchronous multiplexing: in this setting, the sockets are all managed by one single thread. This thread selects which socket must be read next, and pulls data out of it
Thread per socket: in this scenario, you have as many threads as there are sockets, or more probably groups of sockets, i.e. clients - this is the interpretation I will build from.
In both cases, threads must keep ownership of their respective resources, meaning sockets. If you start moving sockets around between threads, you will make things more difficult than they need to be.
Outside the work that needs to be done, you will need to handle thread management:
How do threads get started?
How and when are they stopped?
What are the error handling policies?
Your question doesn't cover these issues, but they might play a significant role in your final design.
Scenario (2) seems simpler: you have one main "template" (I use the word in a general meaning here) for handling a group of sockets using select on them, and in the same thread receive and process the data. It's quite straightforward to implement, with a struct to contain the context specific data (socket ports, pointer to function for packet processing), and a single function looping on select and process, plus perhaps some other checks for errors and thread life management.
Scenario (1) requires a different setup: one I/O thread reads all the packets and passes them on to specialized worker threads to do the processing. If a processing error occurs, worker threads will have to generate the ad hoc packet to be sent to the client, and pass it to the I/O thread for sending. You will need packet queues both ways to allow communication between I/O and workers, and have the I/O thread check the worker queues somehow for resend requests. So this solution is a bit more expensive in terms of development, but reduces the I/O contention to one single point. It's also more flexible, in case some processing must be done against data coming from several clients, or if you want to chain up processing somehow. For instance, you could have instead one thread per client socket, and then one other thread per client group of sockets further down the work pipeline, with each step of the pipeline interconnected by a packet queue.
A blend of both solution can of course be implemented, with one IO thread per client, and pipelined worker threads.
The advantage of both outlined solutions is the fixed number of threads: no need to spawn and destroy threads on demand (although you could design a thread pool to handle that as well).
For a solution involving moving sockets between threads, the questions are:
When should these resources be passed on? What happens after a worker thread has read a packet? Should it return the socket to the IO thread, or risk a blocking read on the socket for the next packet? If it does a select to poll the socket for more packets, we fall back to scenario (2), where each client has its own I/O thread when there is network traffic from all of them, in which case what is the gain of the initial I/O thread doing the select?
If it passes the socket back, should the IO thread wait for all workers to give back their sockets before initiating another select? If it waits, it takes the risk of making unserved clients wait for packets already in the network buffers, inducing processing lag. If it does not wait, and returns to select to avoid lag on unserved sockets, then the served ones will have to wait for the next wake-up to see their sockets back in the select pool.
As you can see, the problem is difficult to handle. That's the reason why I recommend exclusive socket ownership by threads as described in scenarios (1) and (2).
Your solution requires a fixed, relatively small, number of connections.
Create a helper procedure that starts the threads for one client: each thread listens on one of the five ports, blocks on recvfrom(), processes the data, and blocks again. You can then call the helper 15 times, once per client, to create all the threads.
This avoids all polling, and allows Linux to schedule each thread when the I/O completes. No CPU used while waiting, and this can scale to somewhat larger solutions.
If you need to scale massively, why not use a single set of ports and get the partner address from the client_sockaddr_in structure? If the processing takes a material amount of time, you could extend this by keeping a pool of threads available, assigning one each time a message is received to process that message, and returning the thread to the pool after the response is sent.
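The thread-per-port helper suggested in this answer could be sketched as follows. Everything here is illustrative rather than taken from the question: `struct port_ctx`, `open_udp_port`, `handle_one`, `port_worker` and `start_port_thread` are hypothetical names, `MSG_SIZE` is given an assumed value, and the per-packet work (checksum, loss tracking, file writes) is reduced to two counters.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MSG_SIZE 2048      /* assumed maximum datagram size */

struct port_ctx {
    int fd;
    unsigned long packets; /* datagrams received on this port */
    unsigned long bytes;   /* payload bytes received on this port */
};

/* Create a UDP socket bound to the given port on all interfaces. */
int open_udp_port(uint16_t port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block in recvfrom() for one datagram, then process it. */
int handle_one(struct port_ctx *c) {
    char buf[MSG_SIZE];
    struct sockaddr_in from;
    socklen_t fromlen = sizeof from;
    ssize_t n = recvfrom(c->fd, buf, sizeof buf, 0,
                         (struct sockaddr *)&from, &fromlen);
    if (n < 0) return -1;
    c->packets++;
    c->bytes += (unsigned long)n;
    /* checksum verification, re-requests and file writes would go here */
    return 0;
}

/* Thread function: one thread owns one socket for its whole lifetime. */
void *port_worker(void *arg) {
    struct port_ctx *c = arg;
    while (handle_one(c) == 0)
        ;
    return NULL;
}

/* Helper: open the port and start the thread that serves it. */
int start_port_thread(struct port_ctx *c, uint16_t port, pthread_t *tid) {
    c->fd = open_udp_port(port);
    if (c->fd < 0) return -1;
    c->packets = 0;
    c->bytes = 0;
    if (pthread_create(tid, NULL, port_worker, c) != 0) {
        close(c->fd);
        return -1;
    }
    return 0;
}
```

With the port layout from the question, a startup loop would call `start_port_thread` once for each of the 75 ports, each with its own `struct port_ctx`, and no select() loop is needed at all.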

Distinguishing file descriptor types in epoll

I have a network client library that I'm putting together that reads/writes to some network sockets.
There is a single thread that does the network I/O and responds to requests from the exposed client API. Those client API requests are to be popped off a FIFO queue.
In order for the thread to get at the request when my main loop is blocked on epoll_wait, I am thinking I should use an eventfd which I can add with epoll_ctl.
So the question is how can I distinguish between an event pushed onto my FIFO queue and network I/O if epoll just notifies with EPOLLIN?
EDIT:
I should add that I am not wanting to store the event fd in the data member, but rather use the ptr member. I suppose I need to store the fd somewhere inside that structure.
Can I simply check to see if the triggered event = my event file descriptor and therefore read from my fifo as well, and if it's not equal then it must be a network event? Is this safe? or is there a best practice approach.
Yes you have to compare the file descriptors. The example in the manual page does this.
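For the variant raised in the question's edit, where the `ptr` member is used instead of `fd`, one possible sketch is to register a small tagged struct for every descriptor and inspect it when the event fires. This is Linux-specific, and `struct source`, `watch`, `wait_one` and `drain_eventfd` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

enum src_kind { SRC_QUEUE, SRC_SOCKET };

/* One of these is registered per descriptor; epoll hands the pointer back. */
struct source {
    enum src_kind kind;
    int fd;
};

/* Register s->fd for readability, storing s itself in data.ptr. */
int watch(int epfd, struct source *s) {
    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.ptr = s;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, s->fd, &ev);
}

/* Wait for a single event; returns the source that fired,
 * or NULL on timeout or error. */
struct source *wait_one(int epfd, int timeout_ms) {
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (n <= 0) return NULL;
    return ev.data.ptr;
}

/* Reading an eventfd returns its 8-byte counter and resets it to zero. */
int drain_eventfd(int efd, uint64_t *count) {
    return read(efd, count, sizeof *count) == (ssize_t)sizeof *count ? 0 : -1;
}
```

In the main loop, a `SRC_QUEUE` result means "drain the eventfd, then pop requests off the FIFO", while `SRC_SOCKET` means "do network I/O on `fd`". The thread pushing a request writes a `uint64_t` value of 1 to the eventfd after enqueuing.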

Multicast program lost data

I have a multithreaded program written in C: one thread receives multicast data from the network and stores it in a queue, and another thread keeps reading the queue and writes it to a file. Everything works just great, i.e. no data is lost from the multicast network.
Thread 1: Read Multicast data and store it into a queue
Thread 2: Read from queue and write it to file.
Now I have another source of multicast data on the network, and I need another thread to read it. I added a for loop to create another thread for the multicast data, but when the 2 multicast threads switch back and forth, I lose data from the multicast network!
Does anyone have an idea why datagrams are lost when 2 threads are used? Thanks
It is likely you are not using any concurrency mechanisms like semaphores or mutexes. A classic solution is a monitor. A monitor provides a lock to mediate concurrent access, and condition signals that let threads block while waiting for access (no busy waiting). In plain English this means that only one thread can access the data at a time. This prevents a reading thread from reading data that the writing thread has not yet finished writing. It also allows the reading threads to read data that the other reading thread has not yet read. One way to implement this is to use a read-write mutex and an access semaphore. Each thread that wants to access the data decrements the access semaphore; the thread will either be granted access or sleep until it gets its turn. The read-write mutex will prevent a read thread from reading until some data has been written.
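A monitor in this sense can be sketched in C with one mutex and two condition variables, which is a common alternative to the read-write mutex plus semaphore combination mentioned above and is named here as a substitution. The names (`struct monitor_q`, `mq_put`, `mq_get`, `QCAP`) are hypothetical. Both multicast threads would call `mq_put`, and the file-writer thread would call `mq_get`.

```c
#include <assert.h>
#include <pthread.h>

#define QCAP 64   /* assumed queue capacity, in items */

struct monitor_q {
    pthread_mutex_t mu;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
    void *items[QCAP];
    int head;     /* index of the oldest item */
    int count;    /* number of items currently queued */
};

void mq_init(struct monitor_q *q) {
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
    q->head = 0;
    q->count = 0;
}

/* Producer: append an item, sleeping (not spinning) while the queue is full. */
void mq_put(struct monitor_q *q, void *item) {
    pthread_mutex_lock(&q->mu);
    while (q->count == QCAP)
        pthread_cond_wait(&q->not_full, &q->mu);
    q->items[(q->head + q->count) % QCAP] = item;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
}

/* Consumer: remove the oldest item, sleeping while the queue is empty. */
void *mq_get(struct monitor_q *q) {
    pthread_mutex_lock(&q->mu);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->mu);
    void *item = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->mu);
    return item;
}
```

Note that a producer blocked in `mq_put` is not reading its socket, so with UDP multicast a full queue can still lead to datagrams being dropped by the kernel; the lock only removes corruption of the queue itself as a cause of loss.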
