I am writing a socket program which consists of a bunch of slave processes, one sitting on each machine in a cluster of computers, while a master process instructs them to move local files over to slaves on remote nodes. The primary task of these slave processes is to read files off their local hard disks and transfer them to other slaves on other machines. I want both the listening-for-file-data and the sending-of-file-data functionality built into a single process.
Is it possible to have both the sending and the receiving bits in the same process?
//I want this to send a connect() request to every other slave node
initialize_Connections();
//Have an accept() call for accepting the connection requests from the other nodes
accept_Connections();
Is it even possible to pull off something like this? I looked at forking the process between the initialize_Connections() and accept_Connections() calls (i.e. a child process calls initialize_Connections() while the parent takes care of accept_Connections()), but that did not work out, for some reason that is mysterious to me.
Is it possible to use nonblocking connect() and accept() in this situation?
Using threads is not mandatory; you just need to set up one listening socket that yields a new socket for each incoming connection, plus a socket for each outgoing connection to the other clients, and then poll()/select() on every socket for events...
The select() call allows you to service multiple sockets without entering blocking calls. It requires you to set up some structures that get populated by select(). You call select() iteratively, examining the structures to see whether reading and/or writing on one of the specified sockets would succeed immediately. Then you can call read() or write() on an open socket without fear that it may block indefinitely.
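For illustration, here is a minimal sketch of such a loop for two sockets; listen_fd and peer_fd are placeholder names, not taken from the question, and error handling is omitted:

#include <sys/select.h>

/* Minimal sketch: multiplex a listening socket and a peer connection
   with select().  Both descriptors are assumed to already exist. */
void service_loop(int listen_fd, int peer_fd)
{
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(listen_fd, &readfds);
        FD_SET(peer_fd, &readfds);

        int maxfd = listen_fd > peer_fd ? listen_fd : peer_fd;
        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
            break;                                /* real code would check errno */

        if (FD_ISSET(listen_fd, &readfds)) {
            /* accept() will not block here: a connection is pending */
        }
        if (FD_ISSET(peer_fd, &readfds)) {
            /* read() will not block here: data (or EOF) is available */
        }
    }
}

The same pattern scales to any number of sockets: add each one to the set, track the highest descriptor, and test each with FD_ISSET() after select() returns.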
Is it possible that I can have both the sending and the receiving bits in the same process? Is it possible to use nonblocking connect() and accept() in this situation?
Yes, absolutely. You can do it with multiple threads, or with just a single thread using select() or poll() to multiplex non-blocking I/O across multiple sockets.
Neither of these approaches is trivial to get 100% right, but they are quite doable. If you'd like to avoid spending a bunch of time learning (and debugging) multiple-connection socket programming details, I'd recommend using some middleware that does this sort of thing for you, rather than rolling your own. I'm partial to my own library that I wrote to handle this sort of thing, but there are other good ones out there as well.
I would have each slave process run two permanent threads. On one thread, the process would connect to the master process and then receive instructions on which file to send to which other slave processes. For each other slave process named in an instruction, this thread would create a temporary thread to send the file. On the other permanent thread, the slave process would accept connections from other slave processes; for each connection it would create a temporary thread on which the file is received.
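A rough sketch of the receiving half of that design, using POSIX threads; the function names here (accept_loop, handle_incoming_file) are made up for illustration and are not from the question:

#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Temporary thread: receive one file over an accepted connection. */
static void *handle_incoming_file(void *arg)
{
    int conn_fd = *(int *)arg;
    free(arg);
    /* ... read the file data from conn_fd and write it to disk ... */
    close(conn_fd);
    return NULL;
}

/* Permanent thread: accept connections from the other slaves and spawn a
   temporary thread per connection.  listen_fd is assumed to be a socket
   that is already bound and listening. */
static void *accept_loop(void *arg)
{
    int listen_fd = *(int *)arg;
    for (;;) {
        int *conn_fd = malloc(sizeof *conn_fd);
        *conn_fd = accept(listen_fd, NULL, NULL);
        if (*conn_fd < 0) { free(conn_fd); continue; }

        pthread_t t;
        pthread_create(&t, NULL, handle_incoming_file, conn_fd);
        pthread_detach(t);        /* the temporary thread cleans up after itself */
    }
    return NULL;
}

The sending half is symmetric: the thread talking to the master spawns a temporary thread per outgoing transfer, each of which connect()s to the destination slave and writes the file.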
Related
I'm currently implementing a daemon server that acts as two servers. One of the servers receives logs via UDP from a collection of producers. The second server broadcasts every log received from a producer to every consumer currently connected via TCP.
These are two separate sockets. My current (pretty basic) implementation is to select() on these two sockets and handle every read event accordingly, so my code is basically (note: this is pseudocode):
for (;;) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(consumers_server, &readfds);
    FD_SET(producers_server, &readfds);

    int maxfd = consumers_server > producers_server ? consumers_server : producers_server;
    select(maxfd + 1, &readfds, NULL, NULL, NULL);

    if (FD_ISSET(consumers_server, &readfds)) {
        /* add new client to the consumers array */
    }
    if (FD_ISSET(producers_server, &readfds)) {
        /* broadcast the log to every consumer in the array */
    }
}
This works just fine; the problem occurs when this code is put under stress. When multiple producers are sending logs (UDP), the real bottleneck is the consumers, which are TCP. Sending a log to the consumers can block, which I can't afford.
I've tried using non-blocking sockets and select()ing on the consumers' write fds; the problem is that this requires saving the unsent logs in a buffer until they can be sent. That results in a lot of inelegant code, and the system is also low on resources (mainly RAM).
I'm running on a Linux distro.
An alternative approach to synchronizing these UDP and TCP connections would be welcome.
This is doomed to failure. Sooner or later you will be unable to send to the TCP consumer. Whether that manifests itself as blocking or EAGAIN/EWOULDBLOCK isn't really relevant to the underlying problem, which is that the producer is overrunning the consumer. You have to decide what to do about that. You can have a certain amount of internal buffering but at some point you will have to stop reading from the UDP producers. At that point, UDP datagrams will be dropped and your system will lose data, and of course it is liable to lose data anyway by virtue of using UDP.
Don't do this. Use TCP for the producers: or else just accept the data loss and use blocking mode. Non-blocking mode only moves the problem slightly and complicates your code.
I have two questions regarding using sockets for client-server communication. Assume there is only one client in both cases.
1) I know that we can send and receive data between client and server using a single socket. But in that case, what will happen when both the server and the client try to send the data at the same time?
2) Which of these is the best model?
i) Using single thread, single socket for sending and receiving
ii) Using 2 threads(one for sending and one for receiving), single socket
iii) Using 2 sockets and 2 threads, one for sending and one for receiving.
The connection is full-duplex, meaning that sends and receives can happen at the same time. So in answer to question one, both client and server will be able to send/read data from their socket simultaneously.
In terms of which "model" is best, it depends on your application and what you're trying to achieve. Incidentally, you don't need to multi-thread. You could:
Multi-process (fork)
Use non-blocking sockets (select/poll)
Use asynchronous notification (signals)
All of which have pros and cons.
For question number one, nothing special will happen. TCP is fully duplex, both sides of a connection can send simultaneously.
And as there is no problem with sending/receiving simultaneously, the first alternative in your second question is going to be the simplest.
In that scenario you don't need threads. The sockets themselves buffer the incoming data until you read it from the file descriptor; more precisely, there are multiple levels of buffering, starting at the hardware. You will not miss data because you were writing at the same time; it just waits until you next read from the socket's file descriptor.
There is no inherent need for multithreading if you want to poll at multiple sockets.
All you really have to do is use select().
To achieve this you define an fd_set (file descriptor set) to which you add all the sockets you want to poll. You hand this set to select(), and it tells you which file descriptors have pending data.
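For example, a minimal sketch of checking a single socket (fd is assumed to be an already-connected socket; error handling is omitted):

#include <sys/select.h>

/* Returns nonzero if a read on fd would not block right now. */
int readable(int fd)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    struct timeval tv = { 0, 0 };     /* zero timeout: just poll and return */
    return select(fd + 1, &readfds, NULL, NULL, &tv) > 0
           && FD_ISSET(fd, &readfds);
}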
See the man pages for select and fd_set, and a select tutorial.
I am in the middle of a multi-threaded TCP server design using the Berkeley socket API under Linux, in system-independent C. The server has to perform I/O multiplexing, as it is a centralized controller that manages the clients (which maintain a persistent connection with the server forever, unless the machine a client is running on fails, etc.). The server needs to handle a minimum of 500 clients.
I have a 16-core machine. What I want is to spawn 16 threads (one per core) plus a main thread. The main thread will listen() for connections and then dispatch each pending connection to a thread, which will then call accept() and use the select() syscall to perform I/O multiplexing. Now the problem is: how do I know when to dispatch a thread to call accept()? I mean, how do I find out in the main thread that there is a connection pending at the listening socket, so that I can assign a thread to handle it? All help much appreciated.
Thanks.
The listen() function call prepares a socket to accept incoming connections. You then use select() on that socket and get a notification that a new connection has arrived. You then call accept on the server socket and a new socket id will be returned. If you like you can then pass that socket id onto your thread.
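A sketch of that main-thread loop; dispatch_to_worker() is a hypothetical function standing in for however you hand the descriptor to one of your 16 threads (e.g. pushing it onto that thread's queue):

#include <sys/select.h>
#include <sys/socket.h>

void dispatch_to_worker(int client_fd);          /* hypothetical: queue the fd to a worker thread */

/* Main thread: use select() to learn when accept() will not block, then
   accept the connection and dispatch the new descriptor to a worker. */
void accept_and_dispatch(int server_fd)
{
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(server_fd, &readfds);

        if (select(server_fd + 1, &readfds, NULL, NULL, NULL) <= 0)
            continue;                            /* real code would check errno */

        if (FD_ISSET(server_fd, &readfds)) {
            int client_fd = accept(server_fd, NULL, NULL);
            if (client_fd >= 0)
                dispatch_to_worker(client_fd);
        }
    }
}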
What I would do is have a single thread for accepting connections and receiving data which then dispatches the data to a queue as a work item for processing.
Note that if each of your 16 threads is going to be running select (or poll, or whatever) anyway, there is no problem with them all adding the server socket to their select sets.
More than one may wake when the server socket has an incoming connection, but only one will successfully call accept(), so it should work.
Pro: easy to code.
Con:
a naive implementation doesn't balance load (it would need, e.g., global stats on the number of accepted sockets handled by each thread, with high-load threads removing the server socket from their select sets)
thundering herd behaviour could be problematic at high accept rates
epoll or aio/asio. I suspect you got no replies to your earlier post because you didn't specify Linux when you asked for a scalable, high-performance solution. Asynchronous solutions on different OSes are implemented with substantial kernel support, and Linux aio, Windows IOCP etc. are different enough that 'system independent' does not really apply; nobody could give you an answer.
Now that you have narrowed the OS down to Linux, look up the appropriate asynchronous solutions.
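For instance, a level-triggered epoll loop on Linux looks roughly like this (a sketch only; error handling is omitted and listen_fd is assumed to be an already-listening socket):

#include <sys/epoll.h>
#include <sys/socket.h>

void epoll_loop(int listen_fd)
{
    int epfd = epoll_create1(0);

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(epfd, events, 64, -1);   /* block until something is ready */

        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd) {
                /* new connection: accept it and start watching it too */
                int client = accept(listen_fd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
            } else {
                /* readable client socket: recv() will not block here */
            }
        }
    }
}

Unlike select(), epoll scales well to the hundreds of connections mentioned above, because the kernel keeps the interest set between calls.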
I'm trying to implement a simple web server on Linux that connects to the client (the browser), receives some requests from the client (e.g. GET), and then sends back the response with the desired file. I am using sockets for communication. I want to create a pool of worker processes (children) at server startup whose job is to deal with the incoming requests. The parent process has to accept() each incoming request and send its file descriptor to one of the worker processes, which deals with it and sends the response with the requested file back to the client.
The problem I have faced is that when I accept() the request and send it to the worker process, the recv() or read() functions return -1, which means an error occurred:
Socket operation on non-socket
But when I use recv() or read() in the parent process, they work fine and return the number of received bytes.
How can I solve this issue?
PS: I am using shared memory to pass the file descriptor from the parent process to the worker process (the child process), and I am using semaphores to manage which worker process will handle the request.
EDIT:
Actually, it's a project assignment, and one of the specifications is to send the file descriptor via shared memory. However, can I send a pointer to the file descriptor?
You can't send file descriptors via shared memory, AFAIK. So what you're doing is essentially sending a small integer to a worker process.
What you can do is send the file descriptor over a Unix domain socket, using sendmsg() and ancillary data. It sounds a bit like magic (heck, it is a bit like magic), but it's pretty standard among Unixes, so it should work.
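Here is a sketch of the sending side with SCM_RIGHTS; sock is assumed to be one end of a Unix domain socket (e.g. from socketpair()) shared with the worker, and the worker mirrors this with recvmsg():

#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send the open file descriptor fd_to_send over the Unix domain socket sock. */
ssize_t send_fd(int sock, int fd_to_send)
{
    char byte = 'F';                              /* must transfer at least one byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };

    char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof ctrl;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;                 /* "pass these descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0);
}

The kernel duplicates the descriptor into the receiving process, so the worker gets its own valid descriptor for the same open connection.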
Is your worker another process or just another thread? I'm quite sure your descriptors won't be valid outside the process that created/owns them. It should be a bit like local pointers: they're valid only for the process that created them (due to virtual memory addressing and such).
Maybe a better way is to create the server socket before fork() and let the child processes call accept(). Then the parent process does not need to deliver accepted sockets to the children. When several children are waiting for connections on the same port and a client opens a connection to that port, the kernel will hand the new connection to one of the child processes.
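A minimal sketch of that pre-fork pattern (the port number and the number of workers are arbitrary placeholders, and error handling is omitted):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                  /* placeholder port */
    bind(listen_fd, (struct sockaddr *)&addr, sizeof addr);
    listen(listen_fd, 16);

    for (int i = 0; i < 4; i++) {                 /* 4 workers: arbitrary choice */
        if (fork() == 0) {                        /* child: accept on the shared socket */
            for (;;) {
                int conn = accept(listen_fd, NULL, NULL);
                if (conn < 0) continue;
                /* ... recv() the request, send() the response ... */
                close(conn);
            }
        }
    }
    for (;;) pause();                             /* parent just keeps the children alive */
}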
Okay, I'm brand new to socket programming and my program is not behaving like I'd expect it to. In all the examples of socket programming that I see, they use accept(), and all the code after that assumes a connection has been made.
But my accept() is called as soon as I start the server. Is this supposed to happen? Or is the server supposed to wait for a connection before executing the rest of the program?
EDIT: Oops I forgot to mention it is a TCP connection.
I think this is what you're after.
http://www.sockets.com/winsock.htm#Accept
The main concept in Winsock programming is that you're working with either blocking or non-blocking sockets. Most of the time, if you're using blocking sockets, you can query the socket's receive set to see whether a call to the routine would block.
For starting off, UDP is easier, considering it's a datagram protocol, so it's natural to think in terms of blocks of data that are sent and received. TCP, on the other hand, is a streaming protocol.
For a server, you:
Create the socket - socket().
Bind it to an address - bind().
Mark it as listening - listen().
Then you enter a loop in which you:
Accept incoming connection attempts - accept().
Process them.
It is not clear from your description whether you are doing all those steps.
There are multiple options for the 'process them' phase, depending on whether you plan to have a single-threaded, single-process server handle one request before processing the next; a multi-threaded single process, with one thread accepting requests and creating other threads to do the processing (while that thread waits for the next incoming connection); or a process that forks, with the child processing the new request while the parent goes back to listening for the next one.
You are supposed to enter your acceptance loop after you have started listening for connections. Use select() to detect when a pending client connection is ready to be accepted, then call accept() to accept it.
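Putting those steps together, a minimal skeleton might look like this (port 8080 is a placeholder and error handling is omitted):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);           /* create */

    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(server_fd, (struct sockaddr *)&addr, sizeof addr);    /* bind */

    listen(server_fd, 8);                                      /* listen */

    for (;;) {                                                 /* acceptance loop */
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(server_fd, &readfds);
        select(server_fd + 1, &readfds, NULL, NULL, NULL);     /* blocks until a client connects */

        int client_fd = accept(server_fd, NULL, NULL);         /* will not block now */
        if (client_fd < 0) continue;
        /* ... handle the request on client_fd ... */
        close(client_fd);
    }
}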