In my client code, I am following these steps to connect to a socket:
Creating a socket
sockDesc = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)
Connecting it (retry for 'x' time in case of failure)
connect(sockDesc, (sockaddr *) &destAddr, sizeof(destAddr))
(After filling the destAddr fields)
Using the socket for send()/recv() operation:
send(sockDesc, buffer, bufferLen, 0)
recv(sockDesc, buffer, bufferLen, 0)
close() the socket descriptor and exit
close(sockDesc)
If the connection breaks during send()/recv(), I found that I could reconnect by returning to step 2.
Is this solution okay, or should I close the socket descriptor and return to step 1?
Another interesting observation that I am not able to understand: when
I stop my echo server and start the client, I create a socket (step 1) and call connect(), which fails (as expected), but then I keep calling connect(), let's say, 10 times. After 5 retries I start the server and connect() succeeds. But during the send() call the client receives a SIGPIPE error. I would like to know:
1) Do I need to create a new socket every time connect() fails? As per my understanding, as long as I have not performed any send()/recv() on the socket, it is as good as new and I can reuse the same fd for the connect() call.
2) I don't understand why SIGPIPE is received when the server is up and connect() is successful.
Yes, you should close and go back to step 1:
close() closes a file descriptor,
so that it no longer refers to any
file and may be reused.
From here.
I think closing the socket is the right thing to do, despite the fact that it may work if you don't.
A socket which has failed to connect may not be in EXACTLY the same state as a brand new one - which could cause problems later. I'd rather avoid the possibility and just make a new one. It's cleaner.
TCP sockets hold a LOT of state, some of which is implementation-specific and worked out from the network.
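For illustration, here is a rough sketch of that pattern in C; the function name, the retry count and the one-second backoff are made up for the example, not taken from the question:

#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Retry the whole connect sequence: a fresh socket for every attempt
   (illustrative sketch, assuming destAddr has already been filled in). */
int connect_with_retries(const struct sockaddr_in *destAddr, int maxTries)
{
    for (int attempt = 0; attempt < maxTries; ++attempt) {
        int fd = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (fd < 0)
            return -1;

        if (connect(fd, (const struct sockaddr *)destAddr,
                    sizeof(*destAddr)) == 0)
            return fd;            /* connected: hand the fd back */

        close(fd);                /* failed: discard it and start over */
        sleep(1);                 /* back off before the next attempt */
    }
    return -1;
}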
A socket whose connection has broken is in an unstable state; normally you will not be allowed to connect again until the operating system releases the socket.
I think it will be better to close() and connect again; you don't have to create another socket.
Anyway, make sure to set SO_LINGER on your socket to ensure that no data is lost in transmission.
See http://www.gnu.org/s/libc/manual/html_node/Socket_002dLevel-Options.html#Socket_002dLevel-Options
If the connection was broken and you try to write on the file descriptor you should get the broken pipe error/signal. All this is saying is that the file descriptor you tried writing to no longer has anyone on the other side to read what you are sending.
What you can do is catch the signal SIGPIPE and then deal with the reconnecting by closing the FD and going back to your step 1. You will now have a new FD you can read and write from for the connection.
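A common variation on that (just a sketch, assuming you are happy to ignore SIGPIPE process-wide rather than install a handler) is to let send() report the broken connection as EPIPE and trigger the reconnect from there; the function names below are illustrative:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Call once at startup: a dead peer then shows up as EPIPE from send()
   instead of a process-killing SIGPIPE. */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Returns 0 on success, -1 if the peer is gone and the caller should
   close up and go back to step 1 (create a new socket and connect). */
int send_or_flag_reconnect(int fd, const void *buf, size_t len)
{
    if (send(fd, buf, len, 0) < 0) {
        if (errno == EPIPE || errno == ECONNRESET) {
            close(fd);            /* connection is dead: drop the fd */
            return -1;
        }
        perror("send");
    }
    return 0;
}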
If the Single UNIX Specification doesn't say that it MUST work to go back to step #2 instead of step #1, then the fact that it happens to work on Linux is just an implementation detail, and your code would be far more portable if you went back to step #1. As far as I am aware, the specification makes no guarantee that it is OK to go back to step #2, and therefore I would advise you to go back to step #1.
Let's suppose I've created a listening socket:
sock = socket(...);
bind(sock,...);
listen(sock, ...);
Is it possible to do epoll_wait on sock to wait for incoming connection? And how do I get client's socket fd after that?
The thing is, on the platform I'm writing for, sockets cannot be non-blocking, but there is a working epoll implementation with timeouts, and I need to accept a connection and work with it in a single thread so that it doesn't hang if something goes wrong and the connection doesn't come.
Without knowing what this non-standard platform is it's impossible to know exactly what semantics they gave their epoll call. But on the standard epoll on Linux, a listening socket will be reported as "readable" when an incoming connection arrives, and then you can accept the connection by calling accept. If you leave the socket in blocking mode, and always check for readability using epoll's level-triggered mode before each call to accept, then this should work – the only risk is that if you somehow end up calling accept when no connection has arrived, then you'll get stuck. For example, this could happen if there are two processes sharing a listening socket, and they both try to accept the same connection. Or maybe it could happen if an incoming connection arrives, and then is closed again before you call accept. (Pretty sure in this case Linux still lets the accept succeed, but this kind of edge case is exactly where I'd be suspicious of a weird platform doing something weird.) You'd want to check these things.
Non-blocking mode is much more reliable because in the worst case, accept just reports that there's nothing to accept. But if that's not available, then you might be able to get away with something like this...
Since this answer is the first result on DuckDuckGo, I will just chime in to say that under GNU/Linux 4.18.0-18-generic (Ubuntu 18.10):
To asynchronously accept an incoming connection, one has to watch for the errno value EWOULDBLOCK (11) and then add the socket to the epoll read set.
Here is a small snippet of Scheme code that achieves that:
(define (accept fd)
  (let ((out (socket:%accept fd 0 0)))
    (if (= out -1)
        (let ((code (socket:errno)))
          (if (= code EWOULDBLOCK)
              (begin
                (abort-to-prompt fd 'read)
                (accept fd))
              (error 'accept (socket:strerror code))))
        out)))
In the above, (abort-to-prompt fd 'read) will pause the coroutine and add fd to the epoll read set, done as follows:
(epoll-ctl epoll EPOLL-CTL-ADD fd (make-epoll-event-in fd)))
When the coroutine is unpaused, the code proceeds after the abort and calls itself recursively (in tail-call position).
Since I am working in Scheme, the code is a bit more involved because I rely on call/cc to avoid callbacks. The full code is at sourcehut.
That is all.
We have a server with a limit on the number of incoming connections it can accept.
We have multiple clients connecting to the server at various intervals, for various different reasons.
At least one of the functions of the server requires it to process the client's request and reply back on the same socket. However:
the client complains about timing out (and I believe closes the socket)
the server finishes its processing successfully, but the thread throws a SIGCHLD because the socket has been closed.
I have code similar to the one below, that checks the socket descriptor.
if (connect_desc > 0)
{
    if (write(connect_desc, buffer, sizeof(buffer)) < 0)
    {
        printf("write error\n");
    }
}
else
    printf("connect_desc < 0\n");
My question is:
If the socket is closed by the client, would the socket descriptor change in value on the server? If not, is there any way to catch that in my code?
I'm not seeing that last print out.
Q: Will the descriptor change?
A: No
Q: How can I check the status of my connection?
A: One way is simply to try writing to the socket, and check the error status.
STRONG RECOMMENDATION:
Beej's Guide to Network Programming
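A tiny illustration of the "try writing and check the error" idea (a sketch; it assumes SIGPIPE is ignored so a dead peer shows up as an errno value rather than a signal, and the function name is made up):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if the write went through, 0 if the peer appears to be gone. */
int probe_connection(int fd, const char *msg)
{
    if (write(fd, msg, strlen(msg)) < 0) {
        if (errno == EPIPE || errno == ECONNRESET) {
            fprintf(stderr, "peer closed the connection\n");
            return 0;
        }
        perror("write");          /* some other error, e.g. EINTR */
    }
    return 1;
}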
Q. Will the file descriptor change?
Not unless:
It is documented somewhere you can cite.
The operating system magically knows about the disconnection.
The operating system magically knows where in your application the FD is stored, including all the copies.
The operating system wants to magically make it impossible for you to close the socket yourself.
None of these is true. The question doesn't even make sense.
Q. How can I check the status of my connection?
There isn't, by design, any such thing as the status of a TCP connection. The only way you can detect whether it has failed is by trying to use it.
I want to create a non-blocking connect.
Like this:
socket.connect(); // returns immediately
For this, I use another thread, an infinite loop and Linux epoll. Like this (pseudocode):
// in another thread
{
    create_non_block_socket();
    connect();

    epoll_create();
    epoll_ctl(); // subscribe socket to all events

    while (true)
    {
        epoll_wait();   // wait a small time (~100 ms)
        check_socket(); // check on EPOLLOUT event
    }
}
If I run the server and then the client, it all works. If I first run the client, wait some small time, and then run the server, the client doesn't connect.
What am I doing wrong? Maybe it can be done differently?
You should use the following steps for an async connect:
create socket with socket(..., SOCK_NONBLOCK, ...)
start connection with connect(fd, ...)
if connect() does not return 0 and errno is not EINPROGRESS, then abort with an error
wait until fd is signalled as ready for output
check status of socket with getsockopt(fd, SOL_SOCKET, SO_ERROR, ...)
done
No loops - unless you want to handle EINTR.
If the client is started first, you should see the error ECONNREFUSED in the last step. If this happens, close the socket and start from the beginning.
It is difficult to tell what's wrong with your code without seeing more details. I suppose that you do not abort on errors in your check_socket operation.
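In case it helps, here is a sketch of those steps in C, using poll() to wait for writability (SOCK_NONBLOCK is Linux-specific, the timeout and EINTR handling are left out, and the function name is illustrative):

#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Asynchronous connect: returns a connected fd, or -1 on failure. */
int async_connect(const struct sockaddr *addr, socklen_t addrlen)
{
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0)
        return -1;

    if (connect(fd, addr, addrlen) == 0)
        return fd;                      /* connected immediately */
    if (errno != EINPROGRESS) {
        close(fd);                      /* real failure */
        return -1;
    }

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    if (poll(&pfd, 1, -1) <= 0) {       /* wait until writable */
        close(fd);
        return -1;
    }

    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0) {
        close(fd);                      /* e.g. err == ECONNREFUSED */
        return -1;
    }
    return fd;
}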
There are a few ways to test if a nonblocking connect succeeds.
Call getpeername() first; if it fails with the error ENOTCONN, the connection failed. Then call getsockopt() with SO_ERROR to get the pending error on the socket.
Call read() with a length of 0. If the read fails, the connection failed, and the errno from read() indicates why; read() returns 0 if the connection succeeded.
Call connect() again; if errno is EISCONN, the socket is already connected and the first connect() succeeded.
Ref: UNIX Network Programming V1
D. J. Bernstein gathered together various methods for checking whether an asynchronous connect() call succeeded or not. Many of these methods have drawbacks on certain systems, so writing portable code for that is unexpectedly hard. If anyone wants to read all the possible methods and their drawbacks, check out this document.
For those who just want the tl;dr version, the most portable way is the following:
Once the system signals the socket as writable, first call getpeername() to see whether it connected or not. If that call succeeds, the socket is connected and you can start using it. If it fails with ENOTCONN, the connection failed. To find out why, try to read one byte from the socket, read(fd, &ch, 1), which will fail as well, but the error you get is the error you would have gotten from connect() if it weren't non-blocking.
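Roughly, that check could look like this in C (a sketch of the approach described above, not tested against every platform):

#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Call when the non-blocking socket is reported writable.
   Returns 0 if connected, -1 otherwise; errno then says why. */
int check_async_connect(int fd)
{
    struct sockaddr_storage peer;
    socklen_t len = sizeof(peer);

    if (getpeername(fd, (struct sockaddr *)&peer, &len) == 0)
        return 0;                       /* connected */

    if (errno == ENOTCONN) {
        char ch;
        read(fd, &ch, 1);               /* fails too, but sets errno to the
                                           error connect() would have given */
    }
    return -1;
}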
I'm still new to C socket programming but I was able to create some simple client and server programs.
I'm programming a server that listens for TCP connections, its duty is answering to clients' requests and then close the communication when the client sends a special sequence of bytes (or when it disconnects, of course).
I started coding the server using the accept() function inside an endless loop: the server waits for a client, accept()'s it, does all the stuff, close()'s the socket descriptor at the end and goes back again waiting to accept a new client.
Since I want to serve one client at a time I called the listen function in this way: listen(mysocket, 1);
Everything worked pretty well, but then a new problem came out. The server part explained above runs in a separated thread (let's call it thread #2) and the main thread (thread #1) must be able to tell it to terminate. I created a global variable then, if this variable is set to 1 (by thread #1) thread #2 must terminate. The problem is that thread #2 gets stuck when the accept() function is called, thus it can't periodically check the global variable.
I clearly needed a timeout value for that function: "if there isn't a connection to accept, check the value of the global variable, continue waiting for new connection if set to 0 or terminate if set to 1".
I googled for a solution then and found that the select() function does the thing that I need. It's a little bit different though, I discovered for the first time the fd_set and all the FD_* macros. I modified the server part to make it work with the select() function and everything works really nice, but here comes the last problem, the one that I'm not able to solve.
Even if I call the listen function this way: listen(socket, 1); the server still accepts and serves multiple connections at the same time. Does this happen because select() works with fd_sets? I'm using some examples I found on the web and, when a connection is accepted, it creates a new socket descriptor that goes into the set with all the others.
I'd like to accept the connection of just one client. I wrote some simple code that recognizes whether the connecting client should be served or not, but is there a way to disconnect it server-side? I know that I have to use the close() function to close a socket descriptor, but when using select() I'm working with fd_sets and I don't really know what to do to close them.
Or, is there a way to limit the number of socket descriptors in a set? I found the FD_SETSIZE macro, but I wasn't able to make it work and I'm not even sure if it fixes the problem.
Thank you for your time!
The listen() function has a backlog argument that determines how many incoming requests may be pending before they may be turned away. This is worded carefully so that the OS implementation can support more than what you specify in the listen() call. You may not be able to control the exact number of backlogged connections.
If you must support only one client at a time, then accept a second connection but tell the new client that the connection is not available at this time, and then close the new connection. This also has the benefit that you have the opportunity to tell the client why the connection is not available.
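A sketch of that "accept, apologize, close" idea (the message text and the function name are made up for illustration):

#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Serve one client at a time; turn away anyone who connects meanwhile. */
void turn_away_extra_client(int listen_fd)
{
    int extra = accept(listen_fd, NULL, NULL);
    if (extra < 0)
        return;

    const char *msg = "Server busy, try again later.\n";
    send(extra, msg, strlen(msg), 0);   /* tell the client why */
    close(extra);                       /* and drop the connection */
}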
You can put the listen() socket in the fd_set, too (I can't tell from your question whether you already do this). If select() indicates that the listen socket is readable, you can call accept() on it. accept() returns exactly one fd. You can put the returned fd into the fd_set, too. Threads are more or less orthogonal to this (system calls are atomic), but you can of course screw things up.
The backlog parameter for listen() is more or less unrelated; it just specifies how many embryonic sockets the system should maintain simultaneously (the connection buildup uses resources even before the socket becomes usable). Once the three-way handshake is completed, accept() will return the fresh fd, and the embryonic connection is born as a new socket fd.
The getdtablesize() call is a remnant from old times; in modern implementations it is based on getrlimit().
The underlying problem with FD_SETSIZE is that fd_set must be an lvalue, so that it can be assigned; thus it must be of fixed size (it is probably an array inside a struct). To avoid fixed-size structures you could use poll() instead of select() (or use multiple smaller fd_sets ;-).
If you're using the Windows API, you could try Critical Sections.
In the client, I have a
close(sockfd)
where sockfd is the socket that's connected to the server.
In the server I've got this:
if (sockfd.revents & POLLERR ||
    desc_set[i].revents & POLLHUP || desc_set[i].revents & POLLNVAL) {
    close(sockfd.fd);
    printf("Goodbye (connection closed)\n");
}
Where sockfd is a struct pollfd, and sockfd.fd is the file descriptor of the client's socket.
When the client closes the socket like I put up there, the server doesn't seem to detect it with the second code (desc_set[i].revents & POLLHUP, etc.).
Does anyone know what's the problem?
Sounds like you've managed to half close the connection from the client side. In this state the connection can still send data in one direction, i.e. it operates in half-duplex mode. This is by design and would allow your server to finish replying to whatever the client sent. Typically this would mean completing a file transfer and calling close(), or answering all of the aspects of the query. In the half-closed state you can still quite sensibly send data to the side that has already called close(). In your server you will see eof if you try to read though. close() just means "I'm done sending, finish up whatever I asked for".
POLLHUP, POLLERR and POLLNVAL only check the output side of the local connection, which is still valid here. There's POLLRDHUP, a GNU extension that should detect the other side closing, but the tests you're doing only check whether the socket is still writable, not whether it's still readable.
See also this question, which is talking about Java, but is still very related.
A remote close or output shutdown is neither an error nor a hangup nor an invalid state. It is a read event such that read() will return zero. Just handle it as part of your normal read processing.
BTW your test condition above should read sockfd.revents & (POLLERR|POLLHUP|POLLNVAL).
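Handling the remote close as part of normal read processing might look roughly like this (a sketch, assuming a poll()-based loop; the buffer size and function name are illustrative):

#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Called when poll() reports the descriptor readable. */
void handle_readable(struct pollfd *p)
{
    char buf[4096];
    ssize_t n = recv(p->fd, buf, sizeof(buf), 0);

    if (n == 0) {                       /* peer closed or shut down its side */
        printf("Goodbye (connection closed)\n");
        close(p->fd);
        p->fd = -1;                     /* tell poll() to ignore this slot */
    } else if (n > 0) {
        /* process n bytes of data from buf here */
    } else {
        perror("recv");
    }
}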