I have a client that attempts to connect to a server. The IP address and port are retrieved from a configuration file. I need the program to fail smoothly if something in the config file is incorrect. I connect to the server using the following code:
    if (connect(sockfd, p->ai_addr, p->ai_addrlen) == -1)
    {
        perror("client: connect");
        close(sockfd);
        continue;
    }
If the user attempts to connect to a server on the subnet that is not accepting connections (i.e. is not present), then the program fails with No route to host. If the program attempts to connect to a server that is not on the subnet (i.e. the configuration is bad), then the program hangs at the connect() call. What am I doing incorrectly? I need this to provide some feedback to the user that the application has failed.
You're not doing anything wrong. TCP is designed for reliability in the face of network problems, so if it gets no response to its initial connection request (SYN), it retries several times in case the request or the response was lost in the network. With the default parameters on Linux it takes on the order of a minute or two to give up (the exact time depends on the kernel version and the net.ipv4.tcp_syn_retries setting). connect() then fails with the Connection timed out error (ETIMEDOUT).
If you want to detect the failure more quickly, see C: socket connection timeout
Normally we don't use continue inside an if statement unless the if statement is inside a loop, which you are not showing. Assuming there is an outer loop, that loop is responsible for what happens next: it either re-enters the if block (to try to connect again) or skips past it.
Note also you are closing sockfd inside the if block so if your loop is re-entering the if block to do retries, then it needs to create a new socket first.
I suggest reading some sample code for client and server side socket connections to get a better feel for how it works http://www.cs.rpi.edu/~moorthy/Courses/os98/Pgms/socket.html
If all else fails, please provide the code around the if block and also state how you want to "fail smoothly". One way to fail "abruptly" would be to swap the continue statement with a call to exit() ;-)
EDIT: After reading Barmar's answer and his comment you also need to be aware of this:
If the initiating socket is connection-mode, then connect() shall
attempt to establish a connection to the address specified by the
address argument. If the connection cannot be established immediately
and O_NONBLOCK is not set for the file descriptor for the socket,
connect() shall block for up to an unspecified timeout interval until
the connection is established. If the timeout interval expires before
the connection is established, connect() shall fail and the connection
attempt shall be aborted.
also..
If the connection cannot be established immediately and O_NONBLOCK is
set for the file descriptor for the socket, connect() shall fail and
set errno to [EINPROGRESS], but the connection request shall not be
aborted, and the connection shall be established asynchronously.
Subsequent calls to connect() for the same socket, before the
connection is established, shall fail and set errno to [EALREADY]
When you say "the program hangs", do you mean forever, or for a period that might be explained by a TCP/IP timeout?
If this and Barmar's answer are still not enough, then it would help to see the surrounding code as suggested and determine if blocked or non-blocked etc.
Related
I'm working with a fairly basic server/client setup, where both are located on the same network. They communicate via Winsock2 blocking sockets over TCP/IP, and are doing so perfectly fine.
However, for the scenario described below, the client sometimes sees an abortive connection termination (RST). It goes right roughly 99 out of 100 times, but that last time annoyingly fails some tests and therefore, my whole build. It is completely unpredictable when and where it happens, and so reproducing the problem has so far eluded me.
If I understand the relevant MSDN page correctly, the nominal connection termination sequence for blocking sockets can be summarized as:
Client                  | Server
------------------------+------------------------
shutdown(SD_SEND)       |
                        | send() response data
i=recv() until i==0     | shutdown(SD_SEND)
closesocket()           | closesocket()
In my setup it is necessary to:
- do a relatively expensive operation (let's call it expensive_operation()) depending on whether a portion of the received data (let's say, 512 bytes) contains a trigger value. The server is single-threaded, so expensive_operation() effectively stops recv()ing the data stream until expensive_operation() is complete
- initiate a server shutdown sequence if the client sends a particular sentinel value, let's call it 0xDEADBEEF.
My client is implemented such that the sentinel value is always sent last, so after sending it, no other data is sent:
send( "data data data 0xDEADBEEF" ) to server
shutdown(SD_SEND) <------- FAILURE OCCURS HERE
recv() until 0 bytes received
closesocket()
Whenever the server receives 0xDEADBEEF, it confirms the shutdown request and continues termination:
1. recv() 512 bytes of data or until 0 bytes are returned
2. Check for trigger. If a trigger is found, perform expensive_operation() and go back to step 1, otherwise continue
3. Check for sentinel value. If sentinel is not found, go back to step 1.
4. If the sentinel is found:
   - send( confirmation ) to client
   - shutdown(SD_SEND)
   - closesocket()
   - all the normal server shutdown stuff
I can understand that if the client intends to send more data after the sentinel, this will result in abortive connection termination -- because the server actively terminates the connection. This is completely expected and by design, and I can indeed reliably reproduce this behavior.
However, in the nominal case, the sentinel is always last in the sequence, which indeed always happens as evidenced by the relevant log entries, and indeed graceful connection termination happens as expected most of the time. But not always...
As I said, it happens randomly and sporadically, so I can't produce a code snippet that reliably reproduces the problem. The only thing that's consistent is that the failure always occurs when calling shutdown() in the client...
I suspect it's more of a design flaw, or some synchronization issue I'm not handling yet, rather than a problem with the code (although I'd be happy to provide the relevant code snippets).
So is there anything obvious I'm overlooking here?
There are several ways you can provoke an RST to be sent apart from deliberately doing so at the sending end by means which I will not reveal here:
Write to a connection that had already been closed by the peer. After a few attempts this will cause ECONNRESET.
Close a connection without having read all the already-pending data. This will cause an immediate ECONNRESET.
Both of these indicate an application protocol error.
In your case I would get rid of the sentinel. It is redundant. Just shut down the socket for output, or just close it if you know there is no more data coming in. That sends an entirely unambiguous indication to the peer that there is no more data, without any requirement that the peer be precisely synchronized byte-for-byte with the local application, which is a weakness and a probable source of this bug in your current code.
You need to post some code to get any more concrete assistance.
I cannot reproduce, but I can imagine a use case where client sees abortive termination
client                      server
send sentinel
                            send confirmation
                            shutdown
                            close socket
shutdown => error writing on closed socket!
If the client process is preempted just after sending its sentinel, and the server is quick, you can end up in that scenario. It is caused by the fact that on the server side you close the socket immediately after shutdown, without being sure that the client has done its own shutdown. IMHO you should do
send( confirmation ) to client
shutdown(SD_SEND)
read until 0 or error
closesocket()
all the normal server shutdown stuff
The order is then deterministic for the upper part:
client                      server
send sentinel
                            send confirmation
shutdown                    shutdown
                            recv 0
                            close socket => cannot happen before client shutdown
recv 0 => socket may be closed server side but it would be harmless
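The server-side order suggested above (send, half-close, read until 0, close) could look roughly like this. It is only a sketch written against POSIX sockets, where shutdown(fd, SHUT_WR) and close() play the roles of Winsock's shutdown(s, SD_SEND) and closesocket(); the helper name graceful_close is invented for the example, and MSG_NOSIGNAL is a POSIX flag with no Winsock counterpart.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send a final message, half-close, drain, then close.
 * Returns 0 if the peer's end-of-stream was seen, -1 on any error.
 * The descriptor is closed in both cases. */
int graceful_close(int fd, const void *msg, size_t len)
{
    const char *p = msg;
    int rc = 0;

    while (len > 0) {                       /* send( confirmation ) */
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n < 0) {
            rc = -1;
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    if (rc == 0 && shutdown(fd, SHUT_WR) == -1)   /* no more data from us */
        rc = -1;

    if (rc == 0) {                          /* read until 0 or error */
        char buf[512];
        ssize_t n;
        while ((n = recv(fd, buf, sizeof buf, 0)) > 0)
            ;                               /* discard late client data */
        if (n < 0)
            rc = -1;
    }

    close(fd);                              /* only now release the socket */
    return rc;
}
```

Because the close happens only after recv() has returned 0, the client's own shutdown can no longer race with the server tearing the socket down.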
I want to create a non-blocking connect.
Like this:
socket.connect(); // returns immediately
For this, I use another thread, an infinite loop and Linux epoll. Like this(pseudocode):
// in another thread
{
    create_non_block_socket();
    connect();
    epoll_create();
    epoll_ctl(); // subscribe socket to all events
    while (true)
    {
        epoll_wait(); // wait a small time(~100 ms)
        check_socket(); // check on EPOLLOUT event
    }
}
If I run the server and then the client, it all works. If I run the client first, wait a short time, and then run the server, the client doesn't connect.
What am I doing wrong? Maybe it can be done differently?
You should use the following steps for an async connect:
1. create the socket with socket(..., SOCK_STREAM | SOCK_NONBLOCK, ...)
2. start the connection with connect(fd, ...)
3. if connect() neither returns 0 nor fails with errno set to EINPROGRESS, abort with an error
4. wait until fd is signalled as ready for output
5. check the status of the socket with getsockopt(fd, SOL_SOCKET, SO_ERROR, ...)
6. done
No loops - unless you want to handle EINTR.
If the client is started first, you should see the error ECONNREFUSED in the last step. If this happens, close the socket and start from the beginning.
It is difficult to tell what's wrong with your code, without seeing more details. I suppose, that you do not abort on errors in your check_socket operation.
There are a few ways to test whether a non-blocking connect succeeded:
- Call getpeername() first. If it fails with ENOTCONN, the connection failed; then call getsockopt() with SO_ERROR to get the pending error on the socket.
- Call read() with a length of 0. If the read fails, the connection failed, and the errno from read() indicates why; read() returns 0 if the connection succeeded.
- Call connect() again. If the errno is EISCONN, the socket is already connected and the first connect() succeeded.
Ref: UNIX Network Programming V1
D. J. Bernstein gathered together various methods for checking whether an asynchronous connect() call succeeded or not. Many of these methods have drawbacks on certain systems, so writing portable code for this is unexpectedly hard. If anyone wants to read about all the possible methods and their drawbacks, check out this document.
For those who just want the tl;dr version, the most portable way is the following:
Once the system signals the socket as writable, first call getpeername() to see if it connected or not. If that call succeeded, the socket connected and you can start using it. If that call fails with ENOTCONN, the connection failed. To find out why it failed, try to read one byte from the socket read(fd, &ch, 1), which will fail as well but the error you get is the error you would have gotten from connect() if it wasn't non-blocking.
I'm changing a socket connection in a script to a non-blocking connection. In a tutorial I found the lines:
x=fcntl(s,F_GETFL,0); // Get socket flags
fcntl(s,F_SETFL,x | O_NONBLOCK); // Add non-blocking flag
So I added them after I create my socket and before the connect statement. And it's no longer blocking :) but it also doesn't connect. I'm not getting any errors, the connect is just returning -1. If I comment these lines out it connects.
What else do I need to add to get a non-blocking connection to connect?
Check return value of connect(2) - you should be getting -1, and EINPROGRESS in errno(3). Then add socket file descriptor to a poll set, and wait on it with select(2) or poll(2).
This way you can have multiple connection attempts going on at the same time (that's how e.g. browsers do it) and be able to have tighter timeouts.
connect will probably return an EINPROGRESS error immediately. Read up on the use of select.
Note that you'll probably want to wrap your call to select in the TEMP_FAILURE_RETRY macro.
In the client, I have a
close(sockfd)
where sockfd is the socket that's connected to the server.
In the server I've got this:
    if (sockfd.revents & POLLERR ||
        desc_set[i].revents & POLLHUP || desc_set[i].revents & POLLNVAL) {
        close(sockfd.fd);
        printf("Goodbye (connection closed)\n");
    }
Where sockfd is a struct pollfd, and sockfd.fd is the file descriptor of the client's socket.
When the client closes the socket like I put up there, the server doesn't seem to detect it with the second code (desc_set[i].revents & POLLHUP, etc.).
Does anyone know what's the problem?
Sounds like you've managed to half-close the connection from the client side. In this state the connection can still carry data in one direction, i.e. it operates in half-duplex mode. This is by design and allows your server to finish replying to whatever the client sent. Typically this would mean completing a file transfer and calling close(), or answering all aspects of the query. In the half-closed state you can still quite sensibly send data to the side that has already stopped sending. In your server you will see EOF if you try to read, though. Strictly, it is shutdown(fd, SHUT_WR) that means "I'm done sending, finish up whatever I asked for"; after a full close() the server still sees the same EOF, but data sent back to the client will be answered with a reset.
POLLHUP, POLLERR and POLLNVAL only check the output side of the local connection, which is still valid here. There's a POLLRDHUP, a Linux-specific extension (it needs _GNU_SOURCE to be defined), that should detect the other side closing, but the tests you're doing only check whether the socket is still writable, not whether it's still readable.
See also this question, which is talking about java, but still very related.
A remote close or output shutdown is neither an error nor a hangup nor an invalid state. It is a read event such that read() will return zero. Just handle it as part of your normal read processing.
BTW your test condition above should read sockfd.revents & (POLLERR|POLLHUP|POLLNVAL).
In my client code, I am following these steps to connect to a socket:
1. Creating a socket:
   sockDesc = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)
2. Connecting it (retry for 'x' time in case of failure):
   connect(sockDesc, (sockaddr *) &destAddr, sizeof(destAddr))
   (After filling the destAddr fields)
3. Using the socket for send()/recv() operation:
   send(sockDesc, buffer, bufferLen, 0)
   recv(sockDesc, buffer, bufferLen, 0)
4. close() the socket descriptor and exit:
   close(sockDesc)
If during send()/recv() the connection breaks, I found that I could connect by returning to step 2.
Is this solution okay? Should I close the socket descriptor and return to step 1?
Another interesting observation that I am not able to understand is what happens when I stop my echo server and start the client. I create a socket (step 1) and call connect(), which fails (as expected), but then I keep calling connect(), let's say, 10 times. After 5 retries I start the server and connect() is successful. But during the send() call the client receives a SIGPIPE signal. I would like to know:
1) Do I need to create a new socket every time connect() fails? As per my understanding as long as I have not performed any send()/recv() on the socket it is as good as new and I can reuse the same fd for the connect() call.
2) I don't understand why SIGPIPE is received when the server is up and connect() is successful.
Yes, you should close and go back to step 1:
close() closes a file descriptor,
so that it no longer refers to any
file and may be reused.
From here.
I think closing the socket is the right thing to do, despite the fact that it may work if you don't.
A socket which has failed to connect may not be in EXACTLY the same state as a brand new one - which could cause problems later. I'd rather avoid the possibility and just make a new one. It's cleaner.
TCP sockets hold a LOT of state, some of which is implementation-specific and worked out from the network.
A socket corresponding to a broken connection is in an unstable state. Normally you will not be allowed to connect with it again until the operating system releases the socket.
I think it is better to close() the descriptor and connect again. Note that after close() the descriptor no longer refers to a socket, so you do need a new one from socket() before calling connect().
Anyway, make sure to set the SO_LINGER option on your socket to ensure that no data is lost in transmission.
See http://www.gnu.org/s/libc/manual/html_node/Socket_002dLevel-Options.html#Socket_002dLevel-Options
If the connection was broken and you try to write on the file descriptor you should get the broken pipe error/signal. All this is saying is that the file descriptor you tried writing to no longer has anyone on the other side to read what you are sending.
What you can do is catch the signal SIGPIPE and then deal with the reconnecting by closing the FD and going back to your step 1. You will now have a new FD you can read and write from for the connection.
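As an alternative to installing a SIGPIPE handler, the broken connection can be detected from the return value of send(). The sketch below uses the POSIX MSG_NOSIGNAL flag so that no signal is raised for this call; calling signal(SIGPIPE, SIG_IGN) once at startup has a similar effect for all writes. The helper name send_nosigpipe is made up for the example.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send the whole buffer without raising SIGPIPE.
 * Returns 0 on success, -1 on failure with errno set. An errno of
 * EPIPE or ECONNRESET means the connection is gone and the caller
 * should close the descriptor and go back to step 1. */
int send_nosigpipe(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted before sending: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```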
If the Single UNIX Specification doesn't say that it MUST work to go back to step #2 instead of step #1, then the fact that it happens to work on Linux is just an implementation detail, and you would be far better off and more portable if you go back to step #1. As far as I am aware, the specification does not make any guarantee that it is ok to go back to step #2 and, therefore, I would advise you to go back to step #1.