I'm implementing a little HTTP client using OpenSSL, and I'm trying to handle "connection timed out" errors gracefully. By gracefully, I mean I want to print a nice, human-readable message that says "Connection Timed Out." Unfortunately, the error handling in OpenSSL isn't making sense to me. Here's what happens:
I create a nonblocking socket and deliberately connect to a port that I know won't respond, in order to test the error handling of "connection timed out." I make the socket into a nonblocking SSL channel.
Then, I call SSL_connect. It returns -1 as expected. I call SSL_get_error to get more information about the error. It returns SSL_ERROR_WANT_WRITE, as expected: it's waiting for the connection to time out, and that takes a while. So far so good.
I keep calling SSL_connect until finally, SSL_get_error returns SSL_ERROR_SYSCALL. Again, this is what I expect. I am expecting the connect system call to fail. So far so good.
Finally, and this is the part that isn't working for me, I try to get the actual error message. Here is the code I'm using:
unsigned long code =
    ERR_get_error_line_data(&file, &line, &data, &flags);
To my surprise, this is returning zero, meaning there are no more errors in the error queue. I wasn't expecting that. What I was expecting was an error code with the property ERR_GET_REASON(code) == ETIMEDOUT, so that I could then pass the ETIMEDOUT to strerror, to get the actual error message. It seems weird to me that there's nothing at all in the error queue. I don't understand that.
Related
So, I have a client that attempts to connect to a server. The IP and port are retrieved from a configuration file. I need the program to fail smoothly if something in the config file is incorrect. I connect to the server using the following code:
if (connect(sockfd, p->ai_addr, p->ai_addrlen) == -1)
{
    perror("client: connect");
    close(sockfd);
    continue;
}
If the user attempts to connect to a server on the subnet that is not accepting connections (i.e. is not present), then the program fails with No route to host. If the program attempts to connect to a server that is not on the subnet (i.e. the configuration is bad), then the program hangs at the connect() call. What am I doing incorrectly? I need this to provide some feedback to the user that the application has failed.
You're not doing anything wrong. TCP is designed for reliability in the face of network problems, so if it doesn't get a response to its initial connection request, it retries several times in case the request or the response was lost in the network. With the default parameters on Linux it takes a minute or more to give up (the exact figure depends on the kernel version and the tcp_syn_retries setting). connect() then fails with the Connection timed out error.
If you want to detect the failure more quickly, see C: socket connection timeout
Normally we don't use continue inside an if statement unless the if statement is inside a loop, which you are not showing. Assuming there is an outer loop, it determines what happens next: either the code re-enters the if block (to try to connect again) or skips past it.
Note also you are closing sockfd inside the if block so if your loop is re-entering the if block to do retries, then it needs to create a new socket first.
I suggest reading some sample code for client and server side socket connections to get a better feel for how it works: http://www.cs.rpi.edu/~moorthy/Courses/os98/Pgms/socket.html
If all else fails, please provide the code around the if block and also state how you want to "fail smoothly". One way to fail "abruptly" would be to swap the continue statement with a call to exit() :-)
EDIT: After reading Barmar's answer and his comment you also need to be aware of this:
If the initiating socket is connection-mode, then connect() shall
attempt to establish a connection to the address specified by the
address argument. If the connection cannot be established immediately
and O_NONBLOCK is not set for the file descriptor for the socket,
connect() shall block for up to an unspecified timeout interval until
the connection is established. If the timeout interval expires before
the connection is established, connect() shall fail and the connection
attempt shall be aborted.
also..
If the connection cannot be established immediately and O_NONBLOCK is
set for the file descriptor for the socket, connect() shall fail and
set errno to [EINPROGRESS], but the connection request shall not be
aborted, and the connection shall be established asynchronously.
Subsequent calls to connect() for the same socket, before the
connection is established, shall fail and set errno to [EALREADY]
When you say "the program hangs", did you mean forever, or for a period that might be explained by a TCP/IP timeout?
If this and Barmar's answer are still not enough, then it would help to see the surrounding code, as suggested, and to know whether the socket is blocking or non-blocking.
I'm working with a fairly basic server/client setup, where both are located on the same network. They communicate via Winsock2 blocking sockets over TCP/IP, and are doing so perfectly fine.
However, for the scenario described below, the client sometimes sees an abortive connection termination (RST). It goes right roughly 99 out of 100 times, but that last time annoyingly fails some tests and therefore, my whole build. It is completely unpredictable when and where it happens, and so reproducing the problem has so far eluded me.
If I understand the relevant MSDN page correctly, the nominal connection termination sequence for blocking sockets can be summarized as:
Client | Server
-----------------------------
shutdown(SD_SEND) |
| send() response data
i=recv() until i==0 | shutdown(SD_SEND)
closesocket() | closesocket()
In my setup it is necessary to:
- do a relatively expensive operation (let's call it expensive_operation()) depending on whether a portion of the received data (let's say, 512 bytes) contains a trigger value. The server is single-threaded, so expensive_operation() effectively stops recv()ing the data stream until expensive_operation() is complete.
- initiate a server shutdown sequence if the client sends a particular sentinel value, let's call it 0xDEADBEEF.
My client is implemented such that the sentinel value is always sent last, so after sending it, no other data is sent:
send( "data data data 0xDEADBEEF" ) to server
shutdown(SD_SEND) <------- FAILURE OCCURS HERE
recv() until 0 bytes received
closesocket()
Whenever the server receives 0xDEADBEEF, it confirms the shutdown request and continues termination:
1. recv() 512 bytes of data or until 0 bytes are returned
2. Check for trigger. If a trigger is found, perform expensive_operation() and go back to step 1, otherwise continue
3. Check for sentinel value. If sentinel is not found, go back to step 1.
4. If the sentinel is found:
   - send( confirmation ) to client
   - shutdown(SD_SEND)
   - closesocket()
   - all the normal server shutdown stuff
I can understand that if the client intends to send more data after the sentinel, this will result in abortive connection termination -- because the server actively terminates the connection. This is completely expected and by design, and I can indeed reliably reproduce this behavior.
However, in the nominal case, the sentinel is always last in the sequence, which indeed always happens as evidenced by the relevant log entries, and indeed graceful connection termination happens as expected most of the time. But not always...
As I said, it happens randomly and sporadically, so I can't produce a code snippet that reliably reproduces the problem. The only thing that's consistent is that the failure always occurs when calling shutdown() in the client...
I suspect it's more of a design flaw, or some synchronization issue I'm not handling yet, rather than a problem with the code (although I'd be happy to provide the relevant code snippets).
So is there anything obvious I'm overlooking here?
There are several ways you can provoke an RST to be sent apart from deliberately doing so at the sending end by means which I will not reveal here:
Write to a connection that had already been closed by the peer. After a few attempts this will cause ECONNRESET.
Close a connection without having read all the already-pending data. This will cause an immediate ECONNRESET.
Both of these indicate an application protocol error.
In your case I would get rid of the sentinel. It is redundant. Just shut down the socket for output, or just close it if you know there is no more data coming in. That sends an entirely unambiguous indication to the peer that there is no more data, without any requirement that the peer be precisely synchronized byte-for-byte with the local application, which is a weakness and a probable source of this bug in your current code.
You need to post some code to get any more concrete assistance.
I cannot reproduce, but I can imagine a use case where the client sees abortive termination:
client                | server
-----------------------------------------
send sentinel         |
                      | send confirmation
                      | shutdown
                      | close socket
shutdown => error writing on closed socket!
If the client process is preempted just after sending its sentinel, and the server is quick, you can fall into that scenario. This happens because, on the server side, you close the socket immediately after shutdown without being sure the client has done its own shutdown. IMHO you should do:
send( confirmation ) to client
shutdown(SD_SEND)
read until 0 or error
closesocket()
all the normal server shutdown stuff
The order is then deterministic for the upper part:
client                | server
-----------------------------------------
send sentinel         |
                      | send confirmation
shutdown              | shutdown
                      | recv 0
                      | close socket => cannot happen before client shutdown
recv 0 => socket may be closed server side but it would be harmless
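The closing sequence recommended above (shut down the sending side, read until the peer's end-of-stream, then close) can be sketched as a small helper. The sketch uses the POSIX names shutdown(SHUT_WR), recv() and close(). The Winsock counterparts in the posts are shutdown(SD_SEND), recv() and closesocket(). The function name graceful_close is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: half-close our sending side, drain whatever the
 * peer still sends, and close only after the peer has finished too.
 * Returns 0 if the peer ended the stream cleanly, -1 on an error. */
int graceful_close(int fd)
{
    int rc = 0;
    char buf[512];

    if (shutdown(fd, SHUT_WR) == -1) {   /* "no more data from me" */
        rc = -1;
    } else {
        for (;;) {
            ssize_t n = recv(fd, buf, sizeof buf, 0);
            if (n == 0)
                break;                   /* peer has shut down as well */
            if (n == -1) {
                if (errno == EINTR)
                    continue;
                rc = -1;                 /* e.g. ECONNRESET */
                break;
            }
            /* n > 0: late data from the peer, discarded in this sketch */
        }
    }

    int saved = errno;                   /* keep the recv/shutdown error */
    close(fd);
    errno = saved;
    return rc;
}
```

Because the socket is closed only after recv() has returned 0, no unread data is pending at close time, which is the condition the earlier answer names as a cause of resets. A production version would probably bound the drain loop with a timeout so that a silent peer cannot block the shutdown forever.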
I want to create a non-blocking connect.
Like this:
socket.connect(); // returns immediately
For this, I use another thread, an infinite loop and Linux epoll. Like this (pseudocode):
// in another thread
{
    create_non_block_socket();
    connect();
    epoll_create();
    epoll_ctl(); // subscribe socket to all events
    while (true)
    {
        epoll_wait(); // wait a small time (~100 ms)
        check_socket(); // check on EPOLLOUT event
    }
}
If I run the server and then the client, it all works. If I run the client first, wait a short time, and then run the server, the client doesn't connect.
What am I doing wrong? Maybe it can be done differently?
You should use the following steps for an async connect:
1. create the socket with socket(..., SOCK_STREAM | SOCK_NONBLOCK, ...)
2. start the connection with connect(fd, ...)
3. if connect() returns -1 and errno is anything other than EINPROGRESS, abort with an error
4. wait until fd is signalled as ready for output
5. check the status of the socket with getsockopt(fd, SOL_SOCKET, SO_ERROR, ...)
6. done
No loops - unless you want to handle EINTR.
If the client is started first, you should see the error ECONNREFUSED in the last step. If this happens, close the socket and start from the beginning.
It is difficult to tell what's wrong with your code, without seeing more details. I suppose, that you do not abort on errors in your check_socket operation.
There are a few ways to test if a nonblocking connect succeeds.
call getpeername() first, if it failed with error ENOTCONN, the connection failed. then call getsockopt with SO_ERROR to get the pending error on the socket
call read with a length of 0. if the read failed, the connection failed, and the errno for read indicates why the connection failed; read returns 0 if connection succeeds
call connect again; if the errno is EISCONN, the connection is already connected and the first connect succeeded.
Ref: UNIX Network Programming V1
D. J. Bernstein gathered together various methods for checking whether an asynchronous connect() call succeeded. Many of these methods have drawbacks on certain systems, so writing portable code for this is unexpectedly hard. If anyone wants to read all the possible methods and their drawbacks, check out this document.
For those who just want the tl;dr version, the most portable way is the following:
Once the system signals the socket as writable, first call getpeername() to see if it connected or not. If that call succeeded, the socket connected and you can start using it. If that call fails with ENOTCONN, the connection failed. To find out why it failed, try to read one byte from the socket read(fd, &ch, 1), which will fail as well but the error you get is the error you would have gotten from connect() if it wasn't non-blocking.
I am writing a network application that communicates over a Linux TCP socket. Recently I noticed the send system call crashing my application. It works fine when both peers are up (I'm testing crash recovery now), but when one peer is down, the second crashes while executing this piece of code:
fprintf(stderr, "out_tcp %d\n", out_tcp);
if (send(out_tcp, &packet, sizeof(packet), 0) == -1)
    fprintf(stderr, "send TCP error");
fprintf(stderr, "after send");
The socket is already prepared and connected, and this code was executed several times before the second peer went down. I expected send to return -1, but the program prints only "out_tcp 11" and then exits. There is no error message and no return value from send. I ran it under Valgrind, which says the application exits normally, with no error or warning message.
Does anyone have any idea how to debug this? Which tools should I use? I'm pretty stuck, as I get no information.
Thanks in advance
Harnen
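Looks like your application is being terminated by SIGPIPE, which is raised when you write to a socket whose peer has gone away, because it neither handles nor ignores that signal. Please see this thread for further information: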
How to prevent SIGPIPEs (or handle them properly)
SOLVED:
Use the MSG_NOSIGNAL flag (combined here with MSG_EOR) in the send call, as below:
if (send(out_tcp, &packet, sizeof(packet), MSG_EOR | MSG_NOSIGNAL) == -1)
Hope it helps
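To illustrate what the flag changes, here is a small sketch. With MSG_NOSIGNAL, writing to a connection whose peer has gone away makes send() return -1 with errno set to EPIPE instead of delivering SIGPIPE, so the error branch in the question's code is actually reached. The wrapper name send_nosignal is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical wrapper: like send(), but a vanished peer is reported as
 * -1 with errno == EPIPE rather than through SIGPIPE. */
ssize_t send_nosignal(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
    if (n == -1) {
        int saved = errno;   /* fprintf may overwrite errno */
        fprintf(stderr, "send TCP error: %s\n", strerror(saved));
        errno = saved;
    }
    return n;
}
```

An alternative with a similar effect for the whole process is to ignore the signal once at startup with signal(SIGPIPE, SIG_IGN). Which of the two fits better depends on whether other code in the program relies on SIGPIPE.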
Have you tried to RTFM (read the fine manual) about error conditions? Do you catch or ignore any signals? What about errno global variable?
man send
And also TCP is a streaming protocol, therefore it is recommended to use usual streaming access commands like read(), write() if you do not need any special flags.
I am using a server that is crashing following a call to recv() returning -1 and errno set to ECONNRESET. I originally found this condition using nmap (I'm not a cracker, was just testing if the port was open at the time.) However, nmap uses raw sockets so I'm not too happy submitting this as a test case to the developers. I would rather write a client program in C that can cause the ECONNRESET.
So far I have tried two things: connect() to the server from my client and then shutdown() the socket immediately after connecting. recv() on the server still returned 1 (I have inserted debugging code so I can see the return value.) I also tried calling send() with some string and then immediately calling shutdown(). No dice, the string was transmitted fine.
So how would I cause this condition? Non portable is fine, I am using Linux.
The problem is that you are calling shutdown. Call close instead.
Take a look at a TCP state diagram.
http://tangentsoft.net/wskfaq/articles/debugging-tcp.html
Basically, shutdown closes a socket "politely" by sending a FIN; you then wait for the peer to finish (FIN -> ACK/FIN -> ACK -> closed), at which point you call close and all is good. If you call close without calling shutdown first while data from the peer is still unread or still arriving, it's the "impolite" version which sends a RST -- the equivalent of hanging up in the middle of a phone call, without waiting for the other person to finish what they're saying.
Think of "shutdown" as "say goodbye", and "close" as "hang up". You always have to hang up, but you don't have to say goodbye first.
About nmap: It is perfectly acceptable to give developers a test case with nmap. That's one of the main purposes of nmap anyway.
Your instincts were correct to use shutdown(), however you were not using it correctly for this.
Presumably you are trying shutdown() with SHUT_WR or SHUT_RDWR. When you close the writing direction, as these do, your side of the connection notifies the peer with a FIN - indicating that no more data will be forthcoming from your side. This will cause recv() on the other side to indicate a clean end-of-file on the connection, which isn't what you want in this case.
Instead, you want to use SHUT_RD to shut down the reading direction of the socket only, and hold it open for writing. This will not notify the peer immediately - but if the peer sends any data after this point, your side will respond with a RST, to inform the peer that some data was lost - it wasn't seen by your client application. Not every TCP stack behaves this way, though: some acknowledge and silently discard the data instead, so check the result on your target system.
(So, to ensure that you get a connection reset, you need to make sure that the server will be trying to send something to you - you might need to send something first, then perform the reading shutdown).