Timeout implementation in C for TFTP

I am trying to implement the timeout mechanism in my C implementation of TFTP, and I am looking for some general help.
What I am wondering is how to manage the timeout situation. My preliminary timeout mechanism uses the signal/alarm functions, but I am stuck on how to handle the timeouts: if a packet (ACK or data) is missed and a timeout occurs, how do I resend the previous packet or ACK to the server?

Avoid signal and alarm if possible.
Either use SO_RCVTIMEO socket option or just use select with a timeout of T seconds.
If the select() call returns and your socket is not in the read set, or if recvfrom returns with a timeout error, then you can take appropriate action in your code.
Example of timeout usage:
struct timeval tv = {0, 0};
tv.tv_sec = 5;
socklen_t optionlength = sizeof(tv);
setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, optionlength);
while (1)
{
// recvfrom() takes six arguments; pass NULL, NULL if the sender address is not needed
result = recvfrom(s, buffer, bufferlength, 0, NULL, NULL);
if ((result == -1) && ((errno == EAGAIN) || (errno == EWOULDBLOCK)))
{
// handle timeout
}
else if (result == -1)
{
// handle critical error
}
else
{
// process next packet
}
}
Example of select usage:
while (1)
{
struct timeval tv = {0, 0};
tv.tv_sec = 5; // select() may modify tv, so reset it on every iteration
fd_set readset;
FD_ZERO(&readset);
FD_SET(s, &readset);
select(s+1, &readset, NULL, NULL, &tv);
if (FD_ISSET(s, &readset))
{
result = recvfrom(s, buffer, bufferlength, 0, NULL, NULL);
if (result == -1)
{
// handle error
}
else
{
// process packet
}
}
else
{
// handle timeout
}
}

Related

Linux socket select populates read/write fd sets but socket refuses connection

I am trying to have a client socket make a connection to a server with a timeout.
In order to achieve the timeout, I am using a select call with ts set to 30s:
int flags = 0, error = 0, ret = 0;
fd_set rset, wset;
socklen_t len = sizeof(error);
struct timeval ts;
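ts.tv_sec = mConnectTimeoutMs / 1000; // whole seconds (30 for a 30s timeout)
ts.tv_usec = (mConnectTimeoutMs % 1000) * 1000; // remainder; tv_usec should stay below 1000000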
// clear out descriptor sets for select
// add socket to the descriptor sets
FD_ZERO(&rset);
FD_SET(sock, &rset);
wset = rset;
// set socket nonblocking flag
if ((flags = fcntl(sock, F_GETFL, 0)) < 0) {
return -1;
}
if (fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0) {
return -1;
}
// initiate non-blocking connect
if ((ret = ::connect(sock, sa, size)) < 0)
if (errno != EINPROGRESS) {
return -1;
}
if (ret == 0) // then connect succeeded right away
{
// put socket back in blocking mode
if (fcntl(sock, F_SETFL, flags) < 0) {
return -1;
}
return 0;
}
// we are waiting for connect to complete now
if ((ret = select(sock + 1, &rset, &wset, NULL, &ts)) < 0) {
return -1;
}
if (ret == 0) { // we had a timeout
errno = ETIMEDOUT;
return -1;
}
// we had a positive return so a descriptor is ready
if (FD_ISSET(sock, &rset) || FD_ISSET(sock, &wset)) {
if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &error, &len) < 0) {
return -1;
}
} else {
return -1;
}
if (error) { // check if we had a socket error
errno = error; // this always returns 111
return -1;
}
The point of the timeout is to allow time for the server to spawn & the server socket to be listening/accepting.
For some reason, without the server running, the select call falls through immediately, with rset and wset both returning true from FD_ISSET(sock, ...).
Calling:
getsockopt(sock, SOL_SOCKET, SO_ERROR, &error, &len)
Always results in the error being populated with error code 111 (connection refused), which is expected, since the server is not running yet. What am I doing wrong here if I want the select to wait for the socket to be ready to actually connect? Or how can I correctly "wait for the server socket to exist to connect" using a timeout?
Per @Barmar's comments, the select falls through as a result of the RST sent when the server socket is not yet listening, and the resulting socket will have an error (ECONNREFUSED). To achieve the timeout as intended, we can wrap the existing logic in a do/while loop, and then make the timeout value dynamic, based on the time remaining in the overall timeout:
#include <chrono>
#include <thread>
...
int timeoutRemaining = mConnectTimeoutMs;
std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
do {
// same conn logic as before, except:
...
ts.tv_sec = timeoutRemaining / 1000;
ts.tv_usec = (timeoutRemaining % 1000) * 1000;
...
if (error) { // check if we had a socket error
errno = error;
if (errno == ECONNREFUSED) {
close(sock); // can't call connect on a socket thats refused connection
sock = create_new_sock();
// artificially throttle connection requests
std::this_thread::sleep_for(std::chrono::seconds(1));
continue; // there is no server available, continue trying until we reach our connection timeout
}
return -1;
}
...
} while ((timeoutRemaining = (mConnectTimeoutMs
- (std::chrono::duration_cast<std::chrono::milliseconds>(
std::chrono::steady_clock::now() - start)
.count())))
> 0);

close() does not close tcp connection if interface lost its ip address

My program has an established tcp connection when linux box loses its DHCP IP address lease. After that it tries to close the connection so when dhcp server recovers it will re-establish tcp connection again. It uses SO_REUSEADDR.
I did read this http://hea-www.harvard.edu/~fine/Tech/addrinuse.html but in this application reuse address is a requirement.
The way I reproduce this problem is by issuing ifconfig eth0 0.0.0.0
However, the result of close(sockfd) is unpredictable. Sometimes it closes socket properly. Sometimes netstat -ant continuously shows
tcp 0 0 192.168.1.119:54322 192.168.1.41:54321 (STATE)
where (STATE) can be one of ESTABLISHED, FIN_WAIT1, or CLOSE_WAIT.
Originally my code had just close(). After reading multiple sources online, I tried some suggestions.
First I tried this (based on http://deepix.github.io/2016/10/21/tcprst.html)
if (sockFd != -1) {
linger lin;
lin.l_onoff = 1;
lin.l_linger = 0;
if (setsockopt(sockFd, SOL_SOCKET, SO_LINGER, (const char *)&lin, sizeof(linger)) == -1) {
std::cout << "Error setting socket opt SO_LINGER while trying to close " << std::endl;
}
close(sockFd);
}
It did not help, so I tried this (based on close() is not closing socket properly )
bool haveInput(int fd, double timeout) {
int status;
fd_set fds;
struct timeval tv;
FD_ZERO(&fds);
FD_SET(fd, &fds);
tv.tv_sec = (long)timeout; // cast needed for C++
tv.tv_usec = (long)((timeout - tv.tv_sec) * 1000000); // 'suseconds_t'
while (1) {
if (!(status = select(fd + 1, &fds, 0, 0, &tv)))
return FALSE;
else if (status > 0 && FD_ISSET(fd, &fds))
return TRUE;
else if (status > 0)
break;
else if (errno != EINTR)
break;
}
return FALSE; // reached via break: select() failed or reported an unexpected state
}
void myClose(int sockFd)
{
if (sockFd != -1) {
int err = 1;
socklen_t len = sizeof err;
getsockopt(sockFd, SOL_SOCKET, SO_ERROR, (char *)&err, &len);
shutdown(sockFd, SHUT_WR);
usleep(20000);
char discard[99];
while (haveInput(sockFd, 0.01))
if (!read(sockFd, discard, sizeof discard))
break;
shutdown(sockFd, SHUT_RD);
usleep(20000);
close(sockFd);
sockFd = -1;
}
}
As before, sometimes it closes connection, and sometimes it does not.
I understand that in this case the other side can send neither FIN nor ACK, so graceful close is just not possible.
Is there a reliable way to completely close tcp connection in such conditions?
Thank you

setting timeout for socket recv TCP

I found some hints on how to do it, but I don't understand how to use setsockopt. I have an infinite while loop calling recv, and I want to time out and close(cli_socket) if the client doesn't send anything within 5 seconds.
If the client sends only part of the whole expected message, I want to reset the timer and give it another 5 seconds.
Currently I have this:
while((cut = buffer.find("\r\n")) == -1)
{
struct timeval tv;
tv.tv_sec = 5;
setsockopt(cli_socket, SOL_SOCKET, SO_RCVTIMEO,(struct timeval *)&tv,sizeof(struct timeval));
recv(cli_socket, tmpBuffer, 100, 0);
buffer += tmpBuffer;
memset(tmpBuffer, 0, 100);
}
You should test the return value of recv and break out of your loop when it fails; on a timeout, recv returns -1 and errno is set to EAGAIN or EWOULDBLOCK:
EAGAIN or EWOULDBLOCK
The socket is marked nonblocking and the receive operation would block, or a receive timeout had been set and the timeout expired before data was received
struct timeval tv = {5, 0};
setsockopt(cli_socket, SOL_SOCKET, SO_RCVTIMEO, (struct timeval *)&tv, sizeof(struct timeval));
while((cut = buffer.find("\r\n")) == -1)
{
int numBytes = recv(cli_socket, tmpBuffer, 100, 0);
/// Edit: recv does not return EAGAIN itself; it returns -1 on error,
/// and in case of a timeout errno is set to EAGAIN or EWOULDBLOCK
if (numBytes <= 0)
{
// nothing received from client in last 5 seconds
break;
}
buffer.append(tmpBuffer, numBytes);
}
You can also use the select function, which is not complicated to use:
while((cut = buffer.find("\r\n")) == -1)
{
timeval timeout = { 5, 0 };
fd_set in_set;
FD_ZERO(&in_set);
FD_SET(cli_socket, &in_set);
// select the set
int cnt = select(cli_socket + 1, &in_set, NULL, NULL, &timeout);
if (FD_ISSET(cli_socket, &in_set))
{
int numBytes = recv(cli_socket, tmpBuffer, 100, 0);
if (numBytes <= 0)
{
// nothing received from client
break;
}
buffer.append(tmpBuffer, numBytes);
}
else
{
// nothing received from client in last 5 seconds
break;
}
}
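Both loops above treat a timeout, a closed connection and a real error the same way (numBytes <= 0). If you need to tell them apart, something along these lines might work. This is only a plain C sketch; recv_line and LINE_TIMEOUT are names invented for this example, and it assumes a text protocol with no embedded NUL bytes.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

enum { LINE_TIMEOUT = -2 };  /* hypothetical return code for "timed out" */

/* Hypothetical helper: read into buf until it contains "\r\n".
 * Each recv() gets its own timeout_ms window, so partial data resets the timer.
 * Returns bytes stored (> 0), 0 if the peer closed, LINE_TIMEOUT on timeout,
 * or -1 on error. A timeout_ms of 0 disables the timeout. */
ssize_t recv_line(int fd, char *buf, size_t cap, int timeout_ms)
{
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == -1)
        return -1;

    size_t used = 0;
    while (used + 1 < cap) {
        ssize_t n = recv(fd, buf + used, cap - 1 - used, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;                   /* interrupted, try again */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return LINE_TIMEOUT;        /* nothing arrived in time */
            return -1;                      /* real error */
        }
        if (n == 0)
            return 0;                       /* peer closed the connection */
        used += (size_t)n;
        buf[used] = '\0';
        if (strstr(buf, "\r\n") != NULL)
            return (ssize_t)used;           /* may include bytes after "\r\n" */
    }
    errno = EMSGSIZE;                       /* line does not fit in buf */
    return -1;
}
```

The caller can then close(cli_socket) on LINE_TIMEOUT or 0 and log errno only in the -1 case.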

Can't seem to get a timeout working when connecting to a socket

I'm trying to supply a timeout for connect(). I've searched around and found several articles related to this. I've coded up what I believe should work but unfortunately I get no error reported from getsockopt(). But then when I come to the write() it fails with an errno of 107 - ENOTCONN.
A couple of points. I'm running on Fedora 23. The docs for connect() say it should return failure with an errno of EINPROGRESS for a connect that is not complete yet; however, I was experiencing EAGAIN, so I added that to my check. Currently my socket server is setting the backlog to zero in the listen() call. Many of the calls succeed, but the ones that fail all fail in the write() call with the 107 - ENOTCONN I mentioned.
I'm hoping I'm just missing something but so far can't figure out what.
int domain_socket_send(const char* socket_name, unsigned char* buffer,
unsigned int length, unsigned int timeout)
{
struct sockaddr_un addr;
int fd = -1;
int result = 0;
// Create socket.
fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (fd == -1)
{
result = -1;
goto done;
}
if (timeout != 0)
{
// Enabled non-blocking.
int flags;
flags = fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
// Set socket name.
memset(&addr, 0, sizeof(addr));
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, socket_name, sizeof(addr.sun_path) - 1);
// Connect.
result = connect(fd, (struct sockaddr*) &addr, sizeof(addr));
if (result == -1)
{
// If some error then we're done.
if ((errno != EINPROGRESS) && (errno != EAGAIN))
goto done;
fd_set write_set;
struct timeval tv;
// Set timeout.
tv.tv_sec = timeout / 1000000;
tv.tv_usec = timeout % 1000000;
unsigned int iterations = 0;
while (1)
{
FD_ZERO(&write_set);
FD_SET(fd, &write_set);
result = select(fd + 1, NULL, &write_set, NULL, &tv);
if (result == -1)
goto done;
else if (result == 0)
{
result = -1;
errno = ETIMEDOUT;
goto done;
}
else
{
if (FD_ISSET(fd, &write_set))
{
socklen_t len;
int socket_error;
len = sizeof(socket_error);
// Get the result of the connect() call.
result = getsockopt(fd, SOL_SOCKET, SO_ERROR,
&socket_error, &len);
if (result == -1)
goto done;
// I think SO_ERROR will be zero for a successful
// result and errno otherwise.
if (socket_error != 0)
{
result = -1;
errno = socket_error;
goto done;
}
// Now that the socket is writable issue another connect.
result = connect(fd, (struct sockaddr*) &addr,
sizeof(addr));
if (result == 0)
{
if (iterations > 1)
{
printf("connect() succeeded on iteration %d\n",
iterations);
}
break;
}
else
{
if ((errno != EAGAIN) && (errno != EINPROGRESS))
{
int err = errno;
printf("second connect() failed, errno = %d\n",
errno);
errno = err;
goto done;
}
iterations++;
}
}
}
}
}
// If we put the socket in non-blocking mode then put it back
// to blocking mode.
if (timeout != 0)
{
// Turn off non-blocking.
int flags;
flags = fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
}
// Write buffer.
result = write(fd, buffer, length);
if (result == -1)
{
int err = errno;
printf("write() failed, errno = %d\n", err);
errno = err;
goto done;
}
done:
if (result == -1)
result = errno;
else
result = 0;
if (fd != -1)
{
shutdown(fd, SHUT_RDWR);
close(fd);
}
return result;
}
UPDATE 04/05/2016:
It dawned on me that maybe I need to call connect() multiple times until successful, after all this is non-blocking io not async io. Just like I have to call read() again when there is data to read after encountering an EAGAIN on a read(). In addition, I found the following SO question:
Using select() for non-blocking sockets to connect always returns 1
in which EJP's answer says you need to issue multiple connect()'s. Also, from the book EJP references:
https://books.google.com/books?id=6H9AxyFd0v0C&pg=PT681&lpg=PT681&dq=stevens+and+wright+tcp/ip+illustrated+non-blocking+connect&source=bl&ots=b6kQar6SdM&sig=kt5xZubPZ2atVxs2VQU4mu7NGUI&hl=en&sa=X&ved=0ahUKEwjmp87rlfbLAhUN1mMKHeBxBi8Q6AEIIzAB#v=onepage&q=stevens%20and%20wright%20tcp%2Fip%20illustrated%20non-blocking%20connect&f=false
it seems to indicate you need to issue multiple connect()'s. I've modified the code snippet in this question to call connect() until it succeeds. I probably still need to make changes around possibly updating the timeout value passed to select(), but that's not my immediate question.
Calling connect() multiple times appears to have fixed my original problem, which was that I was getting ENOTCONN when calling write(), I guess because the socket was not connected. However, you can see from the code that I'm tracking how many times through the select loop until connect() succeeds. I've seen the number go into the thousands. This gets me worried that I'm in a busy wait loop. Why is the socket writable even though it's not in a state that connect() will succeed? Is calling connect() clearing that writable state and it's getting set again by the OS for some reason, or am I really in a busy wait loop?
Thanks,
Nick
From http://lxr.free-electrons.com/source/net/unix/af_unix.c:
static int unix_writable(const struct sock *sk)
{
return sk->sk_state != TCP_LISTEN &&
(atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
}
I'm not sure what these buffers are that are being compared, but it looks obvious that the connected state of the socket is not being checked. So unless these buffers are modified when the socket becomes connected it would appear my unix socket will always be marked as writable and thus I can't use select() to determine when the non-blocking connect() has finished.
and based on this snippet from http://lxr.free-electrons.com/source/net/unix/af_unix.c:
static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
int addr_len, int flags)
.
.
.
timeo = sock_sndtimeo(sk, flags & O_NONBLOCK);
.
.
.
if (unix_recvq_full(other)) {
err = -EAGAIN;
if (!timeo)
goto out_unlock;

timeo = unix_wait_for_peer(other, timeo);
.
.
.
it appears setting the send timeout might be capable of timing out the connect. Which also matches the documentation for SO_SNDTIMEO at http://man7.org/linux/man-pages/man7/socket.7.html.
Thanks,
Nick
Your error handling on select() could use some cleanup. You don't really need to query SO_ERROR unless except_set is set. If select() returns > 0 then either write_set and/or except_set is set, and if except_set is not set then the connection was successful.
Try something more like this instead:
int domain_socket_send(const char* socket_name, unsigned char* buffer,
unsigned int length, unsigned int timeout)
{
struct sockaddr_un addr;
int fd;
int result;
// Create socket.
fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (fd == -1)
return errno;
if (timeout != 0)
{
// Enabled non-blocking.
int flags = fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
// Set socket name.
memset(&addr, 0, sizeof(addr));
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, socket_name, sizeof(addr.sun_path) - 1);
// Connect.
result = connect(fd, (struct sockaddr*) &addr, sizeof(addr));
if (result == -1)
{
// If some error then we're done.
if ((errno != EINPROGRESS) && (errno != EAGAIN))
goto done;
// Now select() to find out when connect() has finished.
fd_set write_set;
fd_set except_set;
FD_ZERO(&write_set);
FD_ZERO(&except_set);
FD_SET(fd, &write_set);
FD_SET(fd, &except_set);
struct timeval tv;
// Set timeout.
tv.tv_sec = timeout / 1000000;
tv.tv_usec = timeout % 1000000;
result = select(fd + 1, NULL, &write_set, &except_set, &tv);
if (result == -1)
{
goto done;
}
else if (result == 0)
{
result = -1;
errno = ETIMEDOUT;
goto done;
}
else if (FD_ISSET(fd, &except_set))
{
int socket_error;
socklen_t len = sizeof(socket_error);
// Get the result of the connect() call.
result = getsockopt(fd, SOL_SOCKET, SO_ERROR, &socket_error, &len);
if (result != -1)
{
result = -1;
errno = socket_error;
}
goto done;
}
else
{
// connected
}
}
// If we put the socket in non-blocking mode then put it back
// to blocking mode.
if (timeout != 0)
{
int flags = fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
}
// Write buffer.
result = write(fd, buffer, length);
done:
if (result == -1)
result = errno;
else
result = 0;
if (fd != -1)
{
shutdown(fd, SHUT_RDWR);
close(fd);
}
return result;
}

Non-blocking connect() and EINTR

I am using connect_nonb() from Stevens, UNIX Network programming:
int
connect_nonb(int sockfd, const SA *saptr, socklen_t salen, int nsec)
{
int flags, n, error;
socklen_t len;
fd_set rset, wset;
struct timeval tval;
flags = Fcntl(sockfd, F_GETFL, 0);
Fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
error = 0;
if ( (n = connect(sockfd, saptr, salen)) < 0)
if (errno != EINPROGRESS)
return(-1);
/* Do whatever we want while the connect is taking place. */
if (n == 0)
goto done; /* connect completed immediately */
FD_ZERO(&rset);
FD_SET(sockfd, &rset);
wset = rset;
tval.tv_sec = nsec;
tval.tv_usec = 0;
if ( (n = Select(sockfd+1, &rset, &wset, NULL,
nsec ? &tval : NULL)) == 0) {
close(sockfd); /* timeout */
errno = ETIMEDOUT;
return(-1);
}
if (FD_ISSET(sockfd, &rset) || FD_ISSET(sockfd, &wset)) {
len = sizeof(error);
if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &error, &len) < 0)
return(-1); /* Solaris pending error */
} else
err_quit("select error: sockfd not set");
done:
Fcntl(sockfd, F_SETFL, flags); /* restore file status flags */
if (error) {
close(sockfd); /* just in case */
errno = error;
return(-1);
}
return(0);
}
This function allows a custom timeout of connect(). If, whilst blocking in select() waiting for the connect to succeed, a signal is received, select() exits with -1 (EINTR). At this point the select() timeout has not expired, the connect has not succeeded (i.e. the target host could be disconnected) but the subsequent getsockopt() does not return an error.
Should getsockopt() return an error or should the Stevens code check the return code (and errno) of select()?
Currently when connecting to a non-existent host and a signal interrupts select() this function returns success incorrectly.
I'm not sure what Select() is. I assume it's some kind of thin wrapper around select().
In most applications, whenever select() fails with EINTR, you should silently loop and call select() again, possibly after recalculating the timeout to account for the fact that some time has elapsed in the prior call to select().
This case is no exception. select() should be in a loop.
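A rough sketch of such a loop, recalculating the timeout from a fixed deadline after each EINTR, could look like this. It is not from Stevens; wait_readable and now_ms are names invented for this example, and it only watches one descriptor for readability, but the same shape applies to the rset/wset select() in connect_nonb().

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std=c11 */
#include <errno.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical helper: current monotonic time in milliseconds. */
static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long)ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Hypothetical helper: wait until fd is readable or timeout_ms has passed.
 * Returns 1 if readable, 0 on timeout, -1 on a real error.
 * EINTR is retried with only the time that is still left. */
int wait_readable(int fd, long timeout_ms)
{
    long deadline = now_ms() + timeout_ms;

    for (;;) {
        long left = deadline - now_ms();
        if (left < 0)
            left = 0;

        fd_set rset;
        struct timeval tv;
        FD_ZERO(&rset);
        FD_SET(fd, &rset);
        tv.tv_sec = left / 1000;
        tv.tv_usec = (left % 1000) * 1000;

        int n = select(fd + 1, &rset, NULL, NULL, &tv);
        if (n >= 0)
            return n;               /* 1: ready, 0: deadline reached */
        if (errno != EINTR)
            return -1;              /* genuine select() failure */
        /* interrupted by a signal: loop and wait for the remainder */
    }
}
```

Working from a deadline rather than decrementing a counter means repeated signals cannot stretch the total wait beyond the requested timeout.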

Resources