My program has an established TCP connection when the Linux box loses its DHCP IP address lease. After that, it tries to close the connection so that it can re-establish the TCP connection once the DHCP server recovers. It uses SO_REUSEADDR.
I did read http://hea-www.harvard.edu/~fine/Tech/addrinuse.html, but in this application address reuse is a requirement.
I reproduce the problem by issuing ifconfig eth0 0.0.0.0
However, the result of close(sockfd) is unpredictable. Sometimes it closes the socket properly. Sometimes netstat -ant continuously shows
tcp 0 0 192.168.1.119:54322 192.168.1.41:54321 (STATE)
where (STATE) can be one of ESTABLISHED, FIN_WAIT1, or CLOSE_WAIT.
Originally my code had just close(). After reading multiple sources online, I tried some suggestions.
First I tried this (based on http://deepix.github.io/2016/10/21/tcprst.html)
if (sockFd != -1) {
linger lin;
lin.l_onoff = 1;
lin.l_linger = 0;
if (setsockopt(sockFd, SOL_SOCKET, SO_LINGER, (const char *)&lin, sizeof(linger)) == -1) {
std::cout << "Error setting socket opt SO_LINGER while trying to close " << std::endl;
}
close(sockFd);
}
It did not help, so I tried this (based on close() is not closing socket properly )
bool haveInput(int fd, double timeout) {
    int status;
    fd_set fds;
    struct timeval tv;
    while (1) {
        // select() may modify both the set and the timeout, so reset them on every pass
        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        tv.tv_sec = (long)timeout; // cast needed for C++
        tv.tv_usec = (long)((timeout - tv.tv_sec) * 1000000); // 'suseconds_t'
        if (!(status = select(fd + 1, &fds, 0, 0, &tv)))
            return false; // timed out
        else if (status > 0 && FD_ISSET(fd, &fds))
            return true;
        else if (status > 0)
            break;
        else if (errno != EINTR)
            break;
    }
    return false; // error, or select() reported something other than fd
}
void myClose(int sockFd)
{
if (sockFd != -1) {
int err = 1;
socklen_t len = sizeof err;
getsockopt(sockFd, SOL_SOCKET, SO_ERROR, (char *)&err, &len);
shutdown(sockFd, SHUT_WR);
usleep(20000);
char discard[99];
while (haveInput(sockFd, 0.01))
if (!read(sockFd, discard, sizeof discard))
break;
shutdown(sockFd, SHUT_RD);
usleep(20000);
close(sockFd);
sockFd = -1;
}
}
As before, sometimes it closes connection, and sometimes it does not.
I understand that in this case the other side can send neither FIN nor ACK, so graceful close is just not possible.
Is there a reliable way to completely close tcp connection in such conditions?
Thank you
Related
I am using non-blocking sockets to connect to a server.
In a specific test scenario, the server is down, which means a TCP SYN goes out, but there is no response and there can never be an established connection.
In this setup, usually select times out after 2 seconds returning 0.
This is the behavior most of the time and it seems correct.
However, in roughly 5% of the cases, select immediately returns 1 (indicating the socket is readable in the mask).
But when I read(2) from the socket, -1 is returned with 'Network is unreachable'
sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
// sockfd checked and > 0
// set non-blocking
struct timeval tv{};
tv.tv_sec = 2;
int ret = connect(sockfd, addr, addrlen ); // addr set elsewhere
if (ret < 0 && errno == EINPROGRESS)
{
fd_set cset;
FD_ZERO(&cset);
FD_SET(sockfd, &cset);
ret = select(sockfd + 1, &cset, nullptr, nullptr, &tv);
// returns 1 sometimes
}
In the first post, I incorrectly stated that in the error case, there is only one TCP SYN on the network (without retries).
This is not true; in both the error and non-error case, there is a TCP SYN on the network that is re-sent after 1 second.
What might cause this and is there a way to get consistent behavior with select ?
The correct way to determine if a non-blocking connect() is finished is to ask select() for writability not readability. This is clearly stated in the connect() documentation:
EINPROGRESS
The socket is nonblocking and the connection cannot be completed immediately. (UNIX domain sockets failed with EAGAIN instead.) It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure).
It is undefined behavior to test a socket for readability with select()/poll() before you know that the connection has actually been established.
Try this instead:
sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
// sockfd checked and > 0
// set non-blocking
int ret = connect(sockfd, addr, addrlen); // addr set elsewhere
if (ret < 0)
{
if (errno != EINPROGRESS)
{
close(sockfd);
sockfd = -1;
}
else
{
fd_set cset;
FD_ZERO(&cset);
FD_SET(sockfd, &cset);
struct timeval tv{};
tv.tv_sec = 2;
ret = select(sockfd + 1, nullptr, &cset, nullptr, &tv);
if (ret <= 0)
{
close(sockfd);
sockfd = -1;
}
else
{
int errCode = 0;
socklen_t len = sizeof(errCode);
getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &errCode, &len);
if (errCode != 0)
{
close(sockfd);
sockfd = -1;
}
}
}
}
if (sockfd != -1)
{
// use sockfd as needed (read(), etc) ...
close(sockfd);
}
What is the ideal way to write code for non-blocking connect?
I saw a reference in another Stack Overflow thread (Linux, sockets, non-blocking connect) that mentions checking the status of the socket with getsockopt(fd, SOL_SOCKET, SO_ERROR, ...) at the end, but I cannot find any explanation of why this is needed.
It also mentions handling ECONNREFUSED. How and why does it need to be handled?
Can anyone comment?
Thanks.
int nonblocking_connect() {
int flags, ret, res;
struct sockaddr_in serv_addr;
int fd = socket(AF_INET, SOCK_STREAM, 0);
if (fd == -1) {
return -1;
}
flags = fcntl(fd, F_GETFL, 0);
if (flags == -1) {
goto end;
}
if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
goto end;
}
serv_addr.sin_family = AF_INET;
serv_addr.sin_port = htons(8080);
serv_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
ret = connect(fd, (struct sockaddr*)&serv_addr, sizeof(serv_addr));
if (ret == -1) {
fd_set wfd, efd;
struct timeval tv;
if (errno != EINPROGRESS) {
goto end;
}
FD_ZERO(&wfd);
FD_SET(fd, &wfd);
FD_ZERO(&efd);
FD_SET(fd, &efd);
// Set 1 second timeout for successful connect
tv.tv_sec = 1;
tv.tv_usec = 0; // must be initialized too, otherwise the timeout is indeterminate
res = select(fd + 1, NULL, &wfd, &efd, &tv);
if (res == -1) {
goto end;
}
// timed-out
if (res == 0) {
goto end;
}
if (FD_ISSET(fd, &efd)) {
goto end;
}
}
return fd;
end:
close(fd);
return -1;
}
The code shown in the example is a bit misleading, in that it's not really implementing a non-blocking connect; rather it is implementing a blocking connect with a one-second timeout. (That is, if the code is working as intended, the nonblocking_connect() function might not return for up to one second when it is called).
That's fine, if that's what you want to do, but the real use-case for a non-blocking connect() is when your event-loop needs to make a TCP connection but also wants to be able to do other things while the TCP connection-setup is in progress.
For example, the program below will echo back any text you type into stdin; however, if you type a command of the form connect 172.217.9.4, it will start a non-blocking TCP connection to port 443 of the IP address you entered. The interesting thing to note is that while the TCP connection is in progress, you are still able to enter text into stdin and the program can still respond (it can even abort the TCP connection in progress and start a new one if you tell it to). That can be useful, especially when the TCP connection takes a long time to set up (e.g. because the server is slow, or because a firewall between you and the server is blocking your client's TCP packets, in which case the connection attempt might take several minutes before it times out and fails).
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char ** argv)
{
printf("Type something and press return to have your text echoed back to you\n");
printf("Or type e.g. connect 172.217.9.4 and press return to start a non-blocking TCP connection.\n");
printf("Note that the text-echoing functionality still works, even when the TCP connection setup is still in progress!\n");
int tcpSocket = -1; // this will be set non-negative only when we have a TCP connection in progress
while(1)
{
fd_set readFDs, writeFDs;
FD_ZERO(&readFDs);
FD_ZERO(&writeFDs);
FD_SET(STDIN_FILENO, &readFDs);
if (tcpSocket >= 0) FD_SET(tcpSocket, &writeFDs);
int maxFD = STDIN_FILENO;
if (tcpSocket > maxFD) maxFD = tcpSocket;
if (select(maxFD+1, &readFDs, &writeFDs, NULL, NULL) < 0) {perror("select"); exit(10);}
if (FD_ISSET(STDIN_FILENO, &readFDs))
{
char buf[256] = "\0";
fgets(buf, sizeof(buf), stdin);
if (strncmp(buf, "connect ", 8) == 0)
{
if (tcpSocket >= 0)
{
printf("Closing existing TCP socket %i before starting a new connection attempt\n", tcpSocket);
close(tcpSocket);
tcpSocket = -1;
}
tcpSocket = socket(AF_INET, SOCK_STREAM, 0);
if (tcpSocket < 0) {perror("socket"); exit(10);}
const char * connectDest = &buf[8];
printf("Starting new TCP connection using tcpSocket=%i to: %s\n", tcpSocket, connectDest);
int flags = fcntl(tcpSocket, F_GETFL, 0);
if (flags == -1) {perror("fcntl"); exit(10);}
if (fcntl(tcpSocket, F_SETFL, flags | O_NONBLOCK) == -1) {perror("fcntl"); exit(10);}
struct sockaddr_in serv_addr; memset(&serv_addr, 0, sizeof(serv_addr));
serv_addr.sin_family = AF_INET;
serv_addr.sin_port = htons(443); // https port
if (inet_aton(connectDest, &serv_addr.sin_addr) != 1) printf("Unable to parse IP address %s\n", connectDest);
int ret = connect(tcpSocket, (struct sockaddr*)&serv_addr, sizeof(serv_addr));
if (ret == 0)
{
printf("connect() succeeded immediately! We can just use tcpSocket now\n");
close(tcpSocket); // but for the sake of this demo, I won't
tcpSocket = -1;
}
else if (ret == -1)
{
if (errno == EINPROGRESS)
{
printf("connect() returned -1/EINPROGRESS: the TCP connection attempt is now happening, but in the background.\n");
printf("while that's going on, you can still enter text here.\n");
}
else
{
perror("connect");
exit(10);
}
}
}
else printf("You typed: %s\n", buf);
}
if ((tcpSocket >= 0)&&(FD_ISSET(tcpSocket, &writeFDs)))
{
// Aha, the TCP setup has completed! Now let's see if it succeeded or failed
int setupResult;
socklen_t resultLength = sizeof(setupResult);
if (getsockopt(tcpSocket, SOL_SOCKET, SO_ERROR, &setupResult, &resultLength) < 0) {perror("getsockopt"); exit(10);}
if (setupResult == 0)
{
printf("\nTCP connection setup complete! The TCP socket can now be used to communicate with the server\n");
}
else
{
printf("\nTCP connection setup failed because [%s]\n", strerror(setupResult));
}
// Close the socket, since for the purposes of this demo we don't need it any longer
// A real program would probably keep it around and select()/send()/recv() on it as appropriate
close(tcpSocket);
tcpSocket = -1;
}
}
}
As for why you would want to call getsockopt(fd, SOL_SOCKET, SO_ERROR, ...), it's simply to determine whether select() returned ready-for-write on the TCP socket because the TCP connection setup succeeded or because it failed (and, in the latter case, why it failed, if you care about why).
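To make that concrete, here is a minimal sketch (separate from the program above) of a helper that starts a non-blocking connect and reports how it ended. The name connect_outcome and the choice to return an errno value are assumptions made for illustration. A refused connection shows up either directly from connect() or, after select() reports writability, through SO_ERROR as ECONNREFUSED, which is why both paths need handling.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: start a non-blocking connect on fd and report the outcome.
 * Returns 0 on success, otherwise an errno value
 * (ETIMEDOUT if select() gave up after timeout_sec seconds). */
static int connect_outcome(int fd, const struct sockaddr *addr,
                           socklen_t addrlen, int timeout_sec)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return errno;

    if (connect(fd, addr, addrlen) == 0)
        return 0;                     /* connected immediately */
    if (errno != EINPROGRESS)
        return errno;                 /* failed immediately, e.g. ECONNREFUSED */

    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    int n = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (n < 0)
        return errno;
    if (n == 0)
        return ETIMEDOUT;

    /* Writable only means "the attempt has finished"; SO_ERROR says how. */
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;
    return err;
}
```

A caller would typically treat 0 as "use the socket" and anything else, ECONNREFUSED included, as "close the socket and create a new one before retrying".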
Is using alarm() the only way to set a connect() timeout on a Unix domain socket? I've tried select(), as described here, but select() seems to return successfully right away on a Unix domain socket every time, and getsockopt(SO_ERROR) reports no error, yet a send() on the fd fails with Transport endpoint is not connected. I've pasted the select() code below.
I think alarm() would work, but it seems to be considered old-fashioned, so I'm here to see whether there are other solutions. Thanks in advance.
if ((flags = fcntl(fd, F_GETFL, 0)) == -1) {
syslog(LOG_USER|LOG_ERR, "fcntl get failed: %s", strerror(errno));
close(fd);
return -1;
}
if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
syslog(LOG_USER|LOG_ERR, "set fd nonblocking failed: %s", strerror(errno));
close(fd);
return -1;
}
if(connect(fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINPROGRESS) {
close(fd);
return -1;
}
FD_ZERO(&set);
FD_SET(fd, &set);
if(select(fd + 1, NULL, &set, NULL, &timeout) <= 0) {
close(fd);
return -1;
}
/*
if(connect(fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0) {
close(fd);
return -1;
}
*/
if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, (socklen_t *)&len) < 0) {
syslog(LOG_USER|LOG_ERR, "getsockopt failed: %s", strerror(errno));
close(fd);
return -1;
}
if(error != 0) {
syslog(LOG_USER|LOG_ERR, "getsockopt return error: %d", error);
close(fd);
return -1;
}
}
if (fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1) {
syslog(LOG_USER|LOG_ERR, "set fd blocking failed: %s", strerror(errno));
close(fd);
return -1;
}
Somewhere (I did not bookmark the page), in another post, I found that connect() only establishes a TCP connection. It only means that there is a working TCP stack on the other end; it does not mean that the server has actually accept()-ed!
The example there was that connect() is like calling a support center and hearing the automated voice tell you that you are in a queue: you still cannot communicate. accept() is the operator actually taking your call.
My solution to the same problem is to have the client wait for the server to actually send something before moving on with other client work. I can put this in a select() loop with a timeout.
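A minimal sketch of that wait might look like the following. The helper name wait_for_greeting, the millisecond timeout, and the convention that the server sends at least one byte after accept() are all assumptions for illustration; nothing in the thread specifies them.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for the first bytes from the server.
 * Returns the number of bytes read (> 0), 0 if the peer closed the connection,
 * or -1 on error or timeout (errno is ETIMEDOUT on timeout). */
static ssize_t wait_for_greeting(int fd, void *buf, size_t cap, int timeout_ms)
{
    for (;;) {
        /* select() may modify both arguments, so rebuild them on every pass. */
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        struct timeval tv = { .tv_sec = timeout_ms / 1000,
                              .tv_usec = (timeout_ms % 1000) * 1000 };

        int n = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (n > 0)
            return recv(fd, buf, cap, 0);
        if (n == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (errno != EINTR)
            return -1;
        /* Interrupted by a signal: retry. The timeout restarts, a simplification. */
    }
}
```

The client would call this once right after connect() succeeds and treat a timeout as "the server never picked up".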
listen() takes a parameter that says how many connections can wait in the backlog before further client connection attempts start being dropped.
You can use select() or poll() after EINPROGRESS, as described in the connect man page. If you get EAGAIN or EWOULDBLOCK instead, the Unix domain socket has run out of backlog entries (the queue length the server specified in its listen() call) and the connect() has failed.
Note that a connecting client may be able to write to a Unix domain socket, until the system buffer is full, before the server has even accepted the connection. That works for each connection queued in the backlog. Failures occur afterwards.
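The write-before-accept behavior can be observed with a small experiment. The sketch below assumes Linux semantics; the function name send_before_accept and the socket path passed to it are hypothetical. The server side binds and listens but never calls accept(), and the client still connects and sends.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical demo: returns the number of bytes the client managed to send
 * before the server called accept(), or -1 if any setup step failed. */
static ssize_t send_before_accept(const char *path)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);

    unlink(path);                         /* remove a stale socket file, if any */
    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    int cli = socket(AF_UNIX, SOCK_STREAM, 0);
    ssize_t sent = -1;

    if (srv >= 0 && cli >= 0 &&
        bind(srv, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(srv, 1) == 0 &&
        connect(cli, (struct sockaddr *)&addr, sizeof addr) == 0) {
        /* The server has not called accept(), yet the data is queued. */
        sent = send(cli, "hello", 5, 0);
    }

    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    unlink(path);
    return sent;
}
```

This is why a successful connect() or even a successful first send() says little about whether the server process is actually servicing the connection.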
A failed connect() might need a new socket before retrying. select() might also return 0 if the connection was refused, for example because the server didn't call listen(). That depends on the system and library. At any rate, after an EAGAIN error it is necessary to retry. For example:
int rtc, so_error, max_retry = 5;
socklen_t len = sizeof so_error;
while ((rtc = connect(fd, (struct sockaddr *)&address, sizeof address)) != 0
&& errno == EAGAIN && --max_retry >= 0) {
sleep(1);
// new socket?
}
if (rtc < 0 && errno != EINPROGRESS) {
syslog(LOG_USER|LOG_ERR, "connect returned %d: %s", rtc, strerror(errno));
close(fd);
return -1;
}
if (rtc < 0)
{
fd_set set, wset, eset;
struct timeval timeout;
timeout.tv_sec = 10;
timeout.tv_usec = 0;
FD_ZERO(&set);
FD_SET(fd, &set);
wset = set;
eset = set;
if(select(fd + 1, &set, &wset, &eset, &timeout) <= 0) {
close(fd);
return -1;
}
// [...]
}
1. Client
In fact, my client doesn't recv or process the data sent by the server; it just connects to my server.
int netif_msg_client_socket_create(char *sockpath)
{
int addrlen, retval;
int sockfd;
struct sockaddr_un serv;
sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
if(sockfd < 0) {
PR_ERROR(NETIF_MSG_M, " fatal failure, client msg socket, error is %s, %s %u\n", strerror(errno), __FILE__, __LINE__);
return -1;
}
/* Make client socket. */
memset (&serv, 0, sizeof (struct sockaddr_un));
serv.sun_family = AF_UNIX;
strncpy (serv.sun_path, sockpath, sizeof(serv.sun_path)-1); /* bound by the destination size, not the source length */
addrlen = sizeof (serv.sun_family) + strlen(serv.sun_path);
retval = connect(sockfd, (struct sockaddr*)&serv, addrlen);
if(retval < 0)
{
PR_ERROR(NETIF_MSG_M, " fatal failure, client msg connect, error is %s, %s %u\n", strerror(errno), __FILE__, __LINE__);
close(sockfd);
return -1;
}
fcntl(sockfd, F_SETFL, O_NONBLOCK);
return sockfd;
}
2. Server
But my server will try to send some data to the client continuously.
int netif_msg_server_socket_create(char *sockpath)
{
int addrlen, retval;
int sockfd;
struct sockaddr_un serv;
/* First of all, unlink existing socket */
unlink (sockpath);
sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
if(sockfd < 0)
return -1;
fcntl(sockfd, F_SETFL, O_NONBLOCK);
/* Make server socket. */
memset (&serv, 0, sizeof (struct sockaddr_un));
serv.sun_family = AF_UNIX;
strncpy (serv.sun_path, sockpath, sizeof(serv.sun_path)-1);
addrlen = sizeof (serv.sun_family) + strlen(serv.sun_path);
//printf("sizeof(serv) == %d, addrlen == %d.\r\n", sizeof(serv), addrlen);
retval = bind (sockfd, (struct sockaddr *) &serv, addrlen);
if (retval < 0)
{
close (sockfd); /* Avoid sd leak. */
return -1;
}
retval = listen (sockfd, 20);
if (retval < 0)
{
close (sockfd); /* Avoid sd leak. */
return -1;
}
return sockfd;
}
My server uses select and accepts the connection from my client successfully.
After my server has sent 412 packets (96 bytes each), it seems to block in send().
Key code:
printf("Try to send packet(%d bytes) to clientfd %d.\n", MSGCB_DLEN(msgcb), client->acpt_fd);
retval = send(client->acpt_fd, msgcb->data_ptr, MSGCB_DLEN(msgcb), 0);
if(retval != MSGCB_DLEN(msgcb))
{
printf("Send netif notify msg failed[%d].\n", retval);
} else {
printf("Send netif notify msg succeeded.\n");
}
After 412 packets have been sent to my client and "Try to ..." is printed, nothing more happens: neither "...failed" nor "...succeeded" is printed.
I used getsockopt to fetch SO_RCVBUF and SO_SNDBUF; each is about 100000 bytes.
I don't know why this happens and need your help. Thanks!
If you want the server socket that is connected to the client to be non-blocking, then you must specifically set the new socket that is returned from accept() to be non-blocking. Your code only sets the listening socket to non-blocking.
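As a rough sketch of that first point, the accepted descriptor can be switched to non-blocking mode right after accept(). The helper names set_nonblocking and accept_nonblocking are made up for this example and do not appear in the question's code.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into non-blocking mode while
 * preserving its other file status flags. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical wrapper: accept a client and make the *new* socket
 * non-blocking. Returns the client descriptor, or -1 on error. */
static int accept_nonblocking(int listen_fd)
{
    int fd = accept(listen_fd, NULL, NULL);
    if (fd < 0)
        return -1;
    if (set_nonblocking(fd) == -1) {
        int saved = errno;      /* keep the fcntl() error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

In the question's server, the descriptor stored in client->acpt_fd would be the one to pass through set_nonblocking().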
You can perform non-blocking I/O with send using the MSG_DONTWAIT flag in the last parameter.
retval = send(client->acpt_fd, msgcb->data_ptr, MSGCB_DLEN(msgcb),
MSG_DONTWAIT);
When performing non-blocking I/O, you need to detect when the return value is signalling you to retry the operation.
if (retval < 0) {
if (errno == EAGAIN || errno == EWOULDBLOCK) {
/* ... handle retry of send later when it is ready ... */
} else {
/* ... other error value cases */
}
}
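One possible shape for the retry is sketched below, under the assumption that it is acceptable to wait a bounded time for the client to drain its receive buffer. The function name send_all_nowait and the timeout parameter are illustrative. In a select()-driven server, the poll() call would normally be replaced by adding the descriptor to the write set of the main loop.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the whole buffer without blocking indefinitely.
 * Returns 0 when everything was sent, -1 on error or when the socket did not
 * become writable within timeout_ms (errno is ETIMEDOUT in that case). */
static int send_all_nowait(int fd, const char *data, size_t len, int timeout_ms)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = send(fd, data + off, len - off, MSG_DONTWAIT);
        if (n >= 0) {
            off += (size_t)n;           /* partial sends are normal */
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                  /* a real error */

        /* The send buffer is full: wait until the socket is writable again. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int r = poll(&pfd, 1, timeout_ms);
        if (r == 0) {
            errno = ETIMEDOUT;          /* the peer is not reading */
            return -1;
        }
        if (r < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

With a client that never calls recv(), as in the question, this returns -1 with ETIMEDOUT once the buffers fill up instead of hanging in send().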
I have some code that just tests whether a port is open on a device; for that I wrote a little timeout socket function:
int timeout_socket(struct sockaddr *addr, int timeout_ms) {
int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (fd < 0) return 0;
int on = 1;
setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(int));
//Set the socket for non-blocking I/O
if (ioctl(fd, FIONBIO, (char *)&on) < 0) {
close(fd);
return 0;
}
int result = connect(fd, addr, sizeof(struct sockaddr_in));
if (result != 0 && errno != EINPROGRESS) {
close(fd);
return 0;
}
struct pollfd fds;
fds.fd = fd;
fds.events = POLLOUT;
//Poll for timeout_ms
while (1) {
int res = poll(&fds, 1, timeout_ms);
if (res < 0 && errno == EINTR) continue; // poll() returns -1 and sets errno when interrupted
if (res > 0 && (fds.revents & (POLLOUT | POLLIN))) {
close(fd);
return 1;
}
break;
}
close(fd);
return 0;
}
The problem is that when the target device (a Mac) is sleeping, it wakes up just after connect() runs, but even with timeout_ms set to something like 10000 (10 seconds) it just doesn't respond.
My possible fix is:
Connect to the device using a socket/connect
Close it
Open/Connect another socket
Poll for timeout_ms
Is this the only way? This behavior seems strange to me, but I have never used POSIX sockets in non-blocking mode before. Is this normal behavior?
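For comparison, a probe of this kind is sketched below with two details handled explicitly: poll() signals an interruption by returning -1 with errno set to EINTR, and POLLOUT alone does not prove the port is open, because a refused connection also makes the socket writable. The function name port_is_open is hypothetical, and the sketch does not attempt to explain the wake-from-sleep behavior described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if a TCP connection to addr completes within
 * timeout_ms, 0 otherwise (refused, unreachable, timed out, or local error). */
static int port_is_open(const struct sockaddr_in *addr, int timeout_ms)
{
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd < 0)
        return 0;

    int is_open = 0;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1) {
        if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) == 0) {
            is_open = 1;                          /* connected immediately */
        } else if (errno == EINPROGRESS) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int res;
            do {
                res = poll(&pfd, 1, timeout_ms);  /* the timeout restarts on EINTR */
            } while (res < 0 && errno == EINTR);

            if (res > 0) {
                /* POLLOUT is also set when the attempt failed; SO_ERROR decides. */
                int err = 0;
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
                    is_open = 1;
            }
        }
    }
    close(fd);
    return is_open;
}
```

A closed port on a reachable host then yields 0 quickly, while a host that never answers yields 0 only after timeout_ms has elapsed.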