I'm trying to force TCP reset on a connection.
The suggested way is to set SO_LINGER to 0 and then call close().
I'm doing exactly that, but the connection remains in the ESTABLISHED state.
The socket is operating in non-blocking mode. The operating system is Raspbian.
The code:
struct linger l;
l.l_onoff = 1;
l.l_linger = 0;
if (setsockopt(server->connection_socket, SOL_SOCKET, SO_LINGER, &l, sizeof(l)) != 0) {
LOG_E(tcp, "setting SO_LINGER failed");
}
if (close(server->connection_socket) != 0) {
LOG_E(tcp, "closing socket failed");
}
server->connection_socket = 0;
LOG_I(tcp, "current TCP connection was closed");
A Wireshark trace shows no RST either.
No other threads of the application are performing any operation on that socket.
I can't figure out what is wrong; any suggestion would be much appreciated.
SOLVED
The issue was file descriptors leaking to child processes created through the system() call.
In fact, when I listed all TCP socket descriptors with lsof -i tcp, I found that the child processes still held open copies of the descriptor inherited from the parent (even though the parent had already closed it).
The solution was to mark the descriptor close-on-exec right after accept(), so the processes spawned via system() do not inherit it:
fcntl(server->connection_socket, F_SETFD, FD_CLOEXEC);
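If other descriptor flags might already be set, a slightly more defensive variant (just a sketch, reusing the question's server->connection_socket and LOG_E) fetches the current flags first:

/* Mark the accepted socket close-on-exec so it cannot leak into
 * children spawned later via system()/exec(); preserves other fd flags. */
int fdflags = fcntl(server->connection_socket, F_GETFD);
if (fdflags == -1 ||
    fcntl(server->connection_socket, F_SETFD, fdflags | FD_CLOEXEC) == -1) {
    LOG_E(tcp, "setting FD_CLOEXEC failed");
}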
In your case, you can no longer send or receive data after calling close(). Also, after the close() call, an RST will be sent only if the reference count of the socket descriptor drops to 0. Only then does the connection go into the CLOSED state and the data in the receive and send buffers get discarded.
The answer probably lies in how you fork processes (as mentioned by EJP in a comment). It seems that you didn't close the accepted socket in the parent process after calling fork(), so the socket's reference count stays nonzero and no RST is sent immediately after your close().
Such situations are well described by Stevens in UNIX Network Programming.
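For reference, a minimal sketch of the usual fork-per-connection pattern, where both copies of the accepted descriptor get closed so the reference count can actually reach zero (listenfd/connfd are illustrative names):

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* listenfd is an already listening TCP socket */
void serve_forever(int listenfd)
{
    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd == -1)
            continue;

        pid_t pid = fork();
        if (pid == 0) {           /* child */
            close(listenfd);      /* child does not need the listening socket */
            /* ... handle the client on connfd ... */
            close(connfd);        /* last reference gone: FIN (or RST with SO_LINGER 0) is sent */
            _exit(0);
        }
        close(connfd);            /* parent drops its copy of the accepted socket */
    }
}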
Related
I have two applications running on Embedded Linux board. One runs as a daemon and other acts as an interface for it. They communicate with each other using Unix sockets.
To check the handling of an abnormal socket termination, I terminated the interface application [Ctrl+C]. As a result, the daemon application crashes. Since the socket is terminated, I get the "socket send failed" error on the daemon side, which is expected, but after that the daemon crashes.
I am at a loss as to where exactly should I look for debugging this problem.
Have you set the socket in your daemon to non-blocking mode?
Suppose your code looks like the following:
while (1)
{
    connfd = accept(listenfd, (struct sockaddr*)NULL, NULL);
    /* then you use the fd */
    func(connfd);
}
Based on the man page:
"
On success, accept() returns a nonnegative integer that is a descriptor for the accepted socket. On error, -1 is returned, and errno is set appropriately.
and
If no pending connections are present on the queue, and the socket is not marked as nonblocking, accept() blocks the caller until a connection is present. If the socket is marked nonblocking and no pending connections are present on the queue, accept() fails with the error EAGAIN or EWOULDBLOCK.
"
Therefore, if you are in non-blocking mode, you should check the return value of accept() instead of using it directly, because on failure the fd value will be -1.
The above is just one common possibility. If that is not the case, you can try "sudo strace -p process_id" or carry out a core file analysis to understand why it crashed.
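A rough sketch of that check, assuming listenfd is non-blocking and func() is your handler from the snippet above:

#include <errno.h>
#include <sys/socket.h>

extern void func(int connfd);     /* the handler from the question */

void accept_loop(int listenfd)
{
    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                continue;         /* nothing pending yet; poll/select and try again */
            continue;             /* real error: log it, never pass -1 to func() */
        }
        func(connfd);             /* safe: connfd is a valid descriptor here */
    }
}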
[I asked something similar before. This is a more focused version.]
What can cause a server's select() call on a TCP socket to consistently time out rather than "see" the client's close() of the socket? On the client's side, the socket is a regular socket()-created blocking socket that successfully connects to the server and successfully transmits a round-trip transaction. On the server's side, the socket is created via an accept() call, is blocking, is passed to a child server process via fork(), is closed by the top-level server, and is successfully used by the child server process in the initial transaction. When the client subsequently closes the socket, the select() call of the child server process consistently times out (after 1 minute) rather than indicating a read-ready condition on the socket. The select() call looks for read-ready conditions only: the write-ready and exception arguments are NULL.
Here's the simplified but logically equivalent select()-using code in the child server process:
int one_svc_run(
const int sock,
const unsigned timeout)
{
struct timeval timeo;
fd_set fds;
timeo.tv_sec = timeout;
timeo.tv_usec = 0;
FD_ZERO(&fds);
FD_SET(sock, &fds);
for (;;) {
fd_set readFds = fds;
int status = select(sock+1, &readFds, 0, 0, &timeo);
if (status < 0)
return errno;
if (status == 0)
return ETIMEDOUT;
/* This code not reached when client closes socket */
/* The time-out structure, "timeo", is appropriately reset here */
...
}
...
}
Here's the logical equivalent of the sequence of events on the client-side (error-handling not shown):
struct sockaddr_in *raddr = ...;
int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
(void)bindresvport(sock, (struct sockaddr_in *)0);
connect(sock, (struct sockaddr *)raddr, sizeof(*raddr));
/* Send a message to the server and receive a reply */
(void)close(sock);
fork(), exec(), and system() are never called. The code is considerably more complex than this, but this is the sequence of relevant calls.
Could Nagle's algorithm cause the FIN packet to not be sent upon close()?
The most likely explanation is that you're not actually closing the client end of the connection when you think you are, probably because some other file descriptor somewhere still references the client socket and is not being closed.
If your client program ever does a fork (or related calls that fork, such as system or popen), the forked child might well have a copy of the file descriptor which would cause the behavior you're seeing.
One way to test/workaround the problem is to have the client do an explicit shutdown(2) prior to closing the socket:
shutdown(sock, SHUT_RDWR);
close(sock);
If this causes the problem to go away then that is the problem -- you have another copy of the client socket file descriptor somewhere hanging around.
If the problem is due to children getting the socket, the best fix is probably to set the close-on-exec flag on the socket immediately after creating it:
fcntl(sock, F_SETFD, fcntl(sock, F_GETFD) | FD_CLOEXEC);
or on some systems, use the SOCK_CLOEXEC flag to the socket creation call.
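On Linux, for instance, the latter looks something like this (a sketch; SOCK_CLOEXEC is Linux-specific):

/* Atomically create the socket with close-on-exec set,
 * so it can never leak into exec()'d children. */
int sock = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);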
Mystery solved.
@nos was correct in the first comment: it's a firewall problem. A shutdown() by the client isn't needed; the client does close the socket; the server does use the right timeout; and there's no bug in the code.
The problem was caused by the firewall rules on our Linux Virtual Server (LVS). A client connects to the LVS and the connection is passed to the least-loaded of several backend servers. All packets from the client pass through the LVS; all packets from the backend server go directly to the client. The firewall rules on the LVS caused the FIN packet from the client to be discarded. Thus, the backend server never saw the close() by the client.
The solution was to remove the "-m state --state NEW" options from the iptables(8) rule on the LVS system. This allows the FIN packets from the client to be forwarded to the backend server. This article has more information.
Thanks to all of you who suggested using wireshark(1).
On Linux, select() modifies the value of the timeout argument. From the man page:
On Linux, select() modifies timeout to reflect the amount of time not slept
So your timeo runs down to zero, and once it is zero select() returns immediately (mostly with a return value of zero).
The following change may help:
for (;;) {
    struct timeval timo = timeo;
    fd_set readFds = fds;
    int status = select(sock+1, &readFds, 0, 0, &timo);
As described in Beej's Guide to Network Programming, select() monitors a set of file descriptors for reading (using recv()), a set of file descriptors for writing (using send()), and the last one, I don't know. When the server socket receives a message from a client socket, the read_fds set is modified and select() returns from its blocking state. The same happens when sending messages to client sockets. For example:
for (;;) {
    read_fds = master;   // copy it
    if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
        perror("select");
        exit(4);
    }
    // the rest is the code for processing the ready sockets
}
I guess the read_fds set will contain only the ready socket descriptors at this point (the others are removed), and a ready socket descriptor is either the newly connected socket or a connected socket that sent a message. Is my understanding correct?
It seems the ready sockets must be handled one by one. When I ran it under gdb to understand the behavior, while the program was processing the ready sockets (the code after select() returns), I tried to send some messages and connect to the server with some new clients. How can it recognize the new clients or the newly sent messages, even though select() is not being called at that point?
As described in Beej's Guide to Network Programming, select() monitors a set of file descriptors for reading (using recv()), a set of file descriptors for writing (using send())
Yes
and the last one, I don't know.
The last one no longer has any useful meaning.
I guess the read_fds set will contain only the ready socket descriptors at this point (the others are removed), and a ready socket descriptor is either the newly connected socket or a connected socket that sent a message. Is my understanding correct?
That's correct.
It seems the ready sockets must be handled one by one. When I ran it under gdb to understand the behavior, while the program was processing the ready sockets (the code after select() returns), I tried to send some messages and connect to the server with some new clients. How can it recognize the new clients or the newly sent messages, even though select() is not being called at that point?
Normally when you create a polling loop such as this, you'd add new sockets to the loop. That is, you'd add them to the appropriate fd_sets before your next call to select.
When the new socket becomes writable, you'd send on it.
When you're dealing with multiple sockets that may potentially block (in your case reading sockets), you need to determine which sockets have data in them waiting to be read. You can do this by calling select() and adding the sockets to your read_set.
For your listening socket, if you call accept() and there is no pending connection, then your accept will block until a new connection arrives. So you also want to select() this socket. Once you accept that client, you will want to add that to your read_set.
e.g. Pseudo-code
for (;;) {
    struct timeval tv = { timeout, 0 };
    fd_set read_set;

    FD_ZERO(&read_set);
    FD_SET(listen_sock, &read_set);
    max_fd = max(max_fd, listen_sock);

    /* add all your other client sockets to read_set, updating max_fd as you go */

    n = select(max_fd + 1, &read_set, NULL, NULL, &tv);
    if (n > 0) {
        if (FD_ISSET(listen_sock, &read_set)) {
            cli = accept(listen_sock);
            /* add to list of clients */
        }
        else {
            for (int i = 0; i < max_clients; i++) {
                if (FD_ISSET(clients[i], &read_set)) {
                    /* data is waiting: recv it */
                    bytes = recv(clients[i], ..);
                    if (bytes <= 0) {
                        /* error or EOF: remove from the client list so we don't select on it anymore */
                    }
                }
            }
        }
    }
}
Note that sends can also block if the other end is not actively reading and the send buffer is full. So if you're sending, you might want to check whether the socket is writable first.
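A minimal sketch of that writability check, using the same select() pattern (clients[i], buf, len and tv are placeholders for your own variables):

fd_set write_set;
FD_ZERO(&write_set);
FD_SET(clients[i], &write_set);

/* Writable means there is room in the send buffer, so a moderate send() should not block. */
if (select(clients[i] + 1, NULL, &write_set, NULL, &tv) > 0 &&
    FD_ISSET(clients[i], &write_set)) {
    send(clients[i], buf, len, 0);
}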
I'm trying to connect two machines, say machine A and machine B, and send TCP messages from A to B (one way). In the normal scenario this works fine. But while communication is in progress, if the socket on B is closed, the send() from A gets stuck forever, and it puts the process into the zombie state. The socket on machine A is in blocking mode. Below is the code that gets stuck forever.
if (send(txSock, &txSockbuf, sizeof(sockstruct), 0) == -1) {
    printf("Error in sending the socket Data\n");
}
else {
    printf("The SENT String is %s \n", sock_buf);
}
How do I find out whether the socket on the other side is closed? What does send() return if the destination socket is closed? Would select() be helpful?
A process in the "zombie" state means that it has already exited, but its parent has not yet read its return code. What's probably happening is that your process is receiving a SIGPIPE signal (this is what you'll get by default when you write to a closed socket), your program has already terminated, but the zombie state hasn't yet been resolved.
This related question gives more information about SIGPIPE and how to handle it: SIGPIPE, Broken pipe
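If you want send() to fail with EPIPE instead of killing the process, a common sketch (reusing the question's txSock/txSockbuf) is to ignore SIGPIPE once at startup, or pass MSG_NOSIGNAL per call on Linux:

#include <signal.h>
#include <errno.h>
#include <sys/socket.h>

/* Once at startup: writes to a closed socket then fail with EPIPE
 * instead of raising SIGPIPE and terminating the process. */
signal(SIGPIPE, SIG_IGN);

/* Or per call, with the Linux-specific MSG_NOSIGNAL flag: */
if (send(txSock, &txSockbuf, sizeof(sockstruct), MSG_NOSIGNAL) == -1 && errno == EPIPE) {
    /* peer has closed the connection; clean up instead of crashing */
}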
If I have a file descriptor (socket fd), how do I check whether this fd is available for reading/writing?
In my situation, the client has connected to the server and we know the fd.
However, the server will disconnect the socket; are there any clues for checking that?
You want fcntl() to check for read/write settings on the fd:
#include <unistd.h>
#include <fcntl.h>
int r;
r = fcntl(fd, F_GETFL);
if (r == -1)
    /* Error */
else if ((r & O_ACCMODE) == O_RDONLY)   /* O_RDONLY is 0, so mask with O_ACCMODE first */
    /* Read Only */
else if ((r & O_ACCMODE) == O_WRONLY)
    /* Write Only */
else if ((r & O_ACCMODE) == O_RDWR)
    /* Read/Write */
But this is a separate issue from whether the socket is still connected. If you are already using select() or poll(), then you're almost there: poll() will report the status nicely if you check for POLLERR (and POLLHUP) in revents; they are reported there even if you don't request them in events.
If you're doing normal blocking I/O, then just handle the read/write errors as they come in and recover gracefully.
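A rough sketch of the poll() approach (fd and the 1-second timeout are illustrative):

#include <poll.h>

struct pollfd pfd;
pfd.fd = fd;
pfd.events = POLLIN;              /* POLLERR and POLLHUP are always reported in revents */

int n = poll(&pfd, 1, 1000);      /* wait up to 1000 ms */
if (n > 0) {
    if (pfd.revents & (POLLERR | POLLHUP)) {
        /* connection error or peer hung up */
    } else if (pfd.revents & POLLIN) {
        /* readable; note a subsequent recv() of 0 bytes still means EOF */
    }
}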
You can use select() or poll() for this.
In C#, this question is answered here
In general, a socket disconnect is asynchronous and needs to be polled for in some manner. An asynchronous read on the socket will typically return when it's closed as well, giving you a chance to pick up on the status change sooner. Winsock (Windows) can register to receive notification of a disconnect, but again, this may not happen for a long time after the other side "goes away" unless you use some type of keepalive: SO_KEEPALIVE (which by default may not notice for hours) or an application-level heartbeat.
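For the keepalive route, a hedged sketch of the BSD-sockets side on Linux (the probe intervals are made-up values; TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are Linux-specific):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

int on = 1;
setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));

/* Linux tuning: first probe after 30 s idle, then every 5 s, declare dead after 3 misses */
int idle = 30, intvl = 5, cnt = 3;
setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));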
I found that recv() can be used to check: when the socket fd is bad, recv() fails and errno is set.
ret = recv(socket_fd, buffer, bufferSize, MSG_PEEK);
if (ret == -1 && EPIPE == errno) {
    // something wrong
}
Well, you could call select(). If the server has disconnected, I believe you'll eventually get an error code returned... If not, you can use select() to tell whether your network stack is ready to send more data (or receive it).
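Putting that together, one common sketch for detecting a peer close is: select() for readability, then treat a recv() of 0 bytes as the disconnect (fd and the timeout are illustrative):

fd_set rfds;
FD_ZERO(&rfds);
FD_SET(fd, &rfds);
struct timeval tv = { 1, 0 };                 /* wait up to 1 second */

if (select(fd + 1, &rfds, NULL, NULL, &tv) > 0 && FD_ISSET(fd, &rfds)) {
    char buf[1];
    ssize_t n = recv(fd, buf, sizeof(buf), MSG_PEEK);
    if (n == 0) {
        /* peer closed the connection (orderly shutdown) */
    } else if (n == -1) {
        /* error, e.g. ECONNRESET */
    }
}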