When a process runs out of file descriptors, accept() will fail and set errno to EMFILE.
However, the underlying connection that would have been accepted is not closed, so there appears to be no way to inform the client that the application could not handle the connection.
The question is what is the proper action to take regarding accepting TCP connections when running out of file descriptors.
The following code demonstrates the issue that I want to learn how best to deal with (note this is just example code for demonstrating the issue/question, not production code):
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static void err(const char *str)
{
    perror(str);
    exit(1);
}

int main(int argc, char *argv[])
{
    int serversocket;
    struct sockaddr_in serv_addr;

    serversocket = socket(AF_INET, SOCK_STREAM, 0);
    if (serversocket < 0)
        err("socket()");

    memset(&serv_addr, 0, sizeof serv_addr);
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = INADDR_ANY;
    serv_addr.sin_port = htons(6543);

    if (bind(serversocket, (struct sockaddr *)&serv_addr, sizeof serv_addr) < 0)
        err("bind()");
    if (listen(serversocket, 10) < 0)
        err("listen()");

    for (;;) {
        struct sockaddr_storage client_addr;
        socklen_t client_len = sizeof client_addr;
        int clientfd;

        clientfd = accept(serversocket, (struct sockaddr *)&client_addr, &client_len);
        if (clientfd < 0) {
            continue;
        }
    }
    return 0;
}
Compile and run this code with a limited number of file descriptors available:
gcc srv.c
ulimit -n 10
strace -t ./a.out 2>&1 |less
And in another console, I run
telnet localhost 6543 &
As many times as needed until accept() fails:
The output from strace shows this to happen:
13:21:12 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
13:21:12 bind(3, {sa_family=AF_INET, sin_port=htons(6543), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
13:21:12 listen(3, 10) = 0
13:21:12 accept(3, {sa_family=AF_INET, sin_port=htons(43630), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 4
13:21:19 accept(3, {sa_family=AF_INET, sin_port=htons(43634), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 5
13:21:22 accept(3, {sa_family=AF_INET, sin_port=htons(43638), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 6
13:21:23 accept(3, {sa_family=AF_INET, sin_port=htons(43642), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 7
13:21:24 accept(3, {sa_family=AF_INET, sin_port=htons(43646), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 8
13:21:26 accept(3, {sa_family=AF_INET, sin_port=htons(43650), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 9
13:21:27 accept(3, 0xbfe718f4, [128]) = -1 EMFILE (Too many open files)
13:21:27 accept(3, 0xbfe718f4, [128]) = -1 EMFILE (Too many open files)
13:21:27 accept(3, 0xbfe718f4, [128]) = -1 EMFILE (Too many open files)
13:21:27 accept(3, 0xbfe718f4, [128]) = -1 EMFILE (Too many open files)
... and thousands upon thousands of more accept() failures.
Basically, at this point:
the code will call accept() as fast as possible, failing on the same pending TCP connection over and over again and churning CPU.
the client will stay connected (the TCP handshake completes before the application accepts the connection) and gets no indication that there is a problem.
So,
Is there a way to force the TCP connection that caused accept() to fail to be closed (so that e.g. the client can be quickly informed and perhaps try another server)?
What is the best practice for preventing the server code from going into an infinite loop when this situation arises (or for preventing the situation altogether)?
You can set aside an extra fd at the beginning of your program and keep track of the EMFILE condition:
int reserve_fd;
_Bool out_of_fd = 0;

if (0 > (reserve_fd = dup(1)))
    err("dup()");
Then, if you hit the EMFILE condition, you can close the reserve_fd and use its slot to accept the new connection (which you'll then immediately close):
clientfd = accept(serversocket, (struct sockaddr *)&client_addr, &client_len);
if (out_of_fd) {
    if (clientfd >= 0)
        close(clientfd);
    if (0 > (reserve_fd = dup(1)))
        err("dup()");
    out_of_fd = 0;
    continue; /* do other stuff that'll hopefully free an fd */
}
if (clientfd < 0) {
    close(reserve_fd);
    out_of_fd = 1;
    continue;
}
Complete example:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static void err(const char *str)
{
    perror(str);
    exit(1);
}

int main(int argc, char *argv[])
{
    int serversocket;
    struct sockaddr_in serv_addr;

    serversocket = socket(AF_INET, SOCK_STREAM, 0);
    if (serversocket < 0)
        err("socket()");

    int yes = 1; /* must be initialized before being passed to setsockopt() */
    if (-1 == setsockopt(serversocket, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)))
        perror("setsockopt");

    memset(&serv_addr, 0, sizeof serv_addr);
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = INADDR_ANY;
    serv_addr.sin_port = htons(6543);

    if (bind(serversocket, (struct sockaddr *)&serv_addr, sizeof serv_addr) < 0)
        err("bind()");
    if (listen(serversocket, 10) < 0)
        err("listen()");

    int reserve_fd;
    int out_of_fd = 0;
    if (0 > (reserve_fd = dup(1)))
        err("dup()");

    for (;;) {
        struct sockaddr_storage client_addr;
        socklen_t client_len = sizeof client_addr;
        int clientfd;

        clientfd = accept(serversocket, (struct sockaddr *)&client_addr, &client_len);
        if (out_of_fd) {
            if (clientfd >= 0)
                close(clientfd);
            if (0 > (reserve_fd = dup(1)))
                err("dup()");
            out_of_fd = 0;
            continue; /* do other stuff that'll hopefully free an fd */
        }
        if (clientfd < 0) {
            close(reserve_fd);
            out_of_fd = 1;
            continue;
        }
    }
    return 0;
}
If you're multithreaded, then I imagine you'd need a lock around fd-producing functions, taken whenever you close the extra fd (while expecting to accept the final connection), in order to prevent the spare slot from being filled by another thread.
All this only makes sense if 1) the listening socket isn't shared with other processes (which might not have hit their EMFILE limit yet) and 2) the server deals with persistent connections (because if it doesn't, you're bound to close some existing connection very soon, freeing up an fd slot for your next attempt at accept()).
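As a rough sketch of the locking idea mentioned above (the fd_lock mutex and the accept_and_discard() helper are illustrative assumptions, not part of the original code; every code path that creates an fd would need to take the same lock):

#include <pthread.h>
#include <unistd.h>
#include <sys/socket.h>

/* Hypothetical global guarding all fd-producing calls. */
static pthread_mutex_t fd_lock = PTHREAD_MUTEX_INITIALIZER;
static int reserve_fd;

/* Called when accept() has failed with EMFILE. */
static void accept_and_discard(int serversocket)
{
    pthread_mutex_lock(&fd_lock);   /* no other thread may grab the slot */
    close(reserve_fd);              /* free exactly one fd slot */
    int clientfd = accept(serversocket, NULL, NULL);
    if (clientfd >= 0)
        close(clientfd);            /* reject the pending connection */
    reserve_fd = dup(1);            /* re-reserve the slot */
    pthread_mutex_unlock(&fd_lock);
}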
Problem
You cannot accept client connections if the maximum number of file descriptors has been reached. This can be a per-process limit (errno EMFILE) or a global system limit (errno ENFILE). The client does not notice this situation immediately; to the client it looks like the connection was accepted by the server. If too many such connections pile up on the socket (when the backlog runs full), the server will stop answering with SYN-ACK packets and the connection request will time out on the client side (which can be quite an annoying delay).
Number of file descriptors
It is of course possible to extend both limits when they are hit. For the process-wide limit, use setrlimit(RLIMIT_NOFILE, ...); for the system-wide limit, sysctl is the tool to use. Both may require root privileges, the first one only to raise the hard limit.
However, there usually is a good reason for the file descriptor limit, namely to prevent overuse of system resources, so this will not be a solution for all situations.
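For the process-wide limit, a minimal sketch of raising the soft limit as far as the hard limit allows, using the standard getrlimit()/setrlimit() calls:

#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE limit up to the current hard limit. */
static void raise_nofile_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
        perror("getrlimit");
        return;
    }
    rl.rlim_cur = rl.rlim_max;  /* raising rlim_max itself requires root */
    if (setrlimit(RLIMIT_NOFILE, &rl) < 0)
        perror("setrlimit");
}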
Recovering from EMFILE
One option is to sleep(n) after EMFILE is received; one second should be enough to prevent the additional system load of calling accept() too often. This may be useful for handling short bursts of connections.
However, if the situation doesn't normalize soon, other measures should be taken (for example, if sleep() had to be called 5 times in a row or similar).
In this case it is advisable to close the server socket. All pending client connections will be terminated immediately (by receiving an RST packet) and the clients can use another server if applicable. Furthermore, no new client connections are accepted; they are immediately rejected (Connection refused) instead of timing out as might happen while the socket is held open.
After the contention eases, the server socket can be opened again. For the EMFILE case it is only necessary to track the number of open client connections and re-open the server socket when it falls below some threshold. For the system-wide case there is no general answer; maybe just retry after some time, or use the /proc filesystem or system tools like lsof to find out when the contention ceases.
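Pulling these steps together, a minimal sketch of such an accept loop; open_listener(), handle_client(), open_clients and FD_THRESHOLD are illustrative assumptions, not part of the answer above:

#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>

#define FD_THRESHOLD 100            /* arbitrary re-open threshold */

int open_listener(void);            /* assumed: socket() + bind() + listen() */
void handle_client(int fd);         /* assumed: hands the fd to the application */
extern int open_clients;            /* assumed: maintained by the application */

void accept_loop(void)
{
    int listenfd = open_listener();
    int emfile_streak = 0;

    for (;;) {
        int clientfd = accept(listenfd, NULL, NULL);
        if (clientfd >= 0) {
            emfile_streak = 0;
            handle_client(clientfd);
            continue;
        }
        if (errno != EMFILE && errno != ENFILE)
            continue;               /* EINTR and friends: just retry */
        if (++emfile_streak < 5) {
            sleep(1);               /* ride out a short burst */
            continue;
        }
        /* Persistent fd shortage: shed load at the listener. */
        close(listenfd);            /* pending connections are reset; new
                                       attempts get "connection refused" */
        while (open_clients >= FD_THRESHOLD)
            sleep(1);               /* wait for connections to drain */
        listenfd = open_listener();
        emfile_streak = 0;
    }
}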
One solution I've read about is to keep a "spare" file descriptor handy that you can use to accept and immediately close new connections when you're over fd capacity. For example:
int sparefd = open("/dev/null", O_RDONLY);
Then, when accept returns with EMFILE, you can:
close(sparefd); // create an available file descriptor
int newfd = accept(...); // accept a new connection
close(newfd); // immediately close the connection
sparefd = open("/dev/null", O_RDONLY); // re-create spare
It's not exactly elegant, but it's probably a little better than closing the listening socket in some circumstances. Be wary that if your program is multi-threaded then another thread might "claim" the spare fd as soon as you release it; there's no easy way to solve that (the "hard" way is to put a mutex around every operation that might consume a file descriptor).
Related
I'm learning Unix Network Programming Volume 1, and I want to reproduce the accept() error for RST on Linux.
server: call socket(), bind(), listen(), and sleep(10)
client: call socket(), connect(), setsockopt() of LINGER, close() and return
server: call accept()
I expected the 3rd step to fail with an error like ECONNABORTED, but it doesn't.
I want to know why.
I would appreciate your help.
The following is the server code:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>
#include <strings.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
    int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    struct sockaddr_in addr;
    bzero(&addr, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(6666);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    bind(sock, (struct sockaddr*)(&addr), (socklen_t)(sizeof addr));
    listen(sock, 5);
    sleep(10);
    if (accept(sock, NULL, NULL) < 0)
        perror("error");
    else
        printf("right");
    return 0;
}
The following is the client code:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>
#include <strings.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
    int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    struct sockaddr_in addr;
    bzero(&addr, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(6666);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    connect(sock, (struct sockaddr*)(&addr), (socklen_t)(sizeof addr));
    struct linger ling;
    ling.l_onoff = 1;
    ling.l_linger = 0;
    setsockopt(sock, SOL_SOCKET, SO_LINGER, &ling, sizeof ling);
    close(sock);
    return 0;
}
Nope. I think you'll get an empty but complete connection (with no data). The kernel will manage the complete connection establishment and then receive an immediate FIN packet (meaning EOF, not reset) and will handle it (or wait for the user-space process to close its side, to send the FIN to the other side). For a connection abort you need to reboot the client machine (or the server) without allowing it to send the FIN packets (or disconnect it from the network before rebooting it). An ACK is never answered, so you won't get an RST sent in response to an ACK.
RST packets are sent automatically by the kernel when there is a state mismatch between the two parties. For this to happen with a correct implementation you must force such a state mismatch (this is why the machine reboot is necessary):
Make a connection between both parties and pause it (with a sleep) to ensure the connection is in the ESTABLISHED state before disconnecting the cable.
Physically disconnect one of the peers from the network, so none of its traffic can reach the other side.
Reboot the machine, so all its sockets are in the IDLE state.
Reconnect the cable. As soon as the waiting machine gets out of the sleep and begins sending packets again, it will receive an RST segment from the other side, because the peer has been rebooted and its TCP knows nothing about the connection.
Other ways of getting an RST segment involve bad implementations of TCP, or mangling packets in transit (changing the sender or receiver sequence numbers on the way).
The purpose of RST packets is not to add functionality to TCP, but to detect misbehavior, so there should be no way to get a reset with proper use of sockets. The listen() syscall is there to let you reserve resources in kernel space so the user-space process can prepare to handle the connection while clients are trying to connect. If you do what you intend, you'll get a connection with no data, but a valid connection nonetheless. SO_LINGER is there to force a loss of state when machines don't have time to send the packets to each other; but while connected, the whole connection is handled in the kernel and no abort is to be expected.
Linux accept() (and accept4()) passes already-pending network errors
on the new socket as an error code from accept(). This behavior
differs from other BSD socket implementations. For reliable
operation the application should detect the network errors defined
for the protocol after accept() and treat them like EAGAIN by
retrying. In the case of TCP/IP, these are ENETDOWN, EPROTO,
ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, and
ENETUNREACH.
http://man7.org/linux/man-pages/man2/accept.2.html
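Following that advice, a minimal sketch of an accept() wrapper that retries on the listed soft errors (the #ifdef guard around the Linux-specific ENONET is an assumption for portability):

#include <errno.h>
#include <sys/socket.h>

/* accept() wrapper treating already-pending network errors like EAGAIN. */
int accept_retry(int listenfd)
{
    for (;;) {
        int fd = accept(listenfd, NULL, NULL);
        if (fd >= 0)
            return fd;
        switch (errno) {
        case ENETDOWN: case EPROTO: case ENOPROTOOPT:
        case EHOSTDOWN: case EHOSTUNREACH: case EOPNOTSUPP:
        case ENETUNREACH:
#ifdef ENONET
        case ENONET:
#endif
        case EINTR:
            continue;           /* transient: retry */
        default:
            return -1;          /* real error (e.g. EMFILE): report it */
        }
    }
}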
In a simple program where I'm trying to send command-line inputs from the client to the server, I keep getting a "Broken pipe" error on the server side. I send a string to the server and the server returns the string as lower-case to the client.
Server:
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <strings.h> /* bzero */
#include <ctype.h>
#include <unistd.h>

int main()
{
    char str[100];
    int listen_fd, comm_fd;
    struct sockaddr_in servaddr;

    listen_fd = socket(AF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY); /* htonl, not htons */
    servaddr.sin_port = htons(37892);

    bind(listen_fd, (struct sockaddr *) &servaddr, sizeof(servaddr));
    listen(listen_fd, 10);
    comm_fd = accept(listen_fd, (struct sockaddr*) NULL, NULL);

    while (1) {
        bzero(str, 100);
        read(comm_fd, str, 100);
        for (int i = 0; i < strlen(str); i++) {
            str[i] = tolower(str[i]);
        }
        printf("Echoing back - %s", str);
        write(comm_fd, str, strlen(str) + 1);
    }
}
Client
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* bzero */
#include <arpa/inet.h> /* inet_pton */
#include <ctype.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int sockfd, n;
    char sendline[100];
    char recvline[100];
    struct sockaddr_in servaddr;

    sockfd = socket(AF_INET, SOCK_STREAM, 0);

    bzero(&servaddr, sizeof servaddr);
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(37892);
    inet_pton(AF_INET, "127.0.0.1", &(servaddr.sin_addr));

    connect(sockfd, (struct sockaddr *) &servaddr, sizeof(servaddr));

    if (argc == 1) printf("\nNo arguments");

    if (1) {
        bzero(sendline, 100);
        bzero(recvline, 100);
        strcpy(sendline, argv[1]);
        write(sockfd, sendline, strlen(sendline) + 1);
        read(sockfd, recvline, 100);
        printf("%s", recvline);
    }
}
The problem I found was that once the client side is done sending the string, the command-line input does not work like fgets(), where the loop would wait for another user input. If I change the if(1) on the client side to a while(1), it will obviously run an infinite loop, as no new inputs are being added.
The dilemma is: how can I keep the server side running to continuously return strings to the client, while processing single requests from the command line on the client side?
Your program has two problems:
1) read() works differently than you think:
Normally read() will read up to a certain number of bytes from some file or stream (e.g. a socket).
Because read() does not distinguish between different types of bytes (e.g. letters, the end-of-line marker, or even the NUL byte), read() will not work like fgets() (reading line-wise).
read() is also allowed to "split" the data: if you do a write(..."Hello\n"...) on the client, the server may receive "Hel" the first time you call read() and "lo\n" the next time.
And of course read() can concatenate data: call write(..."Hello\n"...) and write(..."World\n"...) on the client, and a single read() call may receive "Hello\nWorld\n".
Both effects may even appear at the same time, so you may have to call read() three times, receiving "Hel", "lo\nWo" and "rld\n".
TTYs (= the console (keyboard) and serial ports) have a special feature (which can be switched off) that makes read() behave like fgets(). Only TTYs have such a feature!
In the case of sockets, read() will wait for at least one byte to be received and will return the (positive) number of bytes received as long as the connection is alive. As soon as read() returns zero, the connection has been closed by the peer; a negative value indicates an error.
You have to use a while loop that processes data until the connection has been dropped.
You'll have to check whether the data received by read() contains the NUL byte to detect the "end" of the data, if "your" data is terminated by a NUL byte.
2) As soon as the client drops the connection, the handle returned by accept() is useless.
You should close that handle to save memory and file descriptors (there is a limit on how many file descriptors you can have open at one time).
Then you have to call accept() again to wait for the client to establish a new connection.
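Putting both fixes together, a minimal sketch of a corrected server loop (a sketch under the assumptions above, not a drop-in rewrite; listen_fd is the listening socket from the question's server):

#include <ctype.h>
#include <unistd.h>
#include <sys/socket.h>

/* Serve clients one at a time: read until EOF, echo the bytes back
 * lower-cased, then close and accept the next connection. */
static void serve(int listen_fd)
{
    for (;;) {
        int comm_fd = accept(listen_fd, NULL, NULL);
        if (comm_fd < 0)
            continue;                        /* transient error: retry */

        char buf[100];
        ssize_t n;
        while ((n = read(comm_fd, buf, sizeof buf)) > 0) {
            for (ssize_t i = 0; i < n; i++)  /* use the byte count, not strlen */
                buf[i] = tolower((unsigned char)buf[i]);
            write(comm_fd, buf, n);          /* echo exactly what was read */
        }
        close(comm_fd);                      /* n == 0: client closed; n < 0: error */
    }
}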
Your client sends one request and reads one response.
It then exits without closing the socket.
Your server runs in a loop reading requests and sending responses.
Your server ignores end of stream.
Little or none of this code is error-checked.
I'm writing a TCP server in C and noticed something unusual that happens once the listening fd hits the "Too many open files" error: the accept call doesn't block anymore and returns -1 all the time.
I also tried closing the listening fd and re-opening and re-binding it, but that didn't seem to work.
My questions are: why does accept keep returning -1 in this situation, and what am I supposed to do to stop it and make the server able to accept new connections once old clients have closed? (The socket is of course able to accept correctly again when some connections are closed.)
====== UPDATE: clarification ======
The problem occurs simply because the number of active clients exceeds the limit of open fds, so I don't close any of the accepted fds in the sample code, just to make it reproduce more quickly.
I added a timestamp to the output each time accept returns and slowed the connect frequency down to once every 2 seconds. I then found that the "Too many open files" error in fact occurs immediately after the latest successful accept. So I think that when the maximum number of fds is reached, each call to accept returns immediately with -1. (What I had thought was that accept would still block and return -1 at the next incoming connect. The behavior of accept in this situation is my own theory, not from the man page. If it's wrong, please let me know.)
So, for my second question: to make it stop, I think a solution is to stop calling accept until some connection is closed.
Also update the sample codes. Thanks for your help.
====== Sample codes ======
Here is how I test it. First set ulimit -n to a low value (like 16) and run the server program compiled from the following C source; then use the Python script to create several connections
/* TCP server; bind :5555 */
#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define BUFSIZE 1024
#define PORT 5555

void error(char const* msg)
{
    perror(msg);
    exit(1);
}

int listen_port(int port)
{
    int parentfd;                  /* parent socket */
    struct sockaddr_in serveraddr; /* server's addr */
    int optval;                    /* flag value for setsockopt */

    parentfd = socket(AF_INET, SOCK_STREAM, 0);
    if (parentfd < 0) {
        error("ERROR opening socket");
    }
    optval = 1;
    setsockopt(parentfd, SOL_SOCKET, SO_REUSEADDR,
               (const void *)&optval, sizeof(int));
    bzero((char *) &serveraddr, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
    serveraddr.sin_port = htons((unsigned short)port);
    if (bind(parentfd, (struct sockaddr *) &serveraddr, sizeof(serveraddr)) < 0) {
        error("ERROR on binding");
    }
    if (listen(parentfd, 5) < 0) {
        error("ERROR on listen");
    }
    printf("Listen :%d\n", port);
    return parentfd;
}

int main(int argc, char **argv)
{
    int parentfd;                  /* parent socket */
    int childfd;                   /* child socket */
    int clientlen;                 /* byte size of client's address */
    struct sockaddr_in clientaddr; /* client addr */
    int accept_count;              /* times accept was called */

    accept_count = 0;
    parentfd = listen_port(PORT);
    clientlen = sizeof(clientaddr);

    while (1) {
        childfd = accept(parentfd, (struct sockaddr *) &clientaddr, (socklen_t*) &clientlen);
        printf("accept returns ; count=%d ; time=%u ; fd=%d\n", accept_count++, (unsigned) time(NULL), childfd);
        if (childfd < 0) {
            perror("error on accept");
            /* the following 2 lines try to close the listening fd and re-open it */
            // close(parentfd);
            // parentfd = listen_port(PORT);
            // the following line makes the program exit at the first error
            error("--- error on accept");
        }
    }
}
The Python program to create connections
import time
import socket

def connect(host, port):
    s = socket.socket()
    s.connect((host, port))
    return s

if __name__ == '__main__':
    socks = []
    try:
        try:
            for i in xrange(100):
                socks.append(connect('127.0.0.1', 5555))
                print ('connect count: ' + str(i))
                time.sleep(2)
        except IOError as e:
            print ('error: ' + str(e))
        print ('stop')
        while True:
            time.sleep(10)
    except KeyboardInterrupt:
        for s in socks:
            s.close()
why accept keeps returning -1 in this situation
Because you've run out of file descriptors, just like the error message says.
what am I supposed to do to stop it and make the server be able to accept new connections after any old clients closed?
Close the clients. The problem is not accept() returning -1; it is that you aren't closing accepted sockets once you're finished with them.
Closing the listening socket isn't a solution. It's just another problem.
EDIT: By 'finished with them' I mean one of several things:
They have finished with you, which is shown by recv() returning zero.
You have finished with them, e.g. after sending a final response.
When you've had an error sending or receiving to/from them other than EAGAIN/EWOULDBLOCK.
When you've had some other internal fatal error that prevents you dealing further with that client, for example receiving an unparseable request, or some other fatal application error that invalidates the connection or the session, or the entire client for that matter.
In all these cases you should close the accepted socket.
EJP's answer is correct, but it does not tell you how to deal with the situation. What you have to do is actually do something with the sockets that accept() returns. Simply calling close() on them means you won't receive anything, of course, but it would deal with the resource-depletion problem. What you have to do for a correct implementation is start receiving on the accepted sockets and keep receiving until you receive 0 bytes. Receiving 0 bytes is an indication that the peer is done using its side of the socket. That is your trigger to call close() on the socket as well, which deals with the resource problem.
You don't have to stop listening. That would stop your server from being able to process new requests, and that is not the problem here.
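A minimal sketch of that receive-until-zero pattern using poll(); the client cap and buffer size are illustrative assumptions:

#include <poll.h>
#include <unistd.h>
#include <sys/socket.h>

#define MAX_CLIENTS 64              /* illustrative cap */

/* Watch the listener plus all accepted sockets; close a client as soon
 * as recv() returns 0 (peer finished) or a negative value (error). */
void serve(int listenfd)
{
    struct pollfd pfd[MAX_CLIENTS + 1];
    int nfds = 1;
    pfd[0].fd = listenfd;
    pfd[0].events = POLLIN;

    for (;;) {
        if (poll(pfd, nfds, -1) < 0)
            continue;
        if ((pfd[0].revents & POLLIN) && nfds <= MAX_CLIENTS) {
            int fd = accept(listenfd, NULL, NULL);
            if (fd >= 0) {
                pfd[nfds].fd = fd;
                pfd[nfds].events = POLLIN;
                nfds++;
            }
        }
        for (int i = 1; i < nfds; i++) {
            char buf[512];
            ssize_t n;
            if (!(pfd[i].revents & POLLIN))
                continue;
            n = recv(pfd[i].fd, buf, sizeof buf, 0);
            if (n <= 0) {                   /* 0 = peer done, <0 = error */
                close(pfd[i].fd);           /* frees the fd slot */
                pfd[i] = pfd[--nfds];       /* compact: move last entry here */
                i--;                        /* re-examine the moved entry */
            }
            /* else: process n bytes in buf */
        }
    }
}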
The solution I implemented here was to check the value of the new (accepted) fd, and if that value is equal to or higher than the allowed server capacity, send a "busy" message and close the new connection.
This solution is quite effective and allows you to inform your clients about the server's status.
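A minimal sketch of that check; the capacity constant and the busy message are illustrative assumptions. It works because the kernel always returns the lowest-numbered free descriptor:

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

#define SERVER_CAPACITY 1000        /* illustrative fd budget */

/* Accept, then shed the connection if the fd number shows we're at capacity. */
int accept_with_shedding(int listenfd)
{
    int fd = accept(listenfd, NULL, NULL);
    if (fd >= SERVER_CAPACITY) {
        static const char busy[] = "server busy, try again later\n";
        send(fd, busy, strlen(busy), 0);    /* best-effort notice */
        close(fd);
        return -1;
    }
    return fd;                              /* may also be -1 from accept() */
}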
I've encountered a case where using write() server-side on a connection closed by the remote client doesn't return 0.
According to man 2 write :
On success, the number of bytes written is returned (zero indicates
nothing was written). On error, -1 is returned, and errno is set
appropriately.
From my understanding: when using read/write on a remotely closed socket, the first attempt is supposed to fail (thus return 0), and the next try should trigger a broken pipe. But it doesn't. write() acts as if it succeeded in sending the data on the first attempt, and then I get a broken pipe on the next try.
My question is why?
I know how to handle a broken pipe properly, that's not the issue. I'm just trying to understand why write doesn't return 0 in this case.
Below is the server code I wrote. Client-side, I tried a basic C client (with close() and shutdown() for closing the socket) and netcat. All three gave me the same result.
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <stdlib.h>

#define MY_STR "hello world!"

int start_server(int port)
{
    int fd;
    struct sockaddr_in sin;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
    {
        perror(NULL);
        return (-1);
    }
    memset(&sin, 0, sizeof(struct sockaddr_in));
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sin, sizeof(struct sockaddr)) == -1
        || listen(fd, 0) == -1)
    {
        perror(NULL);
        close(fd);
        return (-1);
    }
    return (fd);
}

int accept_client(int fd)
{
    int client_fd;
    struct sockaddr_in client_sin;
    socklen_t client_addrlen;

    client_addrlen = sizeof(struct sockaddr_in);
    client_fd = accept(fd, (struct sockaddr *)&client_sin, &client_addrlen);
    if (client_fd == -1)
        return (-1);
    return (client_fd);
}

int main(int argc, char **argv)
{
    int fd, fd_client;
    int port;
    int ret;

    port = 1234;
    if (argc == 2)
        port = atoi(argv[1]);
    fd = start_server(port);
    if (fd == -1)
        return (EXIT_FAILURE);
    printf("Server listening on port %d\n", port);
    fd_client = accept_client(fd);
    if (fd_client == -1)
    {
        close(fd);
        printf("Failed to accept a client\n");
        return (EXIT_FAILURE);
    }
    printf("Client connected!\n");
    while (1)
    {
        getchar();
        ret = write(fd_client, MY_STR, strlen(MY_STR));
        printf("%d\n", ret);
        if (ret < 1)
            break;
    }
    printf("the end.\n");
    return (0);
}
The only way to make write return zero on a socket is to ask it to write zero bytes. If there's an error on the socket you will always get -1.
If you want to get a "connection closed" indicator, you need to use read which will return 0 for a remotely closed connection.
This is just how the sockets interface was written. When you have a connected socket or pipe, you are supposed to close the transmitting end first, and then the receiving end will get EOF and can shut down. Closing the receiving end first is "unexpected" and so it returns an error instead of returning 0.
This is important for pipes, because it allows complicated commands to finish much more quickly than they would otherwise. For example,
bunzip2 < big_file.bz2 | head -n 10
Suppose big_file.bz2 is huge. Only the first part will be read, because bunzip2 will get killed once it tries sending more data to head. This makes the whole command finish much quicker, and with less CPU usage.
Sockets inherited the same behavior, with the added complication that you have to close the transmitting and receiving parts of the socket separately.
The point to be observed is that, in TCP, when one side of the connection closes its
socket, it is actually ceasing to transmit on that socket; it sends a packet to
inform its remote peer that it will not transmit anymore through that
connection. It doesn't mean, however, that it stopped receiving too. (To
continue receiving is a local decision of the closing side; if it stops receiving, it can
lose packets transmitted by the remote peer.)
So, when you write() to a socket that is remotely closed, but
not locally closed, you can't know if the other end is still waiting to read
more packets, and so the TCP stack will buffer your data and try to send it. As
stated in send() manual page,
No indication of failure to deliver is implicit in a send(). Locally detected
errors are indicated by a return value of -1.
(When you write() to a socket, you are actually send()ing to it.)
When you write() a second time, though, and the remote peer has definitely
closed the socket (not only shutdown() writing), the local TCP stack has probably
already received a reset packet from the peer informing it about the error on
the last transmitted packet. Only then can write() return an error, telling
its user that this pipe is broken (EPIPE error code).
If the remote peer has only shutdown() writing, but still has the socket open,
its TCP stack will successfully receive the packet and will acknowledge the
received data back to the sender.
If you read the whole man page, you'll find this among the error return values:
"EPIPE fd is connected to a pipe or *socket whose reading end is closed*."
So the call to write() will not return 0 but rather -1, and errno will be set to EPIPE.
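As a practical note, by default the writing process receives SIGPIPE (which kills it) before write() ever reports EPIPE, so servers usually suppress the signal. A minimal sketch of the two common options (MSG_NOSIGNAL is Linux-specific):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/socket.h>

/* Option 1: ignore SIGPIPE process-wide so write() returns -1/EPIPE
 * instead of terminating the process. */
void setup(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Option 2: suppress the signal per call via send() with MSG_NOSIGNAL. */
int send_reply(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
    if (n == -1 && errno == EPIPE)
        fprintf(stderr, "peer closed its reading end\n");
    return (int)n;
}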
I have an application that reads large files from a server and hangs frequently on a particular machine. It has worked successfully under RHEL5.2 for a long time. We have recently upgraded to RHEL6.1 and it now hangs regularly.
I have created a test app that reproduces the problem. It hangs approx 98 times out of 100.
#include <errno.h>
#include <stdint.h> /* uint32_t */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/param.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/time.h>

int mFD = 0;

void open_socket()
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_family = AF_INET;

    if (getaddrinfo("localhost", "60000", &hints, &res) != 0)
    {
        fprintf(stderr, "Exit %d\n", __LINE__);
        exit(1);
    }

    mFD = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (mFD == -1)
    {
        fprintf(stderr, "Exit %d\n", __LINE__);
        exit(1);
    }

    if (connect(mFD, res->ai_addr, res->ai_addrlen) < 0)
    {
        fprintf(stderr, "Exit %d\n", __LINE__);
        exit(1);
    }

    freeaddrinfo(res);
}

void read_message(int size, char* data)
{
    int bytesLeft = size;
    int numRd = 0;

    while (bytesLeft != 0)
    {
        fprintf(stderr, "reading %d bytes\n", bytesLeft);

        /* Replacing MSG_WAITALL with 0 works fine */
        int num = recv(mFD, data, bytesLeft, MSG_WAITALL);
        if (num == 0)
        {
            break;
        }
        else if (num < 0 && errno != EINTR)
        {
            fprintf(stderr, "Exit %d\n", __LINE__);
            exit(1);
        }
        else if (num > 0)
        {
            numRd += num;
            data += num;
            bytesLeft -= num;
            fprintf(stderr, "read %d bytes - remaining = %d\n", num, bytesLeft);
        }
    }
    fprintf(stderr, "read total of %d bytes\n", numRd);
}

int main(int argc, char **argv)
{
    open_socket();

    uint32_t raw_len = atoi(argv[1]);
    char raw[raw_len];
    read_message(raw_len, raw);

    return 0;
}
Some notes from my testing:
If "localhost" maps to the loopback address 127.0.0.1, the app hangs on the call to recv() and NEVER returns.
If "localhost" maps to the ip of the machine, thus routing the packets via the ethernet interface, the app completes successfully.
When I experience a hang, the server sends a "TCP Window Full" message, and the client responds with a "TCP ZeroWindow" message (see image and attached tcpdump capture). From this point, it hangs forever with the server sending keep-alives and the client sending ZeroWindow messages. The client never seems to expand its window, allowing the transfer to complete.
During the hang, if I examine the output of "netstat -a", there is data in the servers send queue but the clients receive queue is empty.
If I remove the MSG_WAITALL flag from the recv() call, the app completes successfully.
The hanging issue only arises using the loopback interface on 1 particular machine. I suspect this may all be related to timing dependencies.
As I drop the size of the 'file', the likelihood of the hang occurring is reduced
The source for the test app can be found here:
Socket test source
The tcpdump capture from the loopback interface can be found here:
tcpdump capture
I reproduce the issue by issuing the following commands:
> gcc socket_test.c -o socket_test
> perl -e 'for (1..6000000){ print "a" }' | nc -l 60000
> ./socket_test 6000000
This sees 6000000 bytes sent to the test app which tries to read the data using a single call to recv().
I would love to hear any suggestions on what I might be doing wrong or any further ways to debug the issue.
MSG_WAITALL should block until all data has been received. From the manual page on recv:
This flag requests that the operation block until the full request is satisfied.
However, the buffers in the network stack are probably not large enough to contain everything, which is the reason for the "window full" messages on the server: the client network stack simply can't hold that much data.
The solution is either to increase the buffer sizes (the SO_RCVBUF option to setsockopt()), split the message into smaller pieces, or receive smaller chunks and put them into your own buffer. The last is what I would recommend.
Edit: I see in your code that you already do what I suggested (read smaller chunks with your own buffering), so just remove the MSG_WAITALL flag and it should work.
Oh, and when recv() returns zero, that means the other end has closed the connection, and you should do so too.
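For completeness, a minimal sketch of the buffer-size alternative; the 1 MiB value is an arbitrary assumption, and the kernel may clamp it (see net.core.rmem_max on Linux):

#include <stdio.h>
#include <sys/socket.h>

/* Ask for a bigger receive buffer; the kernel may silently clamp the
 * value, so read it back to see what was actually granted. */
static void grow_rcvbuf(int fd)
{
    int want = 1024 * 1024;     /* 1 MiB, arbitrary */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof want) < 0)
        perror("setsockopt(SO_RCVBUF)");

    int got = 0;
    socklen_t len = sizeof got;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) == 0)
        fprintf(stderr, "SO_RCVBUF is now %d\n", got);
}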
Consider these two possible rules:
The receiver may wait for the sender to send more before receiving what has already been sent.
The sender may wait for the receiver to receive what has already been sent before sending more.
We can have either of these rules, but we cannot have both of these rules.
Why? Because if the receiver is permitted to wait for the sender, that means the sender cannot wait for the receiver to receive before sending more, otherwise we deadlock. And if the sender is permitted to wait for the receiver, that means the receiver cannot wait for the sender to send before receiving more, otherwise we deadlock.
If both of these things happen at the same time, we deadlock. The sender will not send more until the receiver receives what has already been sent, and the receiver will not receive what has already been sent unless the sender sends more. Boom.
TCP chooses rule 2 (for reasons that should be obvious). Thus it cannot support rule 1. But in your code, you are the receiver, and with MSG_WAITALL you are waiting for the sender to send more before you receive what has already been sent. So this will deadlock.