This question already has answers here: Detecting TCP Client Disconnect (9 answers). Closed 4 years ago.
I have a basic TCP application written in C. It sends data to a TCP server. I have connected two PCs with a crossover cable; I send data from one and successfully receive it on the other. I built this setup to test one thing: if the connection is broken in an unhealthy way (a ruptured cable, etc.), I want to be informed on the client side. But things don't work as I wanted. If I manually stop the TCP server, the client side is informed. But suppose I start the program, the connection is established, data starts to flow, and then I unplug the cable. Both sides behave as if nothing happened. The client still sends data without any error, and the server still shows the client as connected, but the data flow stops. After a few minutes I plug the cable in again. The data that was considered sent, but was not actually sent, is suddenly flushed, and the program then continues normally. How can I detect a broken connection like this? Any help would be appreciated. Here is the code:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    const char* server_name = "192.168.5.2";
    const int server_port = 30152;

    struct sockaddr_in server_address;
    memset(&server_address, 0, sizeof(server_address));
    server_address.sin_family = AF_INET;

    // creates binary representation of server name
    // and stores it as sin_addr
    // http://beej.us/guide/bgnet/output/html/multipage/inet_ntopman.html
    if (inet_pton(AF_INET, server_name, &server_address.sin_addr) != 1) {
        printf("invalid server address\n");
        return 1;
    }

    // htons: port in network order format
    server_address.sin_port = htons(server_port);

    // open a stream socket
    int sock;
    if ((sock = socket(PF_INET, SOCK_STREAM, 0)) < 0) {
        printf("could not create socket\n");
        return 1;
    }

    // TCP is connection oriented, a reliable connection
    // **must** be established before any data is exchanged
    if (connect(sock, (struct sockaddr*)&server_address,
                sizeof(server_address)) < 0) {
        printf("could not connect to server\n");
        return 1;
    }

    // send
    // data that will be sent to the server
    const char* data_to_send = "HELLO THIS IS DATA!";
    while (1) {
        // send() returns as soon as the data is copied into the kernel's
        // send buffer; it does not wait for the peer to acknowledge it,
        // so an unplugged cable is not reported here right away
        ssize_t err = send(sock, data_to_send, strlen(data_to_send), 0);
        if (err == -1) {
            perror("send");
            break;
        } else {
            printf("sent \n");
            sleep(1);
        }
    }
    printf("EOP\n");

    // close the socket
    close(sock);
    return 0;
}
If the peer of a TCP connection closes the connection, a recv() call on your end will return 0. That's the way to detect closed (but not broken) connections.
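Here is a minimal sketch of that check for a connected stream socket. `peer_has_closed()` is a hypothetical helper name. It passes `MSG_PEEK` so that queued data stays available to the normal read path. It also passes `MSG_DONTWAIT`, a Linux/BSD extension rather than plain POSIX, so the check never blocks. If unread data is still queued, the end-of-file only becomes visible once that data has been read.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 if the peer has closed its side (recv() reports end-of-file),
 * 0 if the connection still looks open, and -1 on a socket error
 * (errno is left as recv() set it, e.g. ECONNRESET). */
int peer_has_closed(int fd)
{
    char byte;
    /* MSG_PEEK: look without consuming; MSG_DONTWAIT: never block here */
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n == 0)
        return 1;   /* orderly shutdown: the peer sent a FIN */
    if (n > 0)
        return 0;   /* data is waiting, so nothing is known about a close yet */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;   /* nothing to read right now, but no close seen either */
    return -1;      /* a real error on the connection */
}
```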
If you don't currently receive anything from the peer, you need to make up a protocol on top of TCP which includes receiving data.
Furthermore, send() might not detect broken connections (like unplugged cables) directly. send() succeeds as soon as the data has been copied into the kernel's send buffer, and TCP then keeps retransmitting unacknowledged data, with growing timeouts, for a long time before it gives up. The best way is again to implement some kind of protocol on top of TCP. For example, it could contain an "are you there" message that expects a reply. If a reply to the "are you there" message isn't received within some specific timeout, consider the connection broken and disconnect.
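The "are you there" idea could look roughly like this. The message texts `PING\n` and `PONG\n` and the function name are assumptions made for illustration, and the peer's program has to be written to answer them. `poll()` provides the timeout.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical heartbeat: send "PING\n", expect "PONG\n" back.
 * Returns 1 if the reply arrived, 0 if a wait timed out (treat the
 * connection as broken), -1 on a socket error or an unexpected reply. */
int heartbeat(int fd, int timeout_ms)
{
    static const char ping[] = "PING\n";
    static const char pong[] = "PONG\n";
    char reply[sizeof pong - 1];
    size_t got = 0;

    if (send(fd, ping, sizeof ping - 1, 0) != (ssize_t)(sizeof ping - 1))
        return -1;

    while (got < sizeof reply) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);   /* timeout per wait */
        if (ready == 0)
            return 0;                            /* no answer in time */
        if (ready < 0) {
            if (errno == EINTR)
                continue;                        /* interrupted: wait again */
            return -1;
        }
        ssize_t n = recv(fd, reply + got, sizeof reply - got, 0);
        if (n <= 0)
            return -1;                           /* closed (0) or error (-1) */
        got += (size_t)n;
    }
    return memcmp(reply, pong, sizeof reply) == 0 ? 1 : -1;
}
```

The timeout applies to each wait for reply bytes, not to the exchange as a whole. A caller would typically run this every few seconds and close the socket when it returns 0 or -1.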
Related
I'm learning from Unix Network Programming, Volume 1, and I want to reproduce the accept() error caused by a RST on Linux:
server: call socket(), bind(), listen(), and sleep(10)
client: call socket(), connect(), setsockopt() of LINGER, close() and return
server: call accept()
I think that the 3rd step will get an error like ECONNABORTED, but it doesn't.
I want to know why.
I would appreciate your help.
The following is the server code:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>
#include <strings.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
    int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    struct sockaddr_in addr;
    bzero(&addr, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(6666);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    bind(sock, (struct sockaddr*)(&addr), (socklen_t)(sizeof addr));
    listen(sock, 5);
    sleep(10); // the client connects and closes during this window
    if (accept(sock, NULL, NULL) < 0)
        perror("error");
    else
        printf("right\n");
    return 0;
}
The following is the client code:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>
#include <strings.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
    int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    struct sockaddr_in addr;
    bzero(&addr, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(6666);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    connect(sock, (struct sockaddr*)(&addr), (socklen_t)(sizeof addr));
    // linger enabled with a 0 s timeout: close() discards unsent data
    // and aborts the connection instead of doing a normal close
    struct linger ling;
    ling.l_onoff = 1;
    ling.l_linger = 0;
    setsockopt(sock, SOL_SOCKET, SO_LINGER, &ling, sizeof ling);
    close(sock);
    return 0;
}
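Nope. I think you'll get an empty but complete connection (with no data). The kernel manages the complete connection establishment on its own, before your process ever calls accept(). Note that with l_onoff = 1 and l_linger = 0, the client's close() does not send a FIN; it aborts the connection with a RST. By then, though, the connection is already fully established and sitting in the listen queue, and Linux does not report the reset through accept(). accept() still returns a descriptor, and the error shows up on later operations on that socket. For a connection abort caused by a state mismatch, you need to reboot the client machine (or the server) without allowing it to send any packets, for example by disconnecting it from the network before rebooting it. In a healthy connection a pure ACK is never answered, so you won't get a RST in reply to an ACK.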
RST packets are sent automatically by the kernel when there is a state mismatch between the two parties. To make this happen with a correct implementation, you must force such a state mismatch (this is why the machine reboot is necessary):
1. Make a connection between both parties and pause it (with a sleep) so that the connection is in the ESTABLISHED state before you disconnect the cable.
2. Physically disconnect one of the peers from the network, so that its traffic cannot reach the network.
3. Reboot that machine, so that it no longer has any socket for the connection (everything is back in the CLOSED state).
4. Reconnect the cable. As soon as the waiting machine wakes up from the sleep and starts sending packets again, it will receive a RST segment from the other side, because that side has been rebooted and its TCP does not know about the connection.
Other ways of getting a RST segment involve bad implementations of TCP, or mangling the packets in transit (changing the sender or receiver sequence numbers).
The purpose of RST packets is not to add functionality to TCP but to signal misbehaviour. Apart from a deliberately abortive close like the SO_LINGER trick, you should not get resets in normal use of sockets. The listen() syscall is there to let you reserve resources in kernel space, so that the kernel can complete connections while the user-space process prepares to handle them. If you do what you intend, you'll get a connection with no data. By the time the client aborts, the whole connection has already been handled by the kernel, so no abort is to be expected from accept().
Linux accept() (and accept4()) passes already-pending network errors on the new socket as an error code from accept(). This behavior differs from other BSD socket implementations. For reliable operation the application should detect the network errors defined for the protocol after accept() and treat them like EAGAIN by retrying. In the case of TCP/IP, these are ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, and ENETUNREACH.
http://man7.org/linux/man-pages/man2/accept.2.html
I'm writing a TCP server in C and found that something unusual happens once the listening fd gets a "Too many open files" error. The accept call doesn't block anymore and returns -1 all the time.
I also tried closing the listening fd and re-opening and re-binding it, but it didn't seem to work.
My questions are: why does accept keep returning -1 in this situation, and what am I supposed to do to stop it and make the server able to accept new connections after some old clients have closed? (Once some connections are closed, the socket is of course able to accept correctly again.)
====== UPDATE: clarification ======
The problem occurs simply because the number of active clients exceeds the limit of open fds. I don't close any of the accepted fds in the sample code, just to make it reproduce more quickly.
I added a timestamp to the output each time accept returns, and I slowed the connect frequency down to once every 2 seconds. I found that the "Too many open files" error in fact occurs immediately after the latest successful accept. So I think that once the maximum number of fds is reached, each call to accept returns immediately with -1. (What I expected was that accept would still block and return -1 at the next incoming connection. This description of accept's behavior is my own theory, not something from the man page. If it's wrong, please let me know.)
So, as for my second question, I think one solution is to stop calling accept until some connection is closed.
I have also updated the sample code. Thanks for your help.
====== Sample codes ======
Here is how I test it. First set ulimit -n to a low value (like 16) and run the server program compiled from the following C source. Then use the Python script to create several connections:
/* TCP server; bind :5555 */
#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define BUFSIZE 1024
#define PORT 5555

void error(char const* msg)
{
    perror(msg);
    exit(1);
}

int listen_port(int port)
{
    int parentfd;                  /* parent socket */
    struct sockaddr_in serveraddr; /* server's addr */
    int optval;                    /* flag value for setsockopt */

    parentfd = socket(AF_INET, SOCK_STREAM, 0);
    if (parentfd < 0) {
        error("ERROR opening socket");
    }
    optval = 1;
    setsockopt(parentfd, SOL_SOCKET, SO_REUSEADDR,
               (const void *)&optval, sizeof(int));

    memset(&serveraddr, 0, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
    serveraddr.sin_port = htons((unsigned short)port);

    if (bind(parentfd, (struct sockaddr *) &serveraddr, sizeof(serveraddr)) < 0) {
        error("ERROR on binding");
    }
    if (listen(parentfd, 5) < 0) {
        error("ERROR on listen");
    }
    printf("Listen :%d\n", port);
    return parentfd;
}

int main(int argc, char **argv)
{
    int parentfd;                  /* parent socket */
    int childfd;                   /* child socket */
    socklen_t clientlen;           /* byte size of client's address */
    struct sockaddr_in clientaddr; /* client addr */
    int accept_count;              /* times of accept called */

    accept_count = 0;
    parentfd = listen_port(PORT);
    while (1) {
        clientlen = sizeof(clientaddr); /* accept() overwrites it, so reset it */
        childfd = accept(parentfd, (struct sockaddr *) &clientaddr, &clientlen);
        printf("accept returns ; count=%d ; time=%u ; fd=%d\n", accept_count++, (unsigned) time(NULL), childfd);
        if (childfd < 0) {
            perror("error on accept");
            /* the following 2 lines try to close the listening fd and re-open it */
            // close(parentfd);
            // parentfd = listen_port(PORT);
            // the following line lets the program exit at the first error
            error("--- error on accept");
        }
        /* accepted fds are deliberately never closed, to hit the limit quickly */
    }
}
The Python program that creates the connections:
import time
import socket

def connect(host, port):
    s = socket.socket()
    s.connect((host, port))
    return s

if __name__ == '__main__':
    socks = []
    try:
        try:
            for i in range(100):
                socks.append(connect('127.0.0.1', 5555))
                print('connect count: ' + str(i))
                time.sleep(2)
        except IOError as e:
            print('error: ' + str(e))
        print('stop')
        while True:
            time.sleep(10)
    except KeyboardInterrupt:
        for s in socks:
            s.close()
why accept keeps returning -1 in this situation
Because you've run out of file descriptors, just like the error message says.
what am I supposed to do to stop it and make the server be able to accept new connections after any old clients closed?
Close the clients. The problem is not accept() returning -1, it is that you aren't closing accepted sockets once you're finished with them.
Closing the listening socket isn't a solution. It's just another problem.
EDIT: By 'finished with them' I mean one of several things:
They have finished with you, which is shown by recv() returning zero.
You have finished with them, e.g. after sending a final response.
When you've had an error sending or receiving to/from them other than EAGAIN/EWOULDBLOCK.
When you've had some other internal fatal error that prevents you dealing further with that client, for example receiving an unparseable request, or some other fatal application error that invalidates the connection or the session, or the entire client for that matter.
In all these cases you should close the accepted socket.
EJP's answer is correct, but it does not tell you how to deal with the situation. What you have to do is actually do something with the sockets that accept returns. Simply calling close on them would deal with the resource depletion problem, although then, of course, you won't receive anything. For a correct implementation, start receiving on the accepted sockets and keep receiving until recv returns 0 bytes. Receiving 0 bytes indicates that the peer is done using its side of the socket. That is your trigger to call close on the socket as well, which deals with the resource problem.
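Here is a minimal sketch of that receive-until-zero pattern. It assumes a blocking socket without a receive timeout, so `EAGAIN` never appears and any error ends the session. The name `drain_and_close()` is hypothetical. The received bytes are simply discarded here, where a real server would parse them.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads from an accepted socket until the peer is done, then closes it.
 * Returns the number of bytes received. */
long drain_and_close(int fd)
{
    char buf[1024];
    long total = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            total += n;        /* a real server would handle buf here */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;          /* interrupted by a signal: just retry */
        break;                 /* 0: peer finished; <0: fatal error */
    }
    close(fd);                 /* releases the descriptor in every case */
    return total;
}
```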
You don't have to stop listening. That would stop your server from being able to process new requests and that is not the problem here.
The solution I implemented was to check the value of the new (accepted) fd. If that value was equal to or higher than the allowed server capacity, a "busy" message was sent and the new connection was closed.
This solution is quite effective and allows you to inform your clients about the server's status.
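The capacity check could be sketched as follows. The function name and the `BUSY\r\n` text are hypothetical. The check only helps while `accept()` can still create descriptors, so `capacity` has to stay below the process's open-file limit (`ulimit -n`). Once that limit is hit, `accept()` fails before any check can run.

```c
#include <sys/socket.h>
#include <unistd.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0   /* not every platform has it; fall back to no flag */
#endif

/* Returns fd if the client is admitted; otherwise sends a short "busy"
 * notice, closes the connection and returns -1. */
int admit_or_reject(int fd, int capacity)
{
    static const char busy[] = "BUSY\r\n";

    if (fd < capacity)
        return fd;
    /* Best effort: the client may already be gone, so the result is
     * ignored; MSG_NOSIGNAL (Linux) avoids SIGPIPE in that case. */
    (void)send(fd, busy, sizeof busy - 1, MSG_NOSIGNAL);
    close(fd);
    return -1;
}
```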
Everything compiles without errors or warnings. I start the program. I visit localhost:8080 and the program stops, which is great. I try to run the program again and get the "Error: unable to bind" message. Why?
Code:
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PORT 8080
#define PROTOCOL 0
#define BACKLOG 10

int main()
{
    int fd;
    int connfd;
    struct sockaddr_in addr;    // For bind()
    struct sockaddr_in cliaddr; // For accept()
    socklen_t cliaddrlen = sizeof(cliaddr);

    // Open a socket
    fd = socket(AF_INET, SOCK_STREAM, PROTOCOL);
    if (fd == -1) {
        printf("Error: unable to open a socket\n");
        exit(1);
    }

    // Create an address (zero it first so no field holds garbage)
    memset(&addr, 0, sizeof addr);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    if ((bind(fd, (struct sockaddr *)&addr, sizeof(addr))) == -1) {
        printf("Error: unable to bind\n");
        printf("Error code: %d\n", errno);
        exit(1);
    }

    // Listen for connections
    if ((listen(fd, BACKLOG)) == -1) {
        printf("Error: unable to listen for connections\n");
        printf("Error code: %d\n", errno);
        exit(1);
    }

    // Accept connections
    connfd = accept(fd, (struct sockaddr *) &cliaddr, &cliaddrlen);
    if (connfd == -1) {
        printf("Error: unable to accept connections\n");
        printf("Error code: %d\n", errno);
        exit(1);
    }

    //read(connfd, buffer, bufferlen);
    //write(connfd, data, datalen);
    // close(connfd);
    return 0;
}
Use the SO_REUSEADDR socket option before calling bind(), in case you have old connections in TIME_WAIT or CLOSE_WAIT state.
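For illustration, here is a listening-socket helper that sets the option before `bind()`. The function name is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP listening socket on `port` (all interfaces), enabling
 * SO_REUSEADDR *before* bind() so that a restarted server can take the
 * port back while old connections are still in TIME_WAIT.
 * Returns the listening fd, or -1 on failure. */
int listen_reuseaddr(unsigned short port, int backlog)
{
    int one = 1;
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0)
        goto fail;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```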
Uses of SO_REUSEADDR?
In order to find out why, you need to print the error; the most likely reason is that another program is already using the port (netstat can tell you).
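Here is a sketch of printing the reason as text rather than just the number. It assumes IPv4 and an all-interfaces address, and `bind_verbose()` is a hypothetical helper. On Linux, `EADDRINUSE` is errno 98.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Binds fd to `port` on all interfaces. On failure, prints the reason as
 * text and as a number. Returns 0 on success, otherwise the errno value. */
int bind_verbose(int fd, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
        return 0;

    int err = errno;  /* save it before any other call can change it */
    fprintf(stderr, "bind to port %u failed: %s (errno %d)\n",
            (unsigned)port, strerror(err), err);
    if (err == EADDRINUSE)
        fprintf(stderr, "another socket (or a connection in TIME_WAIT) "
                        "still holds this port; check with netstat\n");
    return err;
}
```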
Your print problem is that C format strings use %, not &. Replace the character in your print string, and it should work.
First, have a look at the following example:
Socket Server Example
Second: the second bind fails because, after your application stopped, the socket is still bound for a number of seconds or even minutes.
Check with the netstat command whether the connection is still open.
Try putting the following code just before bind():
int opt = 1;
// fd is the listening socket from the question, created but not yet bound
if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (char *)&opt, sizeof(opt)) < 0) {
    perror("setsockopt");
    exit(EXIT_FAILURE);
}
#ifdef SO_REUSEPORT // not defined on every system
if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, (char *)&opt, sizeof(opt)) < 0) {
    perror("setsockopt");
    exit(EXIT_FAILURE);
}
#endif
Reason behind socket bind error 98 (EADDRINUSE on Linux):
A TCP connection is identified by a 4-tuple (server IP, server port, client IP, client port).
When a new socket tries to bind() a local address and port that an existing socket still holds, including one in TIME_WAIT, error 98 is returned.
When you terminate the code on the server side, you end the connection with the TCP client.
The server is then the one that sends the FIN to the client, so it is the server's side of the connection that goes into the TIME_WAIT state.
In TIME_WAIT, the side that closed first keeps the connection's state around so that it can re-send its final ACK if that ACK gets lost and the peer retransmits its FIN.
How long this lasts depends on the implementation. It could be from 30 seconds to 2 minutes or more (on Linux it is 60 seconds).
If you run the code again while the server side is still in TIME_WAIT, the port is still in use. This mainly hits servers, because a service always uses the same fixed port, whereas a client normally gets a new ephemeral port each time.
That is why protocols are often designed so that the client, not the server, sends the first FIN to end the connection.
Even if the client connects again before the TIME_WAIT timeout expires, it can still connect to the server, because it now uses a different port, so the socket tuple changes.
The other way around, if the server sends the FIN first, a restarted server cannot bind() its port again until the timeout ends (unless it sets SO_REUSEADDR).
Why is the port busy?
Because in TIME_WAIT, the side that sent the first FIN must keep the connection around (ready to re-send its final ACK) until the timeout expires.
I have a client connected to a server (TCP connection). When the server crashes (I disconnect it), my client needs to connect to another server in order to continue service. But when the first server comes back, I need to reconnect the client to it.
I was able to connect my client to the backup server after the first server crashed, but I have a problem reconnecting my client to the first server. I wrote a function create_newconnect() for reconnecting to the server, but it doesn't work (that is why I'm not calling it in the code).
I tried to simplify my program as much as I could, so it wouldn't be too big.
This is the client side:
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <arpa/inet.h>
#include <time.h>
#include <unistd.h>

#define SIZE sizeof(struct sockaddr_in)

struct sockaddr_in server;

void tcp_protocol(void); //execute client tcp protocol
void server_check(int sock);
void tcp(void);
void create_newconnect(void);

int main (int argc, char *argv[])
{
    int portno;

    //Test for correct number of arguments
    if (argc != 3)
    {
        fprintf(stderr, "Usage: %s Port# IP Address \n", argv[0]);
        exit(1);
    }
    portno = atoi(argv[1]); //convert port # to int
    server.sin_family = AF_INET;
    server.sin_port = htons(portno);
    server.sin_addr.s_addr = inet_addr(argv[2]); //use the server's ip address
    tcp(); //call tcp function
    return 0;
}

void tcp(void)
{
    int sockfd;
    char c;

    //create socket
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
    {
        perror("socket call failed");
        exit(1);
    }
    //connect to the server
    if (connect(sockfd, (struct sockaddr *)&server, SIZE) == -1)
    {
        perror("connect call failed");
        exit(1);
    }
    while (1)
    {
        printf("Enter char\n");
        scanf(" %c", &c); //leading space skips the newline left by the last input
        server_check(sockfd);
        //send packet to server
        if (send(sockfd, &c, sizeof(c), 0) < 0)
        {
            printf("error sending\n");
        }
        //if packet is received from server
        if (recv(sockfd, &c, sizeof(c), 0) > 0)
        {
            printf("server's response %c\n", c); //print result
        }
    }
    close(sockfd);
}

void server_check(int sock)
{
    char b = 'b';

    //send packet to server
    if (send(sock, &b, sizeof(b), 0) < 0)
        printf("error sending\n");
    //if packet is received from server
    if ((recv(sock, &b, sizeof(b), 0) > 0))
    {
        printf("server responded\n");
    }
    else //if server is not responding
    {
        printf("server crashed\n");
        close(sock); //close socket
        server.sin_port = htons(5002);
        server.sin_addr.s_addr = inet_addr("127.0.0.1");
        tcp(); //create new connection
    }
}

void create_newconnect(void)
{
    int newsockfd;

    server.sin_port = htons(5001);
    //create socket
    if ((newsockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
    {
        perror("socket call failed");
        exit(1);
    }
    //connect to the server
    if (connect(newsockfd, (struct sockaddr *)&server, SIZE) == -1)
    {
        perror("connect call failed");
        exit(1);
    }
    tcp(); //call function to execute tcp protocol
}
I think the first thing you're going to have to consider is this: after your first server has crashed and your client has successfully reconnected to the backup server, how would your client ever know that the first server has come back online?
I can think of two possibilities: one might be that the backup server might notify the client about the re-appearance of the primary server (e.g. by sending some sort of PRIMARY_SERVER_ONLINE message over the TCP connection, or perhaps simply by closing the TCP connection, with the expectation that that would cause the client to try to connect to the primary server again).
The other approach would be to make your client smart enough that it can periodically (e.g. once per minute) try to reconnect to the primary server even while it is using the TCP connection to the backup server. That is doable, but not with a single thread and blocking I/O like your posted code has... (because if your program is blocked in a recv() call, there is no way for it to do anything else like try to connect a TCP connection). You'd need to either use non-blocking I/O and select() (or similar), or asynchronous I/O, or multiple threads, in order to do it properly.
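One way to probe the primary server without getting stuck is a non-blocking `connect()` bounded by `poll()`. This is a sketch under that assumption (IPv4 only, hypothetical function name). The returned socket is switched back to blocking mode.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Tries to connect to addr, waiting at most timeout_ms.
 * Returns a connected (blocking) socket, or -1. */
int connect_with_timeout(const struct sockaddr_in *addr, int timeout_ms)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) < 0) {
        if (errno != EINPROGRESS)
            goto fail;                      /* e.g. refused immediately */

        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int soerr = 0;
        socklen_t len = sizeof soerr;
        if (poll(&pfd, 1, timeout_ms) != 1)
            goto fail;                      /* timed out, or poll failed */
        /* Writable does not mean connected: fetch the real outcome. */
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0 ||
            soerr != 0)
            goto fail;
    }

    if (fcntl(fd, F_SETFL, flags) < 0)      /* back to blocking mode */
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

A client built around `select()`/`poll()` could call this, say, once a minute with the primary's address while it keeps serving over the backup connection.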
Your program recursively calls tcp() after reconnecting. This is almost certainly not correct, and it will consume more resources (mainly stack) on each disconnection.
You need to avoid having the code pass the socket file descriptor (sockfd) by value to the functions as it will change after each new connection.
As a general principle, you can have a list of (two or more) hosts in order of preference. Then, at all times attempt to create connections to those that have higher preference than the one you currently have a connection to. Then, when a connection is established, close all the other open sessions, and switch to the new preferred connection.
Keep this encapsulated and have it return the current active sockfd for use by all the other functions.
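Here is a sketch of that encapsulation, with hypothetical names. It simplifies one thing: it tries the preferred hosts one at a time with a blocking `connect()`, so an unreachable host that silently drops packets can stall the call. The non-blocking approach described above avoids that.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* hosts[0] is the most preferred server. Other code should always get the
 * socket through failover_fd() instead of keeping its own copy. */
struct failover {
    const struct sockaddr_in *hosts;
    int nhosts;
    int active;   /* index into hosts, or -1 when not connected */
    int fd;       /* the one active connection, or -1 */
};

static int try_host(const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Call when the active connection has failed. */
void failover_drop(struct failover *f)
{
    if (f->fd >= 0)
        close(f->fd);
    f->fd = -1;
    f->active = -1;
}

/* Tries every host preferred over the current one, best first. On the
 * first success it closes the old connection and switches to the new one.
 * Call it periodically. Returns the active fd, or -1. */
int failover_fd(struct failover *f)
{
    int limit = f->active < 0 ? f->nhosts : f->active;

    for (int i = 0; i < limit; i++) {
        int fd = try_host(&f->hosts[i]);
        if (fd < 0)
            continue;
        if (f->fd >= 0)
            close(f->fd);
        f->fd = fd;
        f->active = i;
        break;
    }
    return f->fd;
}
```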
I'm working on an application that opens a TCP stream socket to another application.
After the connection is opened and accepted by the server, I send a "login" message, which the server receives successfully. The server then attempts to send a "success" message. This is where things get weird.
The write on the server fails and errno is set to "Broken pipe".
The client polls the file descriptor, waiting for data to read, and that fails as well; there, errno is set to "Connection refused".
All connections are TCP on the loopback device.
Using tcpdump, I can see that a FIN is sent from the client to the server.
It can be found here.
How can errno be "Connection refused" if the connection was already established?
What might cause this behavior? The client code is synchronous without any threads and no one else has access to the file descriptor.
If it matters, the server is the Asterisk manager.
Snippets of the client code (real code has error checking, separated functions and such):
struct sockaddr_in sa;
int fd;

fd = sock_socket(SOCK_STREAM, 0, 0);
MZERO(sa);
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(MANAGER_PORT);
connect(fd, (struct sockaddr *)&sa, sizeof(sa));

sprintf(buf,
        "Action: Login\r\n"
        "Username: %s\r\n"
        "Secret: %s\r\n"
        "Events: %s\r\n"
        "ActionID: %d\r\n"
        "\r\n",
        MANAGER_USERNAME, MANAGER_PASSWORD, events, manager_action_id++);
write(fd, buf, strlen(buf));
{
    struct pollfd fds = {fd, POLLIN, 0};
    if (poll(&fds, 1, timeout) <= 0)
        return -1; /* This is where the client fails with "Connection refused" */
}
Thanks!
P.S. - Sorry for responding to comments inside the question itself, but I created the question before I created an account and I'm not allowed to add comments.
Regarding "Broken pipe": I guess you are trying to write to a socket that has already been closed, which raises a SIGPIPE signal. In this case, include <signal.h> and ignore the signal, for example with signal(SIGPIPE, SIG_IGN). write() will then fail with EPIPE instead of killing the process.