How to debug tcp socket connection in C?

How to debug tcp socket connection in C? - c

I have a program in C
I want to connect to the socket with the address 0xAC101067 port 3333 (172.16.16.103:3333)
but it always connected failed and always get -1 of the result
connect(device_info->cloud_fd, &addr, sizeof(addr))
what I known from the API it said 0 is success and -1 is fail,
So how to find out the problem in this program?
if (device_info -> cloud_fd == -1 && (u32) cloud_ip_addr > 0) {
device_info -> cloud_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
setsockopt(device_info -> cloud_fd, 0, SO_BLOCKMODE, & opt, 4);
cloud_ip_addr = 0xAC101067;
addr.s_ip = cloud_ip_addr;
addr.s_port = 3333;//device_info->conf.server_port;
printf("device_info->cloud_fd=%d\r\n", device_info -> cloud_fd);
if (connect(device_info -> cloud_fd, & addr,sizeof(addr))!=0)
goto cloud_error;
}

Fetch appropriate errno as elucidated in the answer from
#LPs. The problem may not be in the code but external.
Get the tcpdump between the client and the server. See what
transpires in the wire. This capture is indispensable in debugging why clients fail to connect. May be the server is not reachable and connect times out(ETIMEDOUT) Or there is no one listening on the said port destination machine(ECONNREFUSED).

ERRNO DESCRIPTION
**errno - number of last error**
SYNOPSIS
#include <errno.h>
DESCRIPTION
The <errno.h> header file defines the integer variable errno, which
is set by system calls and some library functions in the event of an
error to indicate what went wrong. Its value is significant only
when the return value of the call indicated an error (i.e., -1 from
most system calls; -1 or NULL from most library functions); a
function that succeeds is allowed to change errno.
Valid error numbers are all nonzero; errno is never set to zero by
any system call or library function.
FOR YOUR FUNCTION
RETURN VALUE
If the connection or binding succeeds, zero is returned. On error,
-1 is returned, and errno is set appropriately.
ERRORS
The following are general socket errors only. There may be other
domain-specific error codes.
EACCES For UNIX domain sockets, which are identified by pathname:
Write permission is denied on the socket file, or search
permission is denied for one of the directories in the path
prefix. (See also path_resolution(7).)
EACCES, EPERM
The user tried to connect to a broadcast address without
having the socket broadcast flag enabled or the connection
request failed because of a local firewall rule.
EADDRINUSE
Local address is already in use.
EADDRNOTAVAIL
(Internet domain sockets) The socket referred to by sockfd had
not previously been bound to an address and, upon attempting
to bind it to an ephemeral port, it was determined that all
port numbers in the ephemeral port range are currently in use.
See the discussion of /proc/sys/net/ipv4/ip_local_port_range
in ip(7).
EAFNOSUPPORT
The passed address didn't have the correct address family in
its sa_family field.
EAGAIN Insufficient entries in the routing cache.
EALREADY
The socket is nonblocking and a previous connection attempt
has not yet been completed.
EBADF The file descriptor is not a valid index in the descriptor
table.
ECONNREFUSED
No-one listening on the remote address.
EFAULT The socket structure address is outside the user's address
space.
EINPROGRESS
The socket is nonblocking and the connection cannot be
completed immediately. It is possible to select(2) or poll(2)
for completion by selecting the socket for writing. After
select(2) indicates writability, use getsockopt(2) to read the
SO_ERROR option at level SOL_SOCKET to determine whether
connect() completed successfully (SO_ERROR is zero) or
unsuccessfully (SO_ERROR is one of the usual error codes
listed here, explaining the reason for the failure).
EINTR The system call was interrupted by a signal that was caught;
see signal(7).
EISCONN
The socket is already connected.
ENETUNREACH
Network is unreachable.
ENOTSOCK
The file descriptor is not associated with a socket.
EPROTOTYPE
The socket type does not support the requested communications
protocol. This error can occur, for example, on an attempt to
connect a UNIX domain datagram socket to a stream socket.
ETIMEDOUT
Timeout while attempting connection. The server may be too
busy to accept new connections. Note that for IP sockets the
timeout may be very long when syncookies are enabled on the
server.

Related

Hang at connect()

So, I have a client that attempts to connect with a server. The ip and port are retrieved from a configuration file. I need the program to fail smoothly if something in the config file is incorrect. I connect to the server using the following code
if (connect(sockfd, p->ai_addr, p->ai_addrlen) == -1)
{
perror("client: connect");
close(sockfd);
continue;
}
If the user attempts to connect to a server on the subnet that is not accepting connections (i.e. is not present), then the program fails with No route to host. If the program attempts to connect to a server that is not on the subnet (i.e. the configuration is bad), then the program hangs at the connect() call. What am I doing incorrectly? I need this to provide some feedback to the user that the application has failed.

You're not doing anything wrong. TCP is designed for reliability in the face of network problems, so if it doesn't get a response to its initial connection request, it retries several times in case the request or response were lost in the network. The default parameters on Linux result in it taking about a minute to give up. Then it will report a failure with the Connection timed out error.
If you want to detect the failure more quickly, see C: socket connection timeout

Normally we don't use continue inside an if statement, unless the if statement is inside a loop, that you are not showing. Assuming there is an outer loop, this would be responsible for what happens next .. either keeps re-entering the if block ( to try to connect again) or skipping past it.
Note also you are closing sockfd inside the if block so if your loop is re-entering the if block to do retries, then it needs to create a new socket first.
I suggest reading some sample code for client and server side socket connections to get a better feel for how it works http://www.cs.rpi.edu/~moorthy/Courses/os98/Pgms/socket.html
If all fails, please provide the code around the if block and also state how you want to "fail smoothly". One way to fail "abruptly' would be to swap the continue statement with a call to exit() "-)
EDIT: After reading Barmar's answer and his comment you also need to be aware of this:
If the initiating socket is connection-mode, then connect() shall
attempt to establish a connection to the address specified by the
address argument. If the connection cannot be established immediately
and O_NONBLOCK is not set for the file descriptor for the socket,
connect() shall block for up to an unspecified timeout interval until
the connection is established. If the timeout interval expires before
the connection is established, connect() shall fail and the connection
attempt shall be aborted.
also..
If the connection cannot be established immediately and O_NONBLOCK is
set for the file descriptor for the socket, connect() shall fail and
set errno to [EINPROGRESS], but the connection request shall not be
aborted, and the connection shall be established asynchronously.
Subsequent calls to connect() for the same socket, before the
connection is established, shall fail and set errno to [EALREADY]
When you say "the program hangs" did you mean forever or for a period that might be explained by a TCP/IP timeout.
If this and Barmar's answer are still not enough, then it would help to see the surrounding code as suggested and determine if blocked or non-blocked etc.

Socket connection ends with Operation now in progress on non-blocking socket

I have a problem with connecting to a destination IP using connect() API. The connect() API returns a -1 and errno as operation in progress
. Am I checking the return code too early before it establishes a connection? Please see the following code snippet:
struct sockaddr_in servAddr;
servAddr.sin_family = AF_INET;
servAddr.sin_port = htons(9190);
const char * remoteIp = 10.10.20.86;
rc = inet_pton(AF_INET,remoteIp, &servAddr.sin_addr);
if (rc == -1 || errno == EAFNOSUPPORT)
{
return 0;
}
rc = connect(fd, (sockaddr*)&servAddr, sizeof(servAddr));
if ( rc < 0) // this is where it fails. rc is -1.
{
log("connect failure with [%s]",strerror(errno));
print_sock_connect_error();
}
I have 2 questions here:
The destination IP and port 10.10.20.86:9190 is waiting for a connection and once the connection is received, it send the ack back to the source. I see the tcp established - ACK,SYN/ACK and ACK to destination - in pcap but still couldn't figure out why it returns -1 with error. So Am I checking the rc before the connection establishment is complete? sysctl net.ipv4.tcp_syn_retries is set to 6.
Is there anything wrong with the code above?

Am I checking the rc before the connection establishment is complete?
Yes, you are. The TCP ping-pong during the connection's set up isn't all that has to be done.
Is there anything wrong with the code above?
Well, yes, either the way it handles the EINPROGRESS case or that is uses a non-blocking socket to connect.
From connect()'s Linux documentation:
EINPROGRESS
The socket is nonblocking and the connection cannot be
completed immediately. It is possible to select(2) or poll(2)
for completion by selecting the socket for writing. After
select(2) indicates writability, use getsockopt(2) to read the
SO_ERROR option at level SOL_SOCKET to determine whether
connect() completed successfully (SO_ERROR is zero) or
unsuccessfully (SO_ERROR is one of the usual error codes
listed here, explaining the reason for the failure).

10.10.20.86:9190 is waiting for a connection and once the connection is received, it send the ack back to the source. I see the tcp established - ACK,SYN/ACK and ACK to destination - in pcap but still couldn't figure out why it returns -1 with error. So Am I checking the rc before the connection establishment is complete?
Of course you are. You're checking it immediately connect() returns. As you have put the socket into non-blocking mode, there is no chance the three-way wire handshake will have completed by then.
sysctl net.ipv4.tcp_syn_retries is set to 6.
Irrelevant.
Is there anything wrong with the code above?
Only that it doesn't make sense.
If you want he connection complete or failed before connect() returns, don't use non-blocking mode.
If you want to use non-blocking mode, you have to use select() to tell you when the connect attempt has completed. Select for the socket becoming writeable. (That doesn't necessarily mean it has become writeable: it means the connect attempt has completed, with a result you can discover via getsockopt()/SO_ERROR.)

Can socket send fail cause a daemon program crash?

I have two applications running on Embedded Linux board. One runs as a daemon and other acts as an interface for it. They communicate with each other using Unix sockets.
As to handle any abnormal termination of socket, I tried terminating the interface application [ctr+c]. But as a result, the daemon application crashes. Since the socket is terminated, I get the socket send failed error on daemon side, which is expected but after that the daemon crashes.
I am at a loss as to where exactly should I look for debugging this problem.

Have you set the socket in your daemon to non-blocking mode ?
Suppose your code looks like the following:
while(1)
{
connfd = accept(listenfd, (struct sockaddr*)NULL, NULL);
/* then you use the fd */
func(connfd);
}
Based on the man page:
"
On success, accept() return a nonnegative integer that is a descriptor for the accepted socket. On error, -1 is returned, and errno is set appropriately.
and
If no pending connections are present on the queue, and the socket is not marked as nonblocking, accept() blocks the caller until a connection is present. If the socket is marked nonblocking and no pending connections are present on the queue, accept() fails with the error EAGAIN or EWOULDBLOCK.
"
Therefore, it means if you are in non-blocking mode, you should check the return value of accept() instead of using it directly because the fd value would be -1.
The above is just one common possibility. If it is not the case, you can try to use "sudo strace -p process_id" or carry out the core file analysis to understand why it is crashed.

Linux, sockets, non-blocking connect

I want to create a non-blocking connect.
Like this:
socket.connect(); // returns immediately
For this, I use another thread, an infinite loop and Linux epoll. Like this(pseudocode):
// in another thread
{
create_non_block_socket();
connect();
epoll_create();
epoll_ctl(); // subscribe socket to all events
while (true)
{
epoll_wait(); // wait a small time(~100 ms)
check_socket(); // check on EPOLLOUT event
}
}
If I run a server and then a client, all it works. If I first run a client, wait a some small time, run a server, then the client doesn't connect.
What am I doing wrong? Maybe it can be done differently?

You should use the following steps for an async connect:
create socket with socket(..., SOCK_NONBLOCK, ...)
start connection with connect(fd, ...)
if return value is neither 0 nor EINPROGRESS, then abort with error
wait until fd is signalled as ready for output
check status of socket with getsockopt(fd, SOL_SOCKET, SO_ERROR, ...)
done
No loops - unless you want to handle EINTR.
If the client is started first, you should see the error ECONNREFUSED in the last step. If this happens, close the socket and start from the beginning.
It is difficult to tell what's wrong with your code, without seeing more details. I suppose, that you do not abort on errors in your check_socket operation.

There are a few ways to test if a nonblocking connect succeeds.
call getpeername() first, if it failed with error ENOTCONN, the connection failed. then call getsockopt with SO_ERROR to get the pending error on the socket
call read with a length of 0. if the read failed, the connection failed, and the errno for read indicates why the connection failed; read returns 0 if connection succeeds
call connect again; if the errno is EISCONN, the connection is already connected and the first connect succeeded.
Ref: UNIX Network Programming V1

D. J. Bernstein gathered together various methods how to check if an asynchronous connect() call succeeded or not. Many of these methods do have drawbacks on certain systems, so writing portable code for that is unexpected hard. If anyone want to read all the possible methods and their drawbacks, check out this document.
For those who just want the tl;dr version, the most portable way is the following:
Once the system signals the socket as writable, first call getpeername() to see if it connected or not. If that call succeeded, the socket connected and you can start using it. If that call fails with ENOTCONN, the connection failed. To find out why it failed, try to read one byte from the socket read(fd, &ch, 1), which will fail as well but the error you get is the error you would have gotten from connect() if it wasn't non-blocking.

select says socket is ready to read when it is definitely not (actually is closed already)

In my server I check if any socket is ready to read using select() to determine it. As a result in main loop select() is executed every time it iterates.
To test the server I wrote a simple client that sends only one message and then quits. BTW. I use protocol buffers to send information - message means an object of type class Message in this library.
The test session looks like:
select()
server's socket ready to read
accept() client's socket
read message from client's socket
select()
server's socket not ready to read, client's one ready
read message from client's socket
The last step is wrong because client has already closed connection. As a result protobuf library gets Segmentation fault. I wonder why FD_ISSET says the socket is ready in step 6 when it is closed. How can I check if a socket is closed?
EDIT:
I've found how to check if the socket is open
int error = 0;
socklen_t len = sizeof (error);
int retval = getsockopt (socket_fd, SOL_SOCKET, SO_ERROR, &error, &len );

the socket is "readable" if the remote peer closes it, you need to call recv and handle both the case where it returns an error, and the case where it returns 0, which indicates that the peer shut down the connection in an orderly fashion.
Reading the SO_ERROR sockopt is not the correct way, as it returns the current pending error (from, eg. a non-blocking connect)

The socket used for communication between a client and your server will be flagged as readable (i.e. select() will return) when there is data to read, or when there's an EOF to read (i.e. the peer closed the connection).
Just read() when select() returns and your fd is flagged. If read() returns a positive number, you got data. If it returns 0, you got EOF. If it returns -1, you have a problem (unless errno is EAGAIN).

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight