Can a failed socket send cause a daemon program to crash? - c

I have two applications running on an embedded Linux board. One runs as a daemon and the other acts as an interface to it. They communicate with each other over Unix domain sockets.
To test how an abnormal termination of the socket is handled, I terminated the interface application with Ctrl+C. As a result, the daemon application crashes. Since the socket has been closed, I get a "socket send failed" error on the daemon side, which is expected, but after that the daemon crashes.
I am at a loss as to where exactly I should look to debug this problem.

Have you set the socket in your daemon to non-blocking mode?
Suppose your code looks like the following:
while (1)
{
    connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);
    /* then you use the fd */
    func(connfd);
}
Based on the man page:
"On success, accept() returns a nonnegative integer that is a descriptor for the accepted socket. On error, -1 is returned, and errno is set appropriately."
and
"If no pending connections are present on the queue, and the socket is not marked as nonblocking, accept() blocks the caller until a connection is present. If the socket is marked nonblocking and no pending connections are present on the queue, accept() fails with the error EAGAIN or EWOULDBLOCK."
So if the socket is in non-blocking mode, you must check the return value of accept() instead of using it directly: when no connection is pending, the returned fd is -1, and passing that to func() will fail.
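A minimal sketch of that check, assuming Linux/POSIX sockets. The helper name try_accept is hypothetical, and func() stands for the question's own connection handler. It returns -1 both when nothing is pending and on a real error; errno tells the two apart.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

/* Accept one pending connection from a listening socket that may be in
   non-blocking mode. Returns the new descriptor, or -1 when nothing is
   pending or accept() failed; errno distinguishes the two cases. */
int try_accept(int listenfd)
{
    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd >= 0)
            return connfd;          /* valid descriptor: safe to hand to func() */
        if (errno == EINTR)
            continue;               /* interrupted by a signal: just retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            int saved = errno;
            perror("accept");       /* a real failure, e.g. EMFILE */
            errno = saved;          /* keep the original reason for the caller */
        }
        return -1;                  /* never pass this -1 on as a socket */
    }
}
```

In the loop above, this would read `connfd = try_accept(listenfd); if (connfd < 0) continue; func(connfd);`. With a non-blocking listener you would normally wait in select() or poll() before retrying rather than spinning on `continue`.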
The above is just one common possibility. If that is not the cause, you can attach to the running daemon with "sudo strace -p process_id" or analyze a core dump to understand why it crashed.

Related

Socket connection ends with Operation now in progress on non-blocking socket

I have a problem connecting to a destination IP using the connect() API. connect() returns -1 with errno set to "Operation now in progress" (EINPROGRESS). Am I checking the return code too early, before the connection is established? Please see the following code snippet:
struct sockaddr_in servAddr;
memset(&servAddr, 0, sizeof(servAddr));
servAddr.sin_family = AF_INET;
servAddr.sin_port = htons(9190);
const char *remoteIp = "10.10.20.86";
rc = inet_pton(AF_INET, remoteIp, &servAddr.sin_addr);
if (rc != 1)   /* 0: not a valid address string; -1: errno is EAFNOSUPPORT */
{
    return 0;
}
rc = connect(fd, (struct sockaddr *)&servAddr, sizeof(servAddr));
if (rc < 0) // this is where it fails. rc is -1.
{
    log("connect failure with [%s]", strerror(errno));
    print_sock_connect_error();
}
I have 2 questions here:
The destination IP and port 10.10.20.86:9190 is waiting for a connection, and once the connection is received, it sends an ack back to the source. I see the TCP handshake complete (SYN, SYN/ACK and ACK to the destination) in the pcap, but I still can't figure out why connect() returns -1 with an error. So am I checking rc before the connection establishment is complete? sysctl net.ipv4.tcp_syn_retries is set to 6.
Is there anything wrong with the code above?
Am I checking the rc before the connection establishment is complete?
Yes, you are. The TCP ping-pong during the connection's set up isn't all that has to be done.
Is there anything wrong with the code above?
Well, yes: either the way it handles the EINPROGRESS case, or the fact that it uses a non-blocking socket to connect at all.
From connect()'s Linux documentation:
EINPROGRESS: The socket is nonblocking and the connection cannot be completed immediately. It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure).
10.10.20.86:9190 is waiting for a connection, and once the connection is received, it sends an ack back to the source. I see the TCP handshake complete (SYN, SYN/ACK and ACK to the destination) in the pcap, but I still can't figure out why it returns -1 with an error. So am I checking rc before the connection establishment is complete?
Of course you are. You're checking it immediately after connect() returns. As you have put the socket into non-blocking mode, there is no chance the three-way handshake on the wire will have completed by then.
sysctl net.ipv4.tcp_syn_retries is set to 6.
Irrelevant.
Is there anything wrong with the code above?
Only that it doesn't make sense.
If you want the connection to be complete (or failed) before connect() returns, don't use non-blocking mode.
If you want to use non-blocking mode, you have to use select() to tell you when the connect attempt has completed. Select for the socket becoming writeable. (That doesn't necessarily mean it has become writeable: it means the connect attempt has completed, with a result you can discover via getsockopt()/SO_ERROR.)

How to ensure a posix c socket is still valid

I'm working on an embedded Linux device (kernel 2.6) and need to know whether a previously established socket is still valid. I cannot do this with the usual send() call and check the return value, because if I send to an invalid socket descriptor, my application crashes and Linux shuts down my process. Is there any other function, or a suggestion, for this?
EDIT:
There is an app manager installed on the device, and when I try to send to a socket descriptor that does not refer to an open socket, the app manager ends my application. So if I close a socket connection and then try to write to it, my application is terminated by the lower-level app manager. Also, I'm using TCP sockets.
I think this question is either misstated or based on false premises. There is no sense of "invalidity" which a socket could come to have asynchronously by the action of another process/host. The closest thing is probably the other end of the socket being closed, which does not invalidate your socket, but it does cause subsequent writes to your socket to result in an EPIPE error and SIGPIPE signal if not blocked. SIGPIPE in turn terminates your process by default. If that's your problem, the easiest way to avoid it is to block SIGPIPE with sigprocmask/pthread_sigmask, or ignore it with signal(SIGPIPE, SIG_IGN).

Getting SIGPIPE with non-blocking sockets -- is this normal?

I'm writing an epoll-based network server in C. When I create my socket to listen for incoming connections, I make it non-blocking using fcntl. Similarly when incoming connections arrive from clients, I make their sockets non-blocking before doing anything with them, and likewise for outgoing connections' sockets.
Sometimes my server gets a SIGPIPE. I think this happens when I try to write to a client connection that has been closed by the client. This seems strange to me; I thought that with non-blocking sockets, instead of a SIGPIPE I should get -1 back from the call to write and ECONNRESET in errno.
Is there something I'm missing? Or is it just normal to get both a SIGPIPE and an error code even with non-blocking sockets (meaning that I should explicitly ignore the signal with signal(SIGPIPE, SIG_IGN) in my setup)?
Yes, this is normal. If you write to a socket (non-blocking or not) where the other end has closed the connection, you will get a SIGPIPE or (if you are blocking the SIGPIPE signal) an error return (-1) with errno set to EPIPE.
From the man page for write:
EPIPE: fd is connected to a pipe or socket whose reading end is closed. When this happens the writing process will also receive a SIGPIPE signal. (Thus, the write return value is seen only if the program catches, blocks or ignores this signal.)
The POSIX standard is here: http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html and says:
[EPIPE] An attempt is made to write to a pipe or FIFO that is not open for reading by any process, or that only has one end open. A SIGPIPE signal shall also be sent to the thread.
The SIGPIPE is normal. Besides installing a signal handler solely for this purpose, another option is to pass the MSG_NOSIGNAL flag on every send. The flag applies to send(), sendto() and sendmsg(), not to write().
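A minimal sketch of the MSG_NOSIGNAL approach, assuming Linux (MSG_NOSIGNAL is also in POSIX.1-2008). The helper send_all_nosignal is a hypothetical name. Because the flag is per call, nothing has to change globally and the signal disposition of the rest of the program is left alone.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send the whole buffer on a connected socket without risking SIGPIPE.
   Returns 0 on success or -1 with errno set (EPIPE if the peer has closed). */
int send_all_nosignal(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);  /* no SIGPIPE for this call */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before sending: retry */
            return -1;              /* EAGAIN on a non-blocking socket also ends up here */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

In an epoll server, an EAGAIN from this helper would normally mean "register for EPOLLOUT and finish later", while EPIPE or ECONNRESET means the client is gone.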

Behaviour of select() on remote socket closed(by killing process)

Two processes are communicating via sockets - Process A and Process B.
Process B is using the select() call to check when the socket is ready for I/O.
Process A is suddenly killed. What will happen to the socket on B's side? Will B automatically detect that A's socket is no longer available, so that select() returns -1 with EBADF, or will the select() call remain blocked forever?
select() will unblock and report either an error or a readable condition.
select() returns and says that the socket is readable. When you read the socket, you will get -1 (and the corresponding error in errno) or 0 (EOF).
The TCP socket may remain half-open for some time if there's no heartbeat between the two sides.
Eventually the TCP connection will time out, depending on the timeout settings.
Refer to: http://en.wikipedia.org/wiki/Half-open_connection
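One way to get such a heartbeat without changing the application protocol is TCP keepalive. The sketch below assumes Linux: TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific options, and the helper name and the values used are illustrative, not recommendations. An application-level ping message is the portable alternative.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive on a TCP socket: after idle_sec seconds without
   traffic, send up to `probes` probes, interval_sec seconds apart, and
   fail the connection if none is answered. Returns 0 or -1 with errno set. */
int enable_keepalive(int fd, int idle_sec, int interval_sec, int probes)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_sec, sizeof idle_sec) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_sec, sizeof interval_sec) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
    return 0;
}
```

With enable_keepalive(fd, 60, 10, 5), a silently vanished peer is detected after roughly 60 + 10 × 5 = 110 seconds. A blocked select() then reports the socket readable, and the following read() fails (typically with ETIMEDOUT).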

select says socket is ready to read when it is definitely not (actually is closed already)

In my server I use select() to check whether any socket is ready to read. As a result, select() is called on every iteration of the main loop.
To test the server I wrote a simple client that sends only one message and then quits. By the way, I use Protocol Buffers to send information; "message" here means an object of class Message from that library.
The test session looks like:
1. select()
2. server's socket ready to read
3. accept() client's socket
4. read message from client's socket
5. select()
6. server's socket not ready to read, client's socket ready
7. read message from client's socket
The last step is wrong because the client has already closed the connection. As a result, the protobuf library hits a segmentation fault. I wonder why FD_ISSET says the socket is ready in step 6 when it is closed. How can I check whether a socket is closed?
EDIT:
I've found how to check if the socket is open
int error = 0;
socklen_t len = sizeof (error);
int retval = getsockopt (socket_fd, SOL_SOCKET, SO_ERROR, &error, &len );
The socket is "readable" if the remote peer closes it. You need to call recv() and handle both the case where it returns an error and the case where it returns 0, which indicates that the peer shut down the connection in an orderly fashion.
Reading the SO_ERROR socket option is not the correct way, as it returns the current pending error (from, e.g., a non-blocking connect).
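If you need to ask "has the peer closed?" without consuming data, a recv() with MSG_PEEK can be used. The sketch below assumes Linux (MSG_DONTWAIT is not in POSIX); probe_peer and the peer_state enum are hypothetical names. It only reports a close once the peer's FIN has arrived and any data queued before it has been read.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

enum peer_state { PEER_OPEN, PEER_HAS_DATA, PEER_CLOSED, PEER_ERROR };

/* Peek at one byte without removing it from the receive queue and
   without blocking, then classify the connection. */
enum peer_state probe_peer(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0)
        return PEER_HAS_DATA;       /* data queued; a later recv() still gets it */
    if (n == 0)
        return PEER_CLOSED;         /* orderly shutdown (EOF) from the peer */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return PEER_OPEN;           /* nothing to read yet, connection still up */
    return PEER_ERROR;              /* e.g. ECONNRESET; errno has the reason */
}
```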
The socket used for communication between a client and your server will be flagged as readable (i.e. select() will return) when there is data to read, or when there's an EOF to read (i.e. the peer closed the connection).
Just read() when select() returns and your fd is flagged. If read() returns a positive number, you got data. If it returns 0, you got EOF. If it returns -1, you have a problem (unless errno is EAGAIN).
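A minimal sketch of that rule for a single descriptor, assuming a blocking wait with no timeout; wait_and_read is a hypothetical helper. In the question's server, only a result greater than 0 would be handed to the protobuf parser. A 0 or -1 means close the descriptor and remove it from the fd_set.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait until fd is readable, then read once.
   Returns bytes read (> 0), 0 if the peer closed the connection (EOF),
   or -1 on error. The caller should close(fd) on 0 or -1. */
ssize_t wait_and_read(int fd, char *buf, size_t len)
{
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: wait again */
            return -1;
        }
        if (!FD_ISSET(fd, &rfds))
            continue;
        ssize_t n = read(fd, buf, len);
        if (n < 0 && (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK))
            continue;               /* nothing to read after all: wait again */
        return n;                   /* > 0: data, 0: EOF, -1: real error */
    }
}
```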
