Linux, sockets, non-blocking connect

I want to create a non-blocking connect.
Like this:
socket.connect(); // returns immediately
For this, I use another thread, an infinite loop and Linux epoll. Like this (pseudocode):
// in another thread
{
    create_non_block_socket();
    connect();
    epoll_create();
    epoll_ctl(); // subscribe socket to all events
    while (true)
    {
        epoll_wait(); // wait a small time (~100 ms)
        check_socket(); // check on EPOLLOUT event
    }
}
If I start the server and then the client, everything works. If I start the client first, wait a short time, and then start the server, the client never connects.
What am I doing wrong? Can it be done differently?

You should use the following steps for an async connect:
1) Create the socket with socket(..., SOCK_STREAM | SOCK_NONBLOCK, ...).
2) Start the connection with connect(fd, ...).
3) If connect() returns neither 0 nor -1 with errno set to EINPROGRESS, abort with an error.
4) Wait until fd is signalled as ready for output.
5) Check the status of the socket with getsockopt(fd, SOL_SOCKET, SO_ERROR, ...).
6) Done.
No loops - unless you want to handle EINTR.
If the client is started first, you should see the error ECONNREFUSED in the last step. If this happens, close the socket and start from the beginning.
It is difficult to tell what's wrong with your code without seeing more details. I suspect that you do not abort on errors in your check_socket operation.
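The steps above can be sketched in C roughly as follows. This is a minimal sketch for Linux, not the asker's code: the names connect_async and wait_for_connect, the IPv4-only address handling, and the single timeout parameter are all assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is writable, then fetch the pending
 * socket error. Returns 0 if the connection succeeded, else an errno value. */
static int wait_for_connect(int fd, int timeout_ms)
{
    int ep = epoll_create1(0);
    if (ep < 0)
        return errno;

    struct epoll_event ev = { .events = EPOLLOUT, .data.fd = fd };
    struct epoll_event out;
    int err = 0, n;

    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
        err = errno;
    } else {
        do {    /* note: retrying on EINTR restarts the full timeout */
            n = epoll_wait(ep, &out, 1, timeout_ms);
        } while (n < 0 && errno == EINTR);

        if (n < 0) {
            err = errno;
        } else if (n == 0) {
            err = ETIMEDOUT;            /* nothing happened in time */
        } else {
            socklen_t len = sizeof err; /* step 5: read SO_ERROR */
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
                err = errno;
        }
    }
    close(ep);
    return err;
}

/* Hypothetical helper: returns a connected fd, or -1 with errno set. */
int connect_async(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);   /* step 1 */
    if (fd < 0)
        return -1;

    int err = 0;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0)     /* steps 2-3 */
        err = (errno == EINPROGRESS) ? wait_for_connect(fd, timeout_ms) : errno;

    if (err != 0) {     /* e.g. ECONNREFUSED: close and start over */
        close(fd);
        errno = err;
        return -1;
    }
    return fd;
}
```

On failure the socket is closed, so a caller that wants to retry simply calls connect_async() again and gets a fresh socket.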

There are a few ways to test whether a non-blocking connect succeeded:
1) Call getpeername(). If it fails with ENOTCONN, the connection failed; then call getsockopt() with SO_ERROR to fetch the pending error on the socket.
2) Call read() with a length of 0. If read() fails, the connection failed, and the errno from read() indicates why; read() returns 0 if the connection succeeded.
3) Call connect() again. If it fails with errno set to EISCONN, the socket is already connected and the first connect() succeeded.
Ref: UNIX Network Programming, Volume 1

D. J. Bernstein gathered together various methods for checking whether an asynchronous connect() call succeeded. Many of these methods have drawbacks on certain systems, so writing portable code for this is unexpectedly hard. If anyone wants to read about all the possible methods and their drawbacks, check out this document.
For those who just want the tl;dr version, the most portable way is the following:
Once the system signals the socket as writable, first call getpeername() to see whether it connected. If that call succeeds, the socket is connected and you can start using it. If that call fails with ENOTCONN, the connection failed. To find out why it failed, try to read one byte from the socket with read(fd, &ch, 1), which will fail as well, but the error you get is the error you would have gotten from connect() if it weren't non-blocking.
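A minimal sketch of that check in C might look like this. The function name connect_outcome and its return convention are assumptions made for illustration; it is meant to be called only after the socket has been reported writable.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if fd is connected, otherwise the errno
 * value that describes why the connection attempt failed. */
int connect_outcome(int fd)
{
    struct sockaddr_storage peer;
    socklen_t len = sizeof peer;

    /* A connected socket has a peer address. */
    if (getpeername(fd, (struct sockaddr *)&peer, &len) == 0)
        return 0;
    if (errno != ENOTCONN)
        return errno;           /* e.g. EBADF: not a connect failure as such */

    /* Not connected: a one-byte read is expected to fail with the error
     * that a blocking connect() would have reported. */
    char ch;
    if (read(fd, &ch, 1) < 0)
        return errno;

    return ENOTCONN;            /* read did not fail; report the generic error */
}
```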

Related

Hang at connect()

So, I have a client that attempts to connect with a server. The ip and port are retrieved from a configuration file. I need the program to fail smoothly if something in the config file is incorrect. I connect to the server using the following code
if (connect(sockfd, p->ai_addr, p->ai_addrlen) == -1)
{
    perror("client: connect");
    close(sockfd);
    continue;
}
If the user attempts to connect to a server on the subnet that is not accepting connections (i.e. is not present), then the program fails with No route to host. If the program attempts to connect to a server that is not on the subnet (i.e. the configuration is bad), then the program hangs at the connect() call. What am I doing incorrectly? I need this to provide some feedback to the user that the application has failed.

You're not doing anything wrong. TCP is designed for reliability in the face of network problems, so if it doesn't get a response to its initial connection request, it retries several times in case the request or response was lost in the network. With the default parameters on Linux (see the net.ipv4.tcp_syn_retries sysctl) it takes a minute or more to give up. Then it reports a failure with the Connection timed out error.
If you want to detect the failure more quickly, see C: socket connection timeout
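One Linux-specific way to shorten the wait, separate from whatever the linked question recommends, is to lower the number of SYN retransmissions for a single socket with the TCP_SYNCNT option before calling connect(). The helper name limit_syn_retries below is an assumption; the option itself is documented in tcp(7).

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux only): limit how many times the initial SYN is
 * retransmitted on this socket. Call it before connect().
 * Returns 0 on success, -1 with errno set on failure. */
int limit_syn_retries(int fd, int retries)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_SYNCNT, &retries, sizeof retries);
}
```

With a small value such as 1 or 2, a blocking connect() to an unreachable host should give up after seconds rather than minutes, still with the Connection timed out error.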

Normally we don't use continue inside an if statement unless the if statement is inside a loop, which you are not showing. Assuming there is an outer loop, it is responsible for what happens next: either it keeps re-entering the if block (to try to connect again) or it skips past it.
Note also that you are closing sockfd inside the if block, so if your loop re-enters the if block to retry, it needs to create a new socket first.
I suggest reading some sample code for client and server side socket connections to get a better feel for how it works: http://www.cs.rpi.edu/~moorthy/Courses/os98/Pgms/socket.html
If all else fails, please provide the code around the if block and also state how you want to "fail smoothly". One way to fail "abruptly" would be to swap the continue statement with a call to exit() ;-)
EDIT: After reading Barmar's answer and his comment you also need to be aware of this:
If the initiating socket is connection-mode, then connect() shall
attempt to establish a connection to the address specified by the
address argument. If the connection cannot be established immediately
and O_NONBLOCK is not set for the file descriptor for the socket,
connect() shall block for up to an unspecified timeout interval until
the connection is established. If the timeout interval expires before
the connection is established, connect() shall fail and the connection
attempt shall be aborted.
also..
If the connection cannot be established immediately and O_NONBLOCK is
set for the file descriptor for the socket, connect() shall fail and
set errno to [EINPROGRESS], but the connection request shall not be
aborted, and the connection shall be established asynchronously.
Subsequent calls to connect() for the same socket, before the
connection is established, shall fail and set errno to [EALREADY]
When you say "the program hangs", do you mean forever, or for a period that might be explained by a TCP/IP timeout?
If this and Barmar's answer are still not enough, it would help to see the surrounding code, as suggested, to determine whether the socket is blocking or non-blocking.

Does socket error mean socket is closed

I am writing a client-server program. The server select()s on readfd1, waiting for it to become readable. When it is ready, the server collects the data and prints it. Everything is fine for a while, but after some time recv() on the socket fails with errno set to ETIMEDOUT. Now I want to rewrite my program to handle these error conditions. So I went through "Unix Network Programming" by Richard Stevens, which states 4 conditions for select() to unblock. The following 2 conditions caught my attention:
A. client sent FIN, here return value of `recv()` will be `0`
B. some socket error, here return value of `recv()` will be `-1`.
My question is: does a socket error close the connection? If so, why are the above two conditions listed separately? If not, does the next recv() on the socket work?

If recv() returns 0, the other end has actively and gracefully closed the connection.
If recv() returns -1, there has (possibly) been an error on the connection, and it is no longer usable.
This means you can tell the difference between the peer closing the connection, and an error happening on the connection. The common thing to do in both cases is to close() your end of the socket.
There are two more points to consider, though:
1) In the case of recv() returning -1, you should inspect errno, as it might not indicate a real error. errno can be EAGAIN/EWOULDBLOCK if you have placed the socket in non-blocking mode, or it could be EINTR if the system call was interrupted by a signal. All other errno values mean the connection is broken, and you should close it.
2) A TCP connection can be half-closed. If the peer has closed only its writing end of the connection, recv() returns 0 at your end. The common thing to do is to consider the connection as finished, and close your end of the connection too, but you can continue to write to it and the other end can continue to read from it. Whether to close just the reading or writing end of a TCP connection is controlled by the shutdown() function.
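The distinction between these return values can be captured in a small helper. This is only a sketch: the enum and function names are made up for illustration, and it assumes recv() was called with a non-zero buffer length (otherwise a return of 0 does not mean the peer closed).

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical classification of a recv() result. */
enum recv_status {
    RECV_DATA,          /* n > 0: data was received */
    RECV_PEER_CLOSED,   /* n == 0: peer closed its writing end gracefully */
    RECV_RETRY,         /* n == -1 but not a real error: try again later */
    RECV_BROKEN         /* n == -1 with a real error: close the socket */
};

/* n is the return value of recv(); err is errno captured right after it. */
enum recv_status classify_recv(ssize_t n, int err)
{
    if (n > 0)
        return RECV_DATA;
    if (n == 0)
        return RECV_PEER_CLOSED;
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
        return RECV_RETRY;
    return RECV_BROKEN;     /* ETIMEDOUT, ECONNRESET, ... */
}
```

A typical call would be classify_recv(recv(fd, buf, sizeof buf, 0), errno), with the caller closing the socket on RECV_PEER_CLOSED or RECV_BROKEN.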

A socket error doesn't have to mean that the connection is closed. Consider, for example, what happens if the network cable between you and your peer is cut: you would typically get that ETIMEDOUT error. Most errors are unrecoverable, so closing your end of the connection on error is almost always advisable.
The two cases in which select can unblock are distinguished because either the other end closed its connection in a nice way (the first case), or there was an actual error (the second case).

using a non-blocking socket connection in C

I'm changing a socket connection in a script to a non-blocking connection. In a tutorial I found the lines:
x=fcntl(s,F_GETFL,0); // Get socket flags
fcntl(s,F_SETFL,x | O_NONBLOCK); // Add non-blocking flag
So I added them after I create my socket and before the connect statement. And it's no longer blocking :) but it also doesn't connect. I'm not getting any errors, the connect is just returning -1. If I comment these lines out it connects.
What else do I need to add to get a non-blocking connection to connect?

Check the return value of connect(2): you should be getting -1, with EINPROGRESS in errno(3). Then add the socket file descriptor to a poll set, and wait for it to become writable with select(2) or poll(2).
This way you can have multiple connection attempts going on at the same time (that's how e.g. browsers do it) and be able to have tighter timeouts.
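As a rough sketch of the first half of this advice, the helpers below set O_NONBLOCK with error checking and treat EINPROGRESS as "connection started" rather than as a failure. The names set_nonblocking and start_connect and their return conventions are assumptions for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>

/* Hypothetical helper: add O_NONBLOCK to the descriptor's flags.
 * Returns 0 on success, -1 with errno set on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: start a connection on a non-blocking socket.
 * Returns 0 if connected at once, 1 if the connection is in progress
 * (wait for writability next), -1 with errno set on a real failure. */
int start_connect(int fd, const struct sockaddr *addr, socklen_t len)
{
    if (connect(fd, addr, len) == 0)
        return 0;
    return errno == EINPROGRESS ? 1 : -1;
}
```

A return value of 1 is the case the question ran into: connect() returned -1, but the attempt is still under way.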

connect will probably return immediately with an EINPROGRESS error. Read up on the use of select.
Note that you'll probably want to wrap your call to select in the TEMP_FAILURE_RETRY macro (a glibc extension), so that it is retried when a signal interrupts it with EINTR.

How can I cause an ECONNRESET in recv() from a client?

I am using a server that is crashing following a call to recv() returning -1 and errno set to ECONNRESET. I originally found this condition using nmap (I'm not a cracker, was just testing if the port was open at the time.) However, nmap uses raw sockets so I'm not too happy submitting this as a test case to the developers. I would rather write a client program in C that can cause the ECONNRESET.
So far I have tried two things: connect() to the server from my client and then shutdown() the socket immediately after connecting. recv() on the server still returned 1 (I have inserted debugging code so I can see the return value.) I also tried calling send() with some string and then immediately calling shutdown(). No dice, the string was transmitted fine.
So how would I cause this condition? Non portable is fine, I am using Linux.

The problem is that you are calling shutdown. Call close instead.
Take a look at a TCP state diagram.
http://tangentsoft.net/wskfaq/articles/debugging-tcp.html
Basically, shutdown closes a socket "politely" by sending a FIN and waiting for the peer to finish (FIN -> ACK/FIN -> ACK -> closed), at which point you call close and all is good. If you call close without calling shutdown first, it can be the "impolite" version which sends a RST (on Linux this happens when unread data is still queued on the socket, or when SO_LINGER is set with a zero timeout) -- the equivalent of hanging up in the middle of a phone call, without waiting for the other person to finish what they're saying.
Think of "shutdown" as "say goodbye", and "close" as "hang up". You always have to hang up, but you don't have to say goodbye first.
About nmap: It is perfectly acceptable to give developers a test case with nmap. That's one of the main purposes of nmap anyway.
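A commonly used way to make close() send a RST deterministically is to enable SO_LINGER with a zero timeout first. The sketch below is a hypothetical test client for this purpose, not something taken from the answers: the names connect_to and abortive_close and the IPv4-only address handling are assumptions, and the behaviour described is that of Linux.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: blocking connect to ip:port.
 * Returns a connected fd, or -1 with errno set. */
int connect_to(const char *ip, unsigned short port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        int saved = errno;      /* keep connect's error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: close fd so that the peer sees a reset.
 * l_onoff = 1 with l_linger = 0 makes close() discard unsent data and
 * send RST instead of FIN. Returns 0 on success; on -1 from setsockopt
 * the descriptor is left open. */
int abortive_close(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0)
        return -1;
    return close(fd);
}
```

Calling abortive_close(connect_to(host, port)) against the server should make its next recv() on that connection fail with ECONNRESET.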

Your instincts were correct to use shutdown(), however you were not using it correctly for this.
Presumably you are trying shutdown() with SHUT_WR or SHUT_RDWR. When you close the writing direction, as these do, your side of the connection notifies the peer with a FIN - indicating that no more data will be forthcoming from your side. This will cause recv() on the other side to indicate a clean end-of-file on the connection, which isn't what you want in this case.
Instead, you want to use SHUT_RD to shutdown the reading direction of the socket only, and hold it open for writing. This will not notify the peer immediately - but if the peer sends any data after this point, your side will respond with a RST, to inform the peer that some data was lost - it wasn't seen by your client application.
(So, to ensure that you get a connection reset, you need to make sure that the server will be trying to send something to you - you might need to send something first, then perform the reading shutdown).

Reusing socket descriptor on connection failure

In my client code, I am following these steps to connect to a socket:
1) Creating a socket:
   sockDesc = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)
2) Connecting it (retrying for 'x' time in case of failure):
   connect(sockDesc, (sockaddr *) &destAddr, sizeof(destAddr))
   (after filling in the destAddr fields)
3) Using the socket for send()/recv() operations:
   send(sockDesc, buffer, bufferLen, 0)
   recv(sockDesc, buffer, bufferLen, 0)
4) close() the socket descriptor and exit:
   close(sockDesc)
If during send()/recv() the connection breaks, I found that I could connect by returning to step 2.
Is this solution okay, or should I close the socket descriptor and return to step 1?
Another interesting observation that I am not able to understand is what happens when I stop my echo server and start the client. I create a socket (step 1) and call connect(), which fails (as expected), but then I keep calling connect(), let's say, 10 times. After 5 retries I start the server and connect() succeeds. But during the send() call the process receives a SIGPIPE signal. I would like to know:
1) Do I need to create a new socket every time connect() fails? As per my understanding as long as I have not performed any send()/recv() on the socket it is as good as new and I can reuse the same fd for the connect() call.
2) I don't understand why SIGPIPE is received when the server is up and connect() is successful.

Yes, you should close and go back to step 1:
close() closes a file descriptor,
so that it no longer refers to any
file and may be reused.
From here.

I think closing the socket is the right thing to do, despite the fact that it may work if you don't.
A socket which has failed to connect may not be in EXACTLY the same state as a brand new one - which could cause problems later. I'd rather avoid the possibility and just make a new one. It's cleaner.
TCP sockets hold a LOT of state, some of which is implementation-specific and worked out from the network.

A socket corresponding to a broken connection is in an unstable state. Normally you will not be allowed to connect again on it until the operating system releases the socket.
I think it is better to close() the socket and connect again with a newly created one.
Anyway, make sure to set the SO_LINGER option on your socket to ensure that no data is lost in transmission.
See http://www.gnu.org/s/libc/manual/html_node/Socket_002dLevel-Options.html#Socket_002dLevel-Options

If the connection was broken and you try to write on the file descriptor you should get the broken pipe error/signal. All this is saying is that the file descriptor you tried writing to no longer has anyone on the other side to read what you are sending.
What you can do is catch the signal SIGPIPE and then deal with the reconnecting by closing the FD and going back to your step 1. You will now have a new FD you can read and write from for the connection.
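As an alternative to installing a SIGPIPE handler, Linux lets you suppress the signal per call with the MSG_NOSIGNAL flag, so that send() fails with EPIPE instead. The sketch below takes that route; the name send_nosigpipe and the peer_gone out-parameter are assumptions made for illustration.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send without raising SIGPIPE.
 * Returns the number of bytes sent, or -1 with errno set. *peer_gone is set
 * to 1 when the failure means the connection is dead (EPIPE or ECONNRESET),
 * in which case the caller should close fd and go back to step 1. */
ssize_t send_nosigpipe(int fd, const void *buf, size_t len, int *peer_gone)
{
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
    *peer_gone = (n < 0 && (errno == EPIPE || errno == ECONNRESET));
    return n;
}
```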

If the Single UNIX Specification doesn't say that it MUST work to go back to step #2 instead of step #1, then the fact that it happens to work on Linux is just an implementation detail, and you would be far better off and more portable if you go back to step #1. As far as I am aware, the specification does not make any guarantee that it is ok to go back to step #2 and, therefore, I would advise you to go back to step #1.
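Going back to step 1 on every failed attempt can be sketched as a retry loop that creates a fresh socket each time. The function name connect_with_retry, its parameters, and the fixed delay between attempts are assumptions for illustration, not part of the question's code.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to connect up to `attempts` times, sleeping
 * delay_sec seconds between tries. Returns a connected fd, or -1 with errno
 * set to the error of the last attempt. */
int connect_with_retry(const char *ip, unsigned short port,
                       int attempts, unsigned delay_sec)
{
    struct sockaddr_in dest;
    memset(&dest, 0, sizeof dest);
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dest.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    int last_err = EINVAL;      /* reported if attempts <= 0 */
    for (int i = 0; i < attempts; i++) {
        /* step 1: a brand new socket for every attempt */
        int fd = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (fd < 0)
            return -1;

        /* step 2: connect; on success hand the descriptor to the caller */
        if (connect(fd, (struct sockaddr *)&dest, sizeof dest) == 0)
            return fd;

        last_err = errno;
        close(fd);              /* never reuse a socket whose connect failed */
        if (i + 1 < attempts)
            sleep(delay_sec);
    }
    errno = last_err;
    return -1;
}
```

The same function can be called again after a send()/recv() failure, once the broken descriptor has been closed.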
