Network crash and the resulting state of a socket - C

I would like to know what state a socket ends up in when the network it works on crashes. My problem is that when I simulate the collapse of this network, the select() function, which monitors all my sockets, returns some sockets that theoretically should not be set. Is it possible that the operating system marks a crashed socket as ready for both writing and reading?

The first thing to keep in mind is that your computer typically will not know when the "network crashes" per se. All the computer knows is whether or not it is receiving packets from the network. (Some computers might also know if the electrical signal on their local Ethernet port has gone away, but since more distant parts of the network can go down without affecting the signal on the local Ethernet cable, that information is only occasionally useful.)
In practice, if the network between your computer and the computer it was talking to stops working, you'll see the following effects:
(1) Any UDP packets you send will be dropped without a trace, and usually without any error indication. And of course you won't receive any UDP packets from the remote peer either.
(2) Data traffic on any TCP connection between your computer and the remote peer will grind quickly to a halt. After a certain timeout period (usually several minutes) has elapsed without the OS receiving any responses from the remote peer, the operating system will "give up" and mark the TCP connection as closed. At that point you will see behavior much like what you would get if the remote peer had closed the connection: select() will return ready-for-read (and possibly ready-for-write also, I forget), and then when you try to actually do a recv() or read() on the socket, it will return without data: 0 (EOF) if the connection was closed in an orderly way, or -1 with errno set (for example to ETIMEDOUT) if the OS gave up on it. Note that on a non-blocking socket, -1 with errno set to EAGAIN or EWOULDBLOCK only means that no data is available yet, not that the connection is gone. (If the network recovers before the timeout completes, then TCP traffic on your socket will resume, although it will start slowly and gradually speed up again over time.)

Your description is unclear, but it is possible that the select() is signalling an EOS on the socket concerned, which wouldn't represent a network 'crash' but an orderly close by the peer, possibly unexpected by you.

Related

detecting connection state in epoll linux

There are many threads about how to detect whether a socket is connected using various methods like getpeername() or getsockopt() with SO_ERROR. getpeername() (https://man7.org/linux/man-pages/man2/getpeername.2.html) would be a good way for me to detect whether a socket is connected. The problem is that the man page does not say anything about the case where the connection is still in progress. So if I call connect(), the connection is in progress, and I then call getpeername(), will it report an error (-1) even though the connection is still in progress?
If it does, I can implement a counter-like system that will eventually kill the socket if it is still in progress after x seconds.
Short Answer
I think that, if getpeername() returns ENOTCONN, that simply means that the TCP connection request has not yet succeeded. For it to not return ENOTCONN, I think the client end needs to have received the syn+ack from the server and sent its own ack; the server end, likewise, only considers the connection established once it has received the client's ack.
Thereafter all bets are off. The connection might subsequently be interrupted, but getpeername() has no way of knowing this has happened.
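As a rough illustration of the ENOTCONN check, here is a small sketch in C. The helper name has_peer is invented for this example; as noted above, a result of 1 only says the handshake completed at some point, not that the path to the peer still works.

```c
#include <errno.h>
#include <sys/socket.h>

/* Returns 1 if the socket has a connected peer, 0 if getpeername() reports
 * ENOTCONN, and -1 on any other error (errno is left set). */
int has_peer(int fd)
{
    struct sockaddr_storage addr;
    socklen_t len = sizeof addr;

    if (getpeername(fd, (struct sockaddr *)&addr, &len) == 0)
        return 1;
    return errno == ENOTCONN ? 0 : -1;
}
```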
Long Answer
A lot of it depends on how fussy and short-term one wants to be about knowing if the connection is up.
Strictly Speaking...
Strictly speaking with maximum fussiness, one cannot know. In a packet switched network there is nothing in the network that knows (at any single point in time) for sure that there is a possible connection between peers. It's a "try it and see" thing.
This contrasts to a circuit switched network (e.g. a plain old telephone call), where there is a live circuit for exclusive use between peers (telephones); provided current is flowing, you know the circuit is complete even if the person at the other end of the phone call is silent.
Note that if the two computers were connected by a single Ethernet cable (no router, no switches, just a cable between NICs), that is effectively a fixed circuit (not even a circuit-switched network).
Relaxing a Little...
Let's focus on what one can know about a connection in a packet switched network. As others have already said, the answer is that, really, one has to send and receive packets constantly to know if the network can still connect the two peers.
Such an exchange of packets occurs with a TCP socket connect(): the connecting peer sends a special packet to say "please can I connect to you", the serving peer replies "yes", and the client then says "thank you!" (syn->, <-syn+ack, ack->). But thereafter packets flow between the peers only if the applications send and receive data, or elect to close the connection (fin).
Calling something like getpeername() I think is somewhat misleading, depending on your requirements. It's fine, if you trust the network infrastructure and remote computer and its application to not break, and not crash.
It's possible for the connect() to succeed, then something breaks somewhere in the network (e.g. the peer's network connection is unplugged, or the peer crashes), and there is no knowledge at your end of the network that that has happened.
The first thing you can know about it is if you send some traffic and fail to get a response. The response is, initially, the TCP acks (which allow your network stack to clear out some of its buffers), and then possibly an actual message back from the peer application. If you keep sending data out into the void, the network will quite happily route packets as far as it can, but your TCP stack's buffers will fill up due to the lack of acks coming back from the peer. Eventually, your network socket blocks on a call to write(), because the local buffers are full.
Various Options...
If you're writing both applications (server and client), you can write the application to "ping pong" the connection periodically; just send a message that means nothing other than "tell me you heard this". Successful ping-ponging means that, at least within the last few seconds, the connection was OK.
Use a library like ZeroMQ. This library solves many issues with using network connections, and also includes (in recent versions) socket heartbeats (i.e. a ping pong). It's neat, because ZeroMQ looks after the messy business of making, restoring and monitoring connections with a heartbeat, and can notify the application whenever the connection state changes. Again, you need to be writing both client and server applications, because ZeroMQ has its own protocol on top of TCP that is not compatible with just a plain old socket. If you're interested in this approach, the words to look for in the API documentation are socket monitor and ZMQ_HEARTBEAT_IVL.
If, really, only one end needs to know the connection is still available, that can be accomplished by having the other end just send out "pings". That might fit a situation where you're not writing the software at both ends. For example, a server application might be configured (rather than re-written) to stream out data regardless of whether the client wants it or not, and the client ignores most of it. However, the client knows that as long as it is receiving data, there is a connection. The server does not know (it's just blindly sending out data, up until its write() calls eventually block), but may not need to know.
Ping ponging is also good in that it gives some indication of the performance of the network. If one end is expecting a pong within 5 seconds of sending a ping but doesn't get it, that indicates that all is not as expected (even if packets are eventually turning up).
This allows discrimination between networks that are usefully working, and networks that are delivering packets but too slowly to be useful. The latter is still technically "connected" and is probably represented as connected by other tests (e.g. calling getpeername()), but it may as well not be.
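The "ping pong" idea from the options above can be sketched in C roughly as follows. The one-byte protocol ('P' for ping, 'O' for pong) and the name heartbeat_ok are assumptions made up for this example; a real protocol would need to distinguish heartbeats from ordinary application data.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical one-byte protocol for this sketch: 'P' = ping, 'O' = pong. */
#define HB_PING 'P'
#define HB_PONG 'O'

/* Send a ping and wait up to timeout_ms for the pong.
 * Returns 1 if the pong arrived in time, 0 otherwise. */
int heartbeat_ok(int fd, int timeout_ms)
{
    char msg = HB_PING;
    struct pollfd pfd;

    if (send(fd, &msg, 1, 0) != 1)
        return 0;                       /* could not even queue the ping */

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return 0;                       /* timeout or poll() error */

    if (recv(fd, &msg, 1, 0) != 1)
        return 0;                       /* EOF or socket error */
    return msg == HB_PONG;
}
```

Calling something like heartbeat_ok(fd, 5000) periodically gives the "within 5 seconds" check described above: a return value of 0 means the connection is either broken or too slow to be useful.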
Limited Local Knowledge...
There are only limited things one can do locally to a peer. A peer can know whether its connection to the network exists (e.g. the NIC reports a live connection), but that's about it.
My Opinion
Personally speaking, I default to ZeroMQ these days if at all possible. Even if it means a software re-write, that's not so bad as it seems. This is because one is generally replacing code such as connect() with zmq_connect(), and recv() with zmq_recv(), etc. There's often a lot of code removal too. ZeroMQ is message orientated, whereas a TCP socket is stream orientated. Quite a lot of applications have to adapt TCP into a message orientation, and ZeroMQ replaces all the code that does that.
ZeroMQ is also well supported across numerous languages, either in bindings and / or re-implementations.
man connect
If the initiating socket is connection-mode, .... If the connection cannot be established immediately and O_NONBLOCK is not set for the file descriptor for the socket, connect() shall block for up to an unspecified timeout interval until the connection is established. If the timeout interval expires before the connection is established, connect() shall fail and the connection attempt shall be aborted.
If connect() is interrupted by a signal that is caught while blocked waiting to establish a connection, connect() shall fail and set errno to [EINTR], but the connection request shall not be aborted, and the connection shall be established asynchronously.
If the connection cannot be established immediately and O_NONBLOCK is set for the file descriptor for the socket, connect() shall fail and set errno to [EINPROGRESS], but the connection request shall not be aborted, and the connection shall be established asynchronously.
When the connection has been established asynchronously, select() and poll() shall indicate that the file descriptor for the socket is ready for writing.
If the socket is in blocking mode, connect() will block while the connection is in progress. After connect() returns, you'll know whether a connection has been established.
A signal could interrupt the (blocking/waiting) process; the connection attempt then continues asynchronously.
If the socket is in non-blocking mode (O_NONBLOCK) and the connection cannot be established immediately, connect() will fail with the error EINPROGRESS and, as above, the connection attempt continues asynchronously. That means you'll have to use select() or poll() to find out when the socket becomes ready for writing, which indicates that the connection attempt has completed; getsockopt() with SO_ERROR then tells you whether it succeeded.
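Putting the non-blocking case together, a connect with an upper time limit might look roughly like the sketch below. The function name connect_with_timeout, its IPv4-only interface and the whole-second timeout are choices made for this example, not anything prescribed by the man page.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Start a non-blocking TCP connect to ipv4:port and wait up to timeout_sec
 * for the outcome. Returns the connected (still non-blocking) descriptor,
 * or -1 on failure or timeout. */
int connect_with_timeout(const char *ipv4, unsigned short port, int timeout_sec)
{
    struct sockaddr_in sa;
    struct timeval tv;
    fd_set wfds;
    socklen_t errlen;
    int fd, flags, err = 0;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &sa.sin_addr) != 1)
        return -1;                      /* not a valid IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0)
        return fd;                      /* connected immediately */
    if (errno != EINPROGRESS)
        goto fail;

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
        goto fail;                      /* timed out, or select() failed */

    /* Writable means the attempt finished; SO_ERROR says how it ended. */
    errlen = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0 || err != 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

The select() timeout plays the role of the "kill the socket after x seconds" counter from the question, so no polling of getpeername() is needed while the connection is in progress.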

how does non-blocking tcp socket notify application on packets which fail to get sent.

I'm working on non-blocking TCP sockets in C on a Linux system. I've read that in non-blocking mode, the send() call will return the number of bytes sent immediately if there is no error. I'm guessing that the returned value does not actually mean that the data has been delivered to the destination, but rather that the data has been passed to kernel memory for the kernel to handle further and send.
If that is the case, how would my application know which packet has really been sent out by the kernel to the other end, assuming that the network connection had some problems and the kernel decides to give up only after several retries over a span of a few minutes?
I'm asking because I would want my application to resend those failed packets again at a later time.
"If that is the case, how would my application know which packet has really been sent out by kernel to the other end, assuming that the network connection had some problems and kernel decides to give up only after several retries in a span of a few minutes later?"
Your application won't know, unless it is able to recontact the receiving application and ask the receiving application about what data it had previously received.
Keep in mind that even with blocking I/O your application doesn't block until the data is received by the remote application -- it only blocks until there is some room in the kernel's outgoing-data buffer to hold the bytes you asked the TCP stack to send(). So even with blocking I/O you would face the same issue.
Also keep in mind that the byte arrays you pass to send() do not have a guaranteed 1-to-1 correspondence to the TCP packets that the TCP stack sends out. The TCP stack is free to pack your bytes into TCP packets any way it likes (e.g. the data from multiple send() calls can end up in a single TCP packet, or the data from a single send() call can end up in multiple TCP packets, or any other combination you can think of). Depending on network conditions, TCP stacks can and do pack things in various different ways; their only promise is that the bytes will be received in FIFO order (if they get received at all).
Anyway, the answer to your question is: you can't know, unless you later ask the receiving program about what it got (or didn't get).
TCP takes care of retransmission internally; the application doesn't need to do any special handling for it. If you wish to confirm that data has reached the TCP stack at the other end, some stacks let you set the send socket buffer to zero (setsockopt() with SOL_SOCKET and SO_SNDBUF); this is documented behaviour on Windows. In that case the kernel uses your application buffer to send the data, and it's only released after TCP receives the acknowledgement for that data. Be aware that Linux enforces a minimum send buffer size, so on Linux this does not disable kernel buffering. Even where it works, it only confirms that the data was pushed to the receiving end of the TCP stack; it doesn't confirm that the application has received it. You need an application-layer acknowledgement in your protocol to confirm that the data reached the receiving application.
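Both answers come down to an application-layer acknowledgement. One possible way to keep the bookkeeping on the sending side is sketched below in C; the struct, the function names, the fixed capacity and the use of cumulative sequence numbers are all assumptions for this example, and sequence number wrap-around is ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64   /* arbitrary capacity for this sketch */

/* Sequence numbers of messages that were handed to send() but have not
 * yet been acknowledged by the receiving application. */
struct pending {
    uint32_t seq[MAX_PENDING];
    size_t   count;
};

/* Record a message after send() has accepted it. Returns 0, or -1 if full. */
int pending_add(struct pending *p, uint32_t seq)
{
    if (p->count == MAX_PENDING)
        return -1;
    p->seq[p->count++] = seq;
    return 0;
}

/* The peer application acknowledged everything up to and including seq:
 * drop those entries. Returns how many messages are still unconfirmed. */
size_t pending_ack(struct pending *p, uint32_t seq)
{
    size_t i, kept = 0;

    for (i = 0; i < p->count; i++)
        if (p->seq[i] > seq)
            p->seq[kept++] = p->seq[i];
    p->count = kept;
    return kept;
}
```

After a reconnect, whatever is still listed in the pending set is what the application would resend, once the receiver has reported the last sequence number it actually processed.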

Is there a way to tell the OS to drop any buffered outgoing TCP data?

I've got an amusing/annoying situation in my TCP-based client software, and it goes like this:
1. my client process is running on a laptop, and it is connected via TCP to my server process (which runs on another machine across the LAN)
2. irresponsible user pulls the Ethernet cable out of his laptop while the client is transmitting TCP data
3. client process continues calling send() with some additional TCP data, filling up the OS's SO_SNDBUF buffer, until...
4. the client process is notified (via MacOS/X's SCDynamicStoreCallback feature) that the ethernet interface is down, and responds by calling close() on its TCP socket
5. two to five seconds pass...
6. user plugs the Ethernet cable back in
7. the client process is notified that the interface is back up, and reconnects automatically to the server
That all works pretty well... except that there is often also an unwanted step 8, which is this:
8. The TCP socket that was close()'d in step 4 recovers(!) and sends the remainder of the data that was in the kernel's outbound-data buffer for that socket. This happens because the OS tries to deliver all of the outbound TCP data before freeing the socket... usually a good thing, but in this case I'd prefer that that didn't happen.
So, the question is, is there a way to tell the TCP layer to drop the data in its SO_SNDBUF? If so, I could make that call just before close()-ing the dead socket in step 4, and I wouldn't have to worry about zombie data from the old socket arriving at the server after the old socket was abandoned.
This (data received from two different TCP connections is not ordered with respect to each other) is a fundamental property of TCP/IP. You shouldn't try to work around it by clearing the send buffer; that is fragile. Instead, you should fix the application to handle this eventuality at the application layer.
For example, if you receive a new connection on the server side from a client that you believe is already connected, you should probably drop the existing connection.
Additionally, step 4 of the process is a bit dubious. Really, you should just wait until TCP reports an error (or an application-level timeout occurs on the connection) - as you've noticed, TCP will recover if the physical disconnection is only a brief one.
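If you want to discard any data that is awaiting transmission when you close the socket at step 4, then simply enable SO_LINGER with a linger time of 0 (l_onoff = 1, l_linger = 0) before closing. close() then aborts the connection instead of trying to deliver the buffered data.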
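In C, the SO_LINGER suggestion might be sketched like this. The helper name enable_abortive_close is invented for the example; the struct linger fields and the setsockopt() call are the standard sockets API.

```c
#include <sys/socket.h>

/* Arrange for a later close(fd) to be abortive: with l_onoff = 1 and
 * l_linger = 0, close() discards any unsent data and, for TCP, resets the
 * connection instead of lingering to deliver it.
 * Returns 0 on success, -1 on error. */
int enable_abortive_close(int fd)
{
    struct linger lg;

    lg.l_onoff = 1;    /* turn lingering on ...            */
    lg.l_linger = 0;   /* ... with a timeout of 0 seconds  */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}
```

In the scenario from the question, this would be called in step 4 immediately before close() on the dead socket. Keep in mind the earlier answer's caveat that the server should still cope with stale connections at the application layer.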

Problem between IO heavy operations and network application listening for UDP and SCTP data

We have an application that uses two types of socket, a listening UDP socket and an active SCTP socket.
At certain times we have scripts running on the same machine that generate heavy IO activity (such as "dd, tar, ..."). Most of the time when these IO-heavy applications run, we seem to have the following problems:
The UDP socket closes
The SCTP socket is still alive and we can see it in /proc/net/sctp/assocs however no traffic is received anymore from this socket (until we restart the application)
Why are these I/O operations affecting the network based application in such a way?
Are there any kernel configuration settings that avoid these problems?
I would have expected some packets to be lost on the UDP and some retries on the SCTP socket but not this behavior.
The application is running on a server with 64-bits 4 quad core CPU and RHEL OS
# uname -a
Linux server1 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
When you say the UDP socket closes, what exactly do you mean? Do you try to send and it fails?
For SCTP, can you collect wireshark or pcap traces at the time these I/O operations run (preferably run wireshark on the peer)? My guess (an educated guess without looking at the code) is that when these I/O operations come into the picture, your process gets starved of CPU time. The other end sends SCTP Heartbeat messages to which it gets no replies. Or, if data was flowing, the peer end is not receiving any SACKs because they have not yet been processed by the SCTP stack at your end.
The peer, therefore, aborts the association internally and stops sending you data. Since it sees all the paths as down, it does not send an ABORT, and in that case your SCTP stack will still think the association is alive.
Try to confirm the values for Heartbeat timeout, RTO timeout, SACK timeout, maximum Path retransmission and max Association retransmission at the peer end. I haven't worked with Kernel SCTP, but sysctl should be able to give you those values.
Either way, collecting pcap traces when you observe this problem would give us much better insight into what is going wrong. I hope it helps.
Here are some things I'd look into:
What is the load on the UDP socket when the scripts are not running? Is it continuous or bursty? Does the socket ever spontaneously close when the scripts are not running? What is happening to the data being read off the socket? How much of the data coming off the socket (raw or processed) is being written to disk? Can you monitor CPU, network, and disk IO utilization to see if any of them are saturating? Can the scripts running the IO operations be run at a lower priority or, conversely, can the process running the UDP socket be run at a higher priority?
One thing a lot of people don't check is the return value of their sends, and they don't check for error conditions like EINTR on their recvs. Maybe the heavy IO load is causing some of your sends or recvs to get interrupted, and your app is treating the errors as hard errors and closing the socket without you realizing that the errors are transient.
I've seen this kind of thing happen and you should definitely check for it by cranking up your log level and seeing if your app is calling close unexpectedly.
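As a sketch of the kind of check meant here, in C: the helper names is_transient_error and recv_retry are made up for this example, and which errors count as transient for your application is a judgment call.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Errors that mean "try again later", not "the socket is dead". */
int is_transient_error(int err)
{
    return err == EINTR || err == EAGAIN || err == EWOULDBLOCK;
}

/* recv() that restarts when a signal interrupts the call instead of
 * reporting the interruption to the caller as a failure. */
ssize_t recv_retry(int fd, void *buf, size_t len, int flags)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, flags);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

Logging errno whenever a send or recv returns -1, and only calling close() when is_transient_error(errno) is false, should make it visible whether the application itself is closing the UDP socket under load.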

Using SO_REUSEADDR - What happens to previously open socket?

In network programming in unix, I have always set the SO_REUSEADDR option on the socket being used by server to listen to connections on. This basically says that another socket can be opened on the same port on the machine. This is useful when recovering from a crash and the socket was not properly closed - the app can be restarted and it will simply open another socket on the same port and continue listening.
My question is, what happens to the old socket? Without a doubt, all data/connections will still be received on the old socket. Does it get closed automatically by the OS?
A socket is considered closed when the program that was using it dies. That much is handled by the OS, and the OS will refuse to accept any further communication from the dead conversation. However, if the socket was closed unexpectedly, the computer on the other end might not know that the conversation is over, and may still be attempting to communicate.
That is why there is, designed into the TCP spec, a waiting period before that same port number can be reused. In theory, however unlikely, it may be possible for a packet from the old conversation to arrive with the appropriate IP address, port numbers, and sequence numbers such that the receiving server mistakenly inserts it into the wrong TCP stream.
The SO_REUSEADDR option overrides that behavior, allowing you to reuse the port immediately. Effectively, you're saying: "I understand the risks and would like to use the port anyway."
Yes, the OS automatically closes the previous socket when the old process ends. The reason you can't normally listen on the same port right away is that the socket, though closed, remains in the TIME_WAIT state (which lasts 2MSL, generally a few minutes). The OS automatically transitions the old socket out of this state when the timeout expires.
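For reference, a minimal sketch in C of setting the option on a listening socket. The function name make_listener, the backlog of 16 and binding to INADDR_ANY are arbitrary choices for this example; the important detail is that SO_REUSEADDR is set before bind().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listening socket on the given port (0 = let the OS choose)
 * with SO_REUSEADDR set. Returns the descriptor, or -1 on error. */
int make_listener(unsigned short port)
{
    struct sockaddr_in sa;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* Must be set before bind() to have any effect on the bind. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0)
        goto fail;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    sa.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(fd, 16) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```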
