Writing on a TCP socket closed by the peer

Writing on a TCP socket closed by the peer - c

I have a client-server application where each side communicate with the other via TCP socket.
I properly establish the connection and then I crash the server BEFORE any data is written on the socket by the client.
What I see is that the first write() attempt (client-side) is successful and it returns the actual number of written bytes, while the following ones return (as I expected) -1 (receiving a SIGPIPE) and errno=EPIPE.
Why the first write() is successful even if the socket is already closed?
EDIT
Sometimes also the following write() have a positive return values, as if everything goes well.

You're confused by what the return value of write() means. It doesn't mean, "the peer got the data and acknowledged it". Instead, it means, "I buffered so-many bytes to send to the peer and they're my responsibility now, so you can forget about them (and I don't have any pending errors)".
That is, if the TCP stack accepts the write and returns n bytes, that doesn't mean they've been written yet, just queued for writing. It'll take some time, perhaps 30s after it starts sending network traffic, before the stack gives up and returns an error to you. During that time, you could have done several calls to write() which were successful at queueing data for sending. (The write error will be returned in c.30s if the peer has vanished, or immediately if the peer can be contacted and sends a RST packet straight away to indicate the connection is dead.)

This has to do with how TCP/IP works, that can be roughly described as two mostly independent half-connections. When you close the socket at the server, the client is told that it will not receive further data from the C<-S half-connection, waking up read() immediatly, but not about the C->S direction. It only gets a reply resetting the connection after it tries to send some data. I recommend the TCP/IP Guide for further details.
The reason why sometimes you can write() twice is that you write faster than the round-trip time and can squeeze a second write() before the reply to the first one.

I'm using the following method to detect a disconnected server condition:
After getting the select() timeout on a socket (nothing was received, though was supposed to),
the 'system("ping -c 1 -w 1 server");' command is activated.
If the server is up and just lagging, the ping command will return in less than 0.1 seconds.
Otherwise (the server is down), the ping command will return in 1 second.

Related

TCP Sockets in C with bad network

I am doing some test with TCP client application in a Raspberry Pi (server in the PC), with PPP (Point to Point Protocol) using a LTE Modem. I have used C program with sockets, checking system call's response. I wanted to test how socket works in a bad coverage area so I did some test removing the antenna.
I have followed the next steps:
Connect to server --> OK
Start sending data (write system call) --> OK (I also check in the server)
I removed the LTE modem's antenna (There is no network, it can't do ping)
Continue sending data (write system call) --> OK (server does not receive anything!!!)
It finished sending data and closed socket --> OK (connection is still opened and there is no data since the antenna was removed)
Program was finished
I put the antenna again
Some time later, the data has been uploaded and the connection closed. But I did another test following this steps but with more data, and it did not upload this data...
I do not know if there any way to ensure that the data written to TCP server is received by the server (I thought that TCP layer ensured this..). I could do it manually using an ACK but I guess that it has to be a better way to do.
Sending part code:
while(i<100)
{
sprintf(buf, "Message %d\n", i);
Return = write(Sock_Fd, buf, strlen(buf));
if(Return!=strlen(buf))
{
printf("Error sending data to TCP server. \n");
printf("Error str: %s \n", strerror(errno));
}
else
{
printf("write successful %d\n", i);
i++;
}
sleep(2);
}
Many thanks for your help.

The write()-syscall returns true, since the kernel buffers the data and puts it in the out-queue of the socket. It is removed from this queue when the data was sent and acked from the peer. When the OutQueue is full, the write-syscall will block.
To determine, if data has not been acked by the peer, you have to look at the size of the outqueue. With linux, you can use an ioctl() for this:
ioctl(fd, SIOCOUTQ, &outqlen);
However, it would be more clean and portable to use an inband method for determining if the data has been received.

TCP/IP is rather primitive technology. Internet may sound newish, but this is really antique stuff. TCP is needed because IP gives almost no guarantees, but TCP doesn't actually add that many guarantees. Its chief function is to turn a packet protocol into a stream protocol. That means TCP guarantees a byte order; no bytes will arrive out of order. Don't count on more than that.
You see that protocols on top of TCP add extra checks. E.g. HTTP has the famous HTTP error codes, precisely because it can't rely on the error state from TCP. You probably have to do the same - or you can consider implementing your service as a HTTP service. "RESTful" refers to an API design methodology which closely follows the HTTP philosophy; this might be relevant to you.

The short answer to your 4th and 5th topics was taken as a shortcut from this answer (read the whole answer to get more info)
A socket has a send buffer and if a call to the send() function succeeds, it does not mean that the requested data has actually really been sent out, it only means the data has been added to the send buffer. For UDP sockets, the data is usually sent pretty soon, if not immediately, but for TCP sockets, there can be a relatively long delay between adding data to the send buffer and having the TCP implementation really send that data. As a result, when you close a TCP socket, there may still be pending data in the send buffer, which has not been sent yet but your code considers it as sent, since the send() call succeeded. If the TCP implementation was closing the socket immediately on your request, all of this data would be lost and your code wouldn't even know about that. TCP is said to be a reliable protocol and losing data just like that is not very reliable. That's why a socket that still has data to send will go into a state called TIME_WAIT when you close it. In that state it will wait until all pending data has been successfully sent or until a timeout is hit, in which case the socket is closed forcefully.
The amount of time the kernel will wait before it closes the socket,
regardless if it still has pending send data or not, is called the
Linger Time.
BTW: that answer also refers to the docs where you can see more detailed info

Determine if peer has closed reading end of socket

I have a socket programming situation where the client shuts down the writing end of the socket to let the server know input is finished (via receiving EOF), but keeps the reading end open to read back a result (one line of text). It would be useful for the server to know that the client has successfully read the result and closed the socket (or at least shut down the reading end). Is there a good way to check/wait for such status?

No. All you can know is whether your sends succeeded, and some of them will succeed even after the peer read shutdown, because of TCP buffering.
This is poor design. If the server needs to know that the client received the data, the client needs to acknowledge it, which means it can't shutdown its write end. The client should:
send an in-band termination message, as data.
read and acknowledge all further responses until end of stream occurs.
close the socket.
The server should detect the in-band termination message and:
stop reading requests from the socket
send all outstanding responses and read the acknowledgements
close the socket.
OR, if the objective is only to ensure that client and server end at the same time, each end should shutdown its socket for output and then read input until end of stream occurs, then close the socket. That way the final closes will occur more or less simultaneously on both ends.

getsockopt with TCP_INFO seems the most obvious choice, but it's not cross-platform.
Here's an example for Linux:
import socket
import time
import struct
import pprint
def tcp_info(s):
rv = dict(zip("""
state ca_state retransmits probes backoff options snd_rcv_wscale
rto ato snd_mss rcv_mss unacked sacked lost retrans fackets
last_data_sent last_ack_sent last_data_recv last_ack_recv
pmtu rcv_ssthresh rtt rttvar snd_ssthresh snd_cwnd advmss reordering
rcv_rtt rcv_space
total_retrans
pacing_rate max_pacing_rate bytes_acked bytes_received segs_out segs_in
notsent_bytes min_rtt data_segs_in data_segs_out""".split(),
struct.unpack("BBBBBBBIIIIIIIIIIIIIIIIIIIIIIIILLLLIIIIII",
s.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 160))))
wscale = rv.pop("snd_rcv_wscale")
# bit field layout is up to compiler
# FIXME test the order of nibbles
rv["snd_wscale"] = wscale >> 4
rv["rcv_wscale"] = wscale & 0xf
return rv
for i in range(100):
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("localhost", 7878))
s.recv(10)
pprint.pprint(tcp_info(s))
I doubt a true cross-platform alternative exists.
Fundamentally there are quite a few states:
you wrote data to socket, but it was not sent yet
data was sent, but not received
data was sent and losts (relies on timer)
data was received, but not acknowledged yet
acknowledgement not received yet
acknowledgement lost (relies on timer)
data was received by remote host but not read out by application
data was read out by application, but socket still alive
data was read out, and app crashed
data was read out, and app closed the socket
data was read out, and app called shutdown(WR) (almost same as closed)
FIN was not sent by remote yet
FIN was sent by remote but not received yet
FIN was sent and got lost
FIN received by your end
Obviously your OS can distinguish quite a few of these states, but not all of them. I can't think of an API that would be this verbose...
Some systems allow you to query remaining send buffer space. Perhaps if you did, and socket was already shut down, you'd get a neat error?
Good news is just because socket is shut down, doesn't mean you can't interrogate it. I can get all of TCP_INFO after shutdown, with state=7 (closed). In some cases report state=8 (close wait).
http://lxr.free-electrons.com/source/net/ipv4/tcp.c#L1961 has all the gory details of Linux TCP state machine.

TL;DR:
Don't rely on the socket state for this; it can cut you in many error cases. You need to bake the acknowledgement/receipt facility into your communications protocol. First character on each line used for status/ack works really well for text-based protocols.
On many, but not all, Unix-like/POSIXy systems, one can use the TIOCOUTQ (also SIOCOUTQ) ioctl to determine how much data is left in the outgoing buffer.
For TCP sockets, even if the other end has shut down its write side (and therefore will send no more data to this end), all transmissions are acknowledged. The data in the outgoing buffer is only removed when the acknowledgement from the recipient kernel is received. Thus, when there is no more data in the outgoing buffer, we know that the kernel at the other end has received the data.
Unfortunately, this does not mean that the application has received and processed the data. This same limitation applies to all methods that rely on socket state; this is also the reason why fundamentally, the acknowledgement of receipt/acceptance of the final status line must come from the other application, and cannot be automatically detected.
This, in turn, means that neither end can shut down their sending sides before the very final receipt/acknowledge message. You cannot rely on TCP -- or any other protocols' -- automatic socket state management. You must bake in the critical receipts/acknowledgements into the stream protocol itself.
In OP's case, the stream protocol seems to be simple line-based text. This is quite useful and easy to parse. One robust way to "extend" such a protocol is to reserve the first character of each line for the status code (or alternatively, reserve certain one-character lines as acknowledgements).
For large in-flight binary protocols (i.e., protocols where the sender and receiver are not really in sync), it is useful to label each data frame with an increasing (cyclic) integer, and have the other end respond, occasionally, with an update to let the sender know which frames have been completely processed, and which ones received, and whether additional frames should arrive soon/not-very-soon. This is very useful for network-based appliances that consume a lot of data, with the data provider wishing to be kept updated on the progress and desired data rate (think 3D printers, CNC machines, and so on, where the contents of the data changes the maximum acceptable data rate dynamically).

Okay so I recall pulling my hair out trying to solve this very problem back in the late 90's. I finally found an obscure doc that stated that a read call to a disconnected socket will return a 0. I use this fact to this day.

You're probably better off using ZeroMQ. That will send a whole message, or no message at all. If you set it's send buffer length to 1 (the shortest it will go) you can test to see if the send buffer is full. If not, the message was successfully transferred, probably. ZeroMQ is also really nice if you have an unreliable or intermittent network connection as part of your system.
That's still not entirely satisfactory. You're probably even better off implementing your own send acknowledge mechanism on top of ZeroMQ. That way you have absolute proof that a message was received. You don't have proof that a message was not received (something can go wrong between emitting and receiving the ack, and you cannot solve the Two Generals Problem). But that's the best that can be achieved. What you'll have done then is implement a Communicating Sequential Processes architecture on top of ZeroMQ's Actor Model which is itself implemented on top of TCP streams.. Ultimately it's a bit slower, but your application has more certainty of knowing what's gone on.

How can I detect TCP dead-connection in linux on C?

I wrote a program on C , where a client sent one time some information to a server. I used TCP sockets.And some time the server calculated and should to sent result to the client. How can I detect if connection on The server or the client was broken?

You may want to try TCP keepalives.
# cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
# cat /proc/sys/net/ipv4/tcp_keepalive_probes
9`
In the above example, TCP keep-alive timer kicks in after the idle time of 7200 seconds. If the keep-alive messages are unsuccessful then they are retried at the interval of 75 seconds. After 9 successive retry failure, the connection will be brought down.
The keepalive time can be modified at boot time by placing startup script at /etc/init.d.

There is a way to detect, on Linux, dead sockets without reading or writing to them:
Get the numeric (uint) file descriptor from the socket handler.
readlink the file /proc/[pid]/fd/[#hander]. If it's a socket it will return a string like socket:[#inode].
Read /proc/net/tcp, search for the line with that inode (11th column).
Read the status (st) column on that line (4th column). If it's 0x07 (Close) or 0x08 (TIME_WAIT), the socket is dead.

The only way I know for a program to know for certain that a TCP connection is dead is to attempt to send something on it. The attempt will either time-out or return with an error condition. Thus, the program doesn't need to do anything special -- just send the stuff it was designed to send. It does need to handle, however, all possible error-conditions. On time-out, it could retry for a limited time or decide that the connection is dead. The latter case is appropriate if sending the same data multiple times would be harmful. After this or an error condition, the program should close the current connection and, if appropriate, re-establish it.

TCP Keep-Alive is a reliable way to determine if the peer is dead.That is if the peer application exited without doing a proper closure of the open TCP connection.
http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html
Lookout for how to enable tcp keep-alives(SO_KEEPALIVE) per socket using setsockopt call.
Another method is that the client and the server application agree on heartbeats at regular intervals. Non arrival of heartbeats should indicate that the peer is dead.

Using C sockets: Address already in use

So the basic premise of my program is that I'm supposed to create a tcp session, direct traffic through it, and detect any connection losses. If the connection does break, I need to close the sockets and reopen them (using the same ports) in such a way that it will seem like the connection (almost) never died. It should also be noted that the two programs will be treated as proxies (data gets sent to them, if the connection breaks it gets stored until connection is fixed, then data is sent off).
I've done some research and gone ahead and used setsockopt() with the SO_REUSEADDR option to set the socket options so that I can reuse the address.
Here's the basic algorithm I do to detect a connection break using signals:
After initial setup of sockets, begin sending data
After x seconds, set a flag to false, which will prevent all other data from being sent
Send a single piece of data to let the other program know the connection is still open, reset timer to x seconds
If I receive same piece of data from the program, set the flag to true to continue sending
If I don't receive the data after x seconds, close the socket and attempt to reconnect
(step 5 is where I'm getting the error).
Essentially one program is a client(on one VM) and one program is a server(on another VM), each sending and receiving data to/from each other and to/from another program on each VM.
My question is: Given that I'm still getting this error after setting the socket options, why am I not allowed to re-bind the address when a connection has been detected?
The server is the one complaining when a disconnect is detected (I close the socket, open a new one, set the option, and attempt to bind the port with the same information).
One other thing of note is the way I'm receiving the data from the sockets. If I have a socket open, I'm basically reading it by doing the following:
while((x = recv(socket, buff, 1, 0)>=0){
//add to buffer
// send out to other program if connection is alive
}
Since I'm using the timer to close/reopen the socket, and this is in a different thread, will this prevent the socket from closing?

SO_REUSEADDR only allows limited reuse of ports. Specifically, it does not allow reuse of a port that some other socket is currently actively listening for incoming connections on.

There seems to be an epidemic here of people calling bind() and then setsockopt() and wondering why the setsockopt() doesn't fix an error that had already happened on bind().
You have to call setsockopt() first.
But I don't understand your problem. Why do you think you need to use the same ports? Why are you setting a flag preventing you from sending data? You don't need any of this. Just handle the errors on send() when and if they arise, creating a new connection when necessary. Don't try to out-think TCP. Many have tried, few if any have succeeded.

how to simulate abnormal case for socket/tcp programming in linux, such as terminating one side of connection?

i am learning to use SO_SNDTIMEO and SO_RCVTIMEO to check the timeout.
It is easy to use with read socket. But when i want to check write timeout, it always return successful. Here is what i did:(all in blocking mode)
close the client read socket and exit before server start write
terminate the client before server start write
unplug the cable of server after accept but before write
well, it seems all these case write just return sucessfully.
I think the reason should be that port is resource managed by os, and at the client side, after program gone, the tcp connection still shows FIN_WAIT2 state.
so, is there any convenient way to simulate some cases that write can receive errors such as EPIPE, EAGAIN?

How to get the error EAGAIN?
To get the error EAGAIN, you need to be using Non-Blocking Sockets. With Non-Blocking sockets, you need to write huge amounts of data (and stop receiving data on the peer side), so that your internal TCP buffer gets filled and returns this error.
How to get the error EPIPE?
To get the error EPIPE, you need to send large amount of data after closing the socket on the peer side. You can get more info about EPIPE error from this SO Link. I had asked a question about Broken Pipe Error in the link provided and the accepted answer gives a detailed explanation. It is important to note that to get EPIPE error you should have set the flags parameter of send to MSG_NOSIGNAL. Without that, an abnormal send can generate SIGPIPE signal.
Additional Note
Please note that it is difficult to simulate a write failure, as TCP generally stores the data that you are trying to write into it's internal buffer. So, if the internal buffer has sufficient space, then you won't get an error immediately. The best way is to try to write huge amounts of data. You can also try setting a smaller buffer size for send by using setsockopt function with SO_SNDBUF option

You can simulate errors using fault injection. For example, libfiu is a fault injection library that comes with an example project that allows you to simulate errors from POSIX functions. Basically it uses LD_PRELOAD to inject a wrapper around the regular system calls (including write), and then the wrapper can be configured to either pass through to the real system call, or return whatever error you like.

You could set the receive buffer size to be really small on one side, and send a large buffer on the other. Or on the one side set the send buffer small and try to send a large message.
Otherwise the most common test (I think) is to let the server and client talk for a while, and then remove a network cable.