I know that broken pipe error is thrown when the socket on the peer side is closed.
But, in my test I have noted that an immediate 'send' call on this side when the peer side is closed doesn't always lead to a broken pipe error.
E.g.:
After closing the socket on peer side (I have tried clean closing by calling close and also abnormal closing by killing the peer), if I try to send 40 bytes, then I don't get a broken pipe, but, if I try to send 40000 bytes then it immediately gives broken pipe error.
What exactly causes a broken pipe, and can its behavior be predicted?
It can take time for the network close to be observed - the total time is nominally about 2 minutes (yes, minutes!) after a close before the packets destined for the port are all assumed to be dead. The error condition is detected at some point. With a small write, you are inside the MTU of the system, so the message is queued for sending. With a big write, you are bigger than the MTU and the system spots the problem quicker. If you ignore the SIGPIPE signal, then the functions will return EPIPE error on a broken pipe - at some point when the broken-ness of the connection is detected.
The current state of a socket is determined by 'keep-alive' activity. In your case, it is possible that when you issue the send call, the keep-alive activity still reports the socket as active, so the send call writes the required data (40 bytes) into the buffer and returns without any error.
When you are sending a bigger chunk, the send call goes into a blocking state.
The send man page also confirms this:
When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in non-blocking I/O mode. In non-blocking mode it would return EAGAIN in this case
So, if the caller is notified (by the keep-alive mechanism) that the other end is no longer present while it is blocked waiting for free buffer space, the send call will fail.
Predicting the exact scenario is difficult with the information given, but I believe this is the reason for your problem.
Maybe the 40 bytes fits into the pipe buffer, and the 40000 bytes doesn't?
Edit:
The sending process is sent a SIGPIPE signal when you try to write to a closed pipe. I don't know exactly when the signal is sent, or what effect the pipe buffer has on this. You may be able to recover by trapping the signal with the sigaction call.
When the peer closes, you do not know whether it has just stopped sending or has stopped both sending and receiving, because TCP allows either. By the way, you should know the difference between close and shutdown.
If the peer has stopped both sending and receiving, the first time you send some bytes it will succeed, but the peer's kernel will answer with an RST. When you then send again, your kernel sends your process a SIGPIPE signal. If you catch or ignore this signal, send returns with a Broken pipe error (EPIPE); if you don't, the default action terminates your program.
Session timeout settings may be the reason for a broken pipe.
For example: the server has a session timeout of 3 hours and the load balancer has 1 hour.
The load balancer drops the connection after 1 hour, but the server keeps sending its response. In this case, one end of the pipe is broken.
But it can also be caused by user behavior, e.g. the user closes the page during a download.
According to Unix Network Programming, when a process writes twice to a socket whose peer has closed (after receiving a FIN packet), the first write succeeds but draws an RST packet from the other host. Since the host receives an RST, the socket is destroyed. Thus, on the second write, the SIGPIPE signal is raised and an EPIPE error is returned.
However, according to the send man pages, ECONNRESET can be returned, which means that an RST packet was received. When it returns ECONNRESET, no signal is raised.
In which cases can ECONNRESET be returned, and why is there no SIGPIPE signal in this case?
Note: I have checked a similar question here. However, when I ran the test on my Linux computer, send returned the EPIPE error, not ECONNRESET.
If the peer closed the connection while there was still unread data in its socket buffer, it will send an RST packet back. This will cause a flag to be set on the socket and the next send will return ECONNRESET as the result. EPIPE is instead returned (or SIGPIPE triggered) on send if the connection was closed by the peer with no outstanding data. In both cases the local socket is still open (i.e. the file descriptor is valid), but the underlying connection is closed.
Example: Imagine a server which reads a single byte and then closes the connection:
EPIPE: The client first sends one byte. After the server has read the byte and closed the connection, the client sends some more data and then some data again. The last send call will trigger EPIPE/SIGPIPE.
ECONNRESET: The client first sends more than one byte. The server reads a single byte and closes the connection with more bytes in the socket's receive buffer. This triggers an RST packet from the server, and on the next send the client will receive ECONNRESET.
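The two scenarios can be tried over loopback with a sketch like the following. It is only an illustration under assumptions: the function name errno_after_server_close, the 100 ms pauses and the single-process "server" are all made up here, and which errno you actually get depends on your system's TCP stack.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Sketch: a loopback "server" reads exactly one byte and then closes.
 * The client first sends first_len bytes and afterwards keeps sending
 * single bytes. Returns the errno of the first failing send(), 0 if no
 * send failed, or -1 if the connection could not be set up. */
int errno_after_server_close(size_t first_len)
{
    char buf[16] = {0}, one;
    struct timespec pause = { 0, 100 * 1000 * 1000 };   /* 100 ms */
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int err = 0, sfd = -1;

    assert(first_len >= 1 && first_len <= sizeof buf);
    signal(SIGPIPE, SIG_IGN);           /* get EPIPE instead of dying */

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: any free port */

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(lfd, 1) != 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) != 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        (sfd = accept(lfd, NULL, NULL)) < 0) {
        close(lfd);
        close(cfd);
        return -1;
    }

    if (send(cfd, buf, first_len, 0) != (ssize_t)first_len ||
        recv(sfd, &one, 1, 0) != 1)     /* server reads a single byte ... */
        err = -1;
    close(sfd);                         /* ... and closes the connection */

    for (int i = 0; i < 3 && err == 0; i++) {
        nanosleep(&pause, NULL);        /* let the FIN/RST reach the client */
        if (send(cfd, buf, 1, 0) < 0)
            err = errno;
    }
    close(cfd);
    close(lfd);
    return err;
}
```

With first_len of 1 the server has nothing unread when it closes, so on Linux the failing send should report EPIPE. With first_len of 2 one byte is left unread and the failing send should report ECONNRESET. Other systems may behave differently.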
A TCP connection can be seen as two data pipelines between two endpoints. One data pipeline for sending data from A to B and one data pipeline for sending data from B to A. These two pipelines belong to a single connection but they don't otherwise influence each other. Sending data on one pipeline has no effect on data being sent on the other pipeline. If data on one pipeline is reply data to data sent previously on the other pipeline, this is something only your application will know, TCP knows nothing about that. The task of TCP is to make sure that data reliably makes it from one end of the pipeline to the other end and that as fast as possible, that is all that TCP cares for.
As soon as one side is done sending data, it tells the other side it is done by transmitting a packet with the FIN flag set. Sending a FIN flag means "I have sent all the data I wanted to send to you, so my send pipeline is now closed". You can trigger that intentionally in your code by calling shutdown(socketfd, SHUT_WR). If the other side then calls recv() on the socket, it won't get an error, but the call will report that it read zero bytes, which means "end of stream". End of stream is not an error; it only means that no more data will ever arrive there, no matter how often you call recv() on that socket.
Of course, this doesn't affect the other pipeline, so when A -> B is closed, B -> A can still be used. You can still receive from that socket, even though you closed your sending pipeline. At some point, though, also B will be done with sending data and also transmit a FIN. Once both pipelines are closed, the connection as a whole is closed and this would be a graceful shutdown, as both sides have been able to send all the data they wanted to send and no data should have been lost, since as long as there was unconfirmed data in flight, the other side would not have said it is done but wait for that data to be reliably transferred first.
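The half-close behavior can be observed with a small sketch. An AF_UNIX socket pair is used here as a stand-in for a TCP connection (an assumption made to keep it short), and half_close_demo is a made-up name.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch: side A half-closes with shutdown(SHUT_WR); side B then sees end
 * of stream (recv() == 0) and can still send `reply` back to A.
 * Returns the number of reply bytes A received, or -1 on any failure. */
long half_close_demo(const char *reply)
{
    int sv[2];
    char buf[64];
    long got = -1;

    assert(reply != NULL && strlen(reply) > 0 && strlen(reply) <= sizeof buf);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    if (shutdown(sv[0], SHUT_WR) == 0 &&            /* A: done sending */
        recv(sv[1], buf, sizeof buf, 0) == 0 &&     /* B: end of stream, no error */
        send(sv[1], reply, strlen(reply), 0) == (ssize_t)strlen(reply))
        got = (long)recv(sv[0], buf, sizeof buf, 0); /* B -> A still works */

    close(sv[0]);
    close(sv[1]);
    return got;
}
```

For example, half_close_demo("pong") exercises both observations: B's recv() returns zero without an error, and A still receives the four reply bytes afterwards.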
Alternatively there is the RST flag, which closes the entire connection at once, regardless of whether the other side was done sending and regardless of whether there was unconfirmed data in flight, so an RST has a high potential of causing data to be lost. As that is an exceptional situation that may require special handling, it would be useful for programmers to know if that was the case; that's why there exist two errors:
EPIPE - You cannot send over that pipe as that pipe is not valid anymore. However, all data that you were sending before it broke was still reliably delivered, you just cannot send any new data.
ECONNRESET - Your pipe is broken and it may be the case that data you were trying to send before got lost in the middle of transfer. If that is a problem, you better handle it somehow.
But these two errors do not map one to one to the FIN and RST flags. If you receive an RST in a situation where the system sees no risk of data loss, there is no reason to drive you round the bend for nothing. So if all data you sent before was ACKed as correctly received and then the connection was closed by an RST when you tried to send new data, no data was lost. This includes the current data you tried to send: this data wasn't lost, it was never sent on its way. That's a difference, as you still have it around, whereas data you were sending before may not be around anymore. If your car breaks down in the middle of a road trip, that is quite a different situation than if you are still at home because your car engine refused to even start. So in the end it's your system that decides if an RST triggers an ECONNRESET or an EPIPE.
Okay, but why would the other side send you an RST in the first place? Why not always close with FIN? Well, there are a couple of reasons, but the two most prominent ones are:
A side can only signal the other one that it is done sending but the only way to signal that it is done with the entire connection is to send a RST. So if one side wants to close a connection and it wants to close it gracefully, it will first send a FIN to signal that it won't send new data anymore and then give the other side some time to stop sending data, allowing in-flight data to pass through and to finally send a FIN as well. However, what if the other side doesn't want to stop and keeps sending and sending? This behavior is legal as a FIN doesn't mean that the connection needs to close, it only means one side is done. The result is that the FIN is followed by RST to finally close that connection. This may have caused in-flight data to be lost or it may not, only the recipient of the RST will know for sure as if data was lost, it must have been on his side since the sender of the RST was surely not sending any more data after the FIN. For a recv() call, this RST has no effect as there was a FIN before signaling "end of stream", so recv() will report having read zero bytes.
One side shall close the connection, yet it still has unsent data. Ideally it would wait till all unsent data has been sent and then transmit a FIN; however, the time it is allowed to wait is limited, and after that time has passed, there is still unsent data left. In that case it cannot send a FIN, as that FIN would be a lie. It would tell the other side "Hey, I sent all the data I wanted to send" but that's not true. There was data that should have been sent, but as the close was required to be instant, this data had to be discarded and as a result, this side will directly send an RST. Whether this RST triggers an ECONNRESET for the send() call depends again on whether the recipient of the RST had unsent data in flight or not. However, it will for sure trigger an ECONNRESET error on the next recv() call to tell the program "The other side actually wanted to send more data to you but it couldn't and thus some of that data was lost", since this may again be a situation that needs handling somehow, as the data you've received was for sure incomplete and this is something you should be made aware of.
If you want to force a socket to be always closed directly with RST and never with FIN/FIN or FIN/RST, you can just set the Linger time to zero.
struct linger l = { .l_onoff = 1, .l_linger = 0 };
setsockopt(socketfd, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
Now the socket must close instantly and without any delay, no matter how small, and the only way to close a TCP socket instantly is to send an RST. Some people think "Why enable it and set the time to zero? Why not just disable it instead?" but disabling has a different meaning.
The linger time is the time a close() call may block to perform pending send actions to close a socket gracefully. If enabled (.l_onoff != 0), a call to close() may block for up to .l_linger seconds. If you set time to zero, it may not block at all and thus terminates instantly (RST). However, if you disable it, then close() will never block either but then the system may still linger on close, yet this lingering happens in the background, so your process won't notice it any longer and thus also cannot know when the socket has really closed, as the socketfd becomes invalid at once, even if the underlying socket in kernel still exists.
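A rough way to watch the zero-linger close from the other end is sketched below, using a loopback connection inside one process. The names abortive_close and peer_errno_after_abortive_close are invented for this illustration, and the errno the peer sees is what I would expect on Linux, not a guarantee for every system.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Close fd abortively: linger enabled with a zero timeout, then close().
 * Returns 0 on success, -1 on failure. */
int abortive_close(int fd)
{
    struct linger l = { .l_onoff = 1, .l_linger = 0 };
    assert(fd >= 0);
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof l) != 0)
        return -1;
    return close(fd);
}

/* Hypothetical demo over loopback: the accepting side closes abortively
 * and the connecting side calls recv(). Returns the errno from recv(),
 * 0 if recv() returned without error, or -1 if setup failed. */
int peer_errno_after_abortive_close(void)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    char c;
    int sfd = -1, err;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: any free port */

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(lfd, 1) != 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) != 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        (sfd = accept(lfd, NULL, NULL)) < 0) {
        close(lfd);
        close(cfd);
        return -1;
    }

    if (abortive_close(sfd) != 0) {
        close(sfd);
        err = -1;
    } else {
        /* A FIN would make recv() return 0; an RST makes it fail. */
        err = (recv(cfd, &c, 1, 0) < 0) ? errno : 0;
    }
    close(cfd);
    close(lfd);
    return err;
}
```

If the close really went out as an RST, the peer's recv() should fail with ECONNRESET. A return value of 0 would mean the peer saw an ordinary end of stream instead.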
I'm a newbie to socket programming. I know it's a bad habit to close a socket using "control-c", but why does the socket on the receiving peer keep receiving '' indefinitely after I use "control-c" to close the sending process? Shouldn't the socket on the sending peer be closed after "control-c" exits the process? Thanks!
I know it's a bad habit to close socket using "control-c"
That closes the entire process, not just a socket.
why socket on the receiving peer keeps receiving '' infinitely after I use "control-c" to close the sending process?
At a guess, which is all that is possible without seeing the code you should have posted in your question, you are ignoring errors and end-of-stream when calling recv().
shouldn't the socket on the sending peer be closed after "control-c" to exit the process?
It is. The whole process is 'closed', including all its resources.
As regards the receiving socket, it is up to you to detect the conditions under which it should be closed, and close it.
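Since the question's code is not shown, the following is only a guess at what a correct receive loop could look like. The names drain_until_eof and drain_demo are hypothetical, and a local socket pair stands in for the real connection.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Receive until the peer closes. Returns the total number of bytes
 * received once recv() reports end of stream (0), or -1 on an error. */
long drain_until_eof(int fd)
{
    char buf[4096];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0)
            total += n;                 /* real data */
        else if (n == 0)
            return total;               /* end of stream: stop, do not loop */
        else if (errno != EINTR)
            return -1;                  /* real error: stop as well */
    }
}

/* Hypothetical demo: the "sender" writes msg and goes away, much like a
 * process killed with Ctrl-C; the receiver must notice and stop. */
long drain_demo(const char *msg)
{
    int sv[2];
    assert(msg != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    long sent = (long)send(sv[1], msg, strlen(msg), 0);
    close(sv[1]);                       /* the kernel does this on process exit too */
    long got = drain_until_eof(sv[0]);
    close(sv[0]);
    return (sent == got) ? got : -1;
}
```

A loop that treats a return value of 0 as "received an empty message" and calls recv() again will spin forever, which would match the symptom described in the question.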
No code given, but here's an educated guess of what might be going on:
You have two separate bits of code running: Sending and receiving
You are in the process of transferring data when you use CTL+C to kill the sending socket.
You expect the receiving socket to stop, but it doesn't.
The issue could be one of "end of transmission" agreement. If the sending code fires off an End-of-File (EOF) (or abort or terminated) when you hit CTL+C, then the receiving socket should see that and quit receiving. However, you haven't stated what the sending code is doing the moment you hit CTL+C.
The receiving socket may just be waiting for more data; as far as the receiving code is concerned, it was to be told when transfer is done, and it is patiently waiting for more information.
There are far better socket programmers around than I am, but I think it's safe to say that, once you get down to that level, you should pay attention to the details of the transfer protocol. If CTL+C just terminates the server (sending) code, then the client has no idea if there is a real termination, an unexpected delay in the transmission, or the server process just had a brain-fart and will start sending again once things clear up.
If you have any means of monitoring the actual values going back and forth, take a look at what happens during a "normal" termination of data transfer and a CTL+C termination. This might help you zero in on the undesirable behavior.
Two cases are well-documented in the man pages for non-blocking sockets:
If send() returns the same length as the transfer buffer, the entire transfer finished successfully, and the socket may or may not be in a state of returning EAGAIN/EWOULDBLOCK the next call with >0 bytes to transfer.
If send() returns -1 and errno is EAGAIN/EWOULDBLOCK, none of the transfer finished, and the program needs to wait until the socket is ready for more data (EPOLLOUT in the epoll case).
What's not documented for nonblocking sockets is:
If send() returns a positive value smaller than the buffer size.
Is it safe to assume that the send() would return EAGAIN/EWOULDBLOCK on even one more byte of data? Or should a non-blocking program try to send() one more time to get a conclusive EAGAIN/EWOULDBLOCK? I'm worried about putting an EPOLLOUT watcher on the socket if it's not actually in a "would block" state to respond to it coming out of.
Obviously, the latter strategy (trying again to get something conclusive) has well-defined behavior, but it's more verbose and puts a hit on performance.
A call to send has three possible outcomes:
There is at least one byte available in the send buffer →send succeeds and returns the number of bytes accepted (possibly fewer than you asked for).
The send buffer is completely full at the time you call send.
→if the socket is blocking, send blocks
→if the socket is non-blocking, send fails with EWOULDBLOCK/EAGAIN
An error occurred (e.g. user pulled network cable, connection reset by peer) →send fails with another error
If the number of bytes accepted by send is smaller than the amount you asked for, then this means that the send buffer was completely full at that moment. However, this is purely circumstantial and non-authoritative with respect to any future calls to send.
The information returned by send is merely a "snapshot" of the current state at the time you called send. By the time send has returned or by the time you call send again, this information may already be outdated. The network card might put a datagram on the wire while your program is inside send, or a nanosecond later, or at any other time -- there is no way of knowing. You'll know when the next call succeeds (or when it doesn't).
In other words, this does not imply that the next call to send will return EWOULDBLOCK/EAGAIN (or would block if the socket wasn't non-blocking). Trying until what you called "getting a conclusive EWOULDBLOCK" is the correct thing to do.
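A minimal sketch of "try until you get a conclusive EWOULDBLOCK" might look like this. The helper names send_until_full and partial_send_demo are made up, and the demo uses a non-blocking AF_UNIX socket pair that nobody reads from, so the buffer is sure to fill.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Offer len bytes to a non-blocking socket. A short count is not treated
 * as conclusive: sending continues until everything was accepted or
 * send() itself reports EAGAIN/EWOULDBLOCK (then *would_block is 1).
 * Returns the number of bytes accepted, or -1 on any other error. */
ssize_t send_until_full(int fd, const char *buf, size_t len, int *would_block)
{
    size_t off = 0;

    assert(buf != NULL && would_block != NULL);
    *would_block = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n > 0) {
            off += (size_t)n;           /* possibly partial: just try again */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            *would_block = 1;           /* now it is worth waiting for EPOLLOUT */
            break;
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else {
            return -1;
        }
    }
    return (ssize_t)off;
}

/* Hypothetical demo: returns 1 if only part of len bytes was accepted and
 * the last send() reported EAGAIN/EWOULDBLOCK, 0 if everything fit,
 * -1 on error. */
int partial_send_demo(size_t len)
{
    int sv[2], would_block = 0, result = -1;
    assert(len > 0);
    char *buf = calloc(len, 1);
    if (buf == NULL)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        free(buf);
        return -1;
    }
    int fl = fcntl(sv[0], F_GETFL);
    if (fl >= 0 && fcntl(sv[0], F_SETFL, fl | O_NONBLOCK) == 0) {
        ssize_t n = send_until_full(sv[0], buf, len, &would_block);
        if (n >= 0)
            result = (would_block && (size_t)n < len) ? 1 : 0;
    }
    close(sv[0]);
    close(sv[1]);
    free(buf);
    return result;
}
```

The extra send() costs one system call per filled buffer, which is usually small compared to arming an EPOLLOUT watcher that might fire immediately.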
If send() returns the same length as the transfer buffer, the entire transfer finished successfully, and the socket may or may not be in a blocking state.
No. The socket remains in the mode it was in: in this case, non-blocking mode, assumed below throughout.
If send() returns -1 and errno is EAGAIN/EWOULDBLOCK, none of the transfer finished, and the program needs to wait until the socket isn't blocking anymore.
Until the send buffer isn't full any more. The socket remains in non-blocking mode.
If send() returns a positive value smaller than the buffer size.
There was only that much room in the socket send buffer.
Is it safe to assume that the send() would block on even one more byte of data?
It isn't 'safe' to 'assume [it] would block' at all. It won't. It's in non-blocking mode. EWOULDBLOCK means it would have blocked in blocking mode.
Or should a non-blocking program try to send() one more time to get a conclusive EAGAIN/EWOULDBLOCK?
That's up to you. The API works whichever you decide.
I'm worried about putting an EPOLLOUT watcher on the socket if it's not actually blocking on that.
It isn't 'blocking on that'. It isn't blocking on anything. It's in non-blocking mode. The send buffer got filled at that instant. It might be completely empty a moment later.
I don't see what you're worried about. If you have pending data and the last write didn't send it all, select for writability, and write when you get it. If such a write sends everything, don't select for writability next time.
Sockets are usually writable, unless their send buffer is full, so don't select for writability all the time, as you just get a spin loop.
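To see why selecting for writability all the time spins, here is a small sketch using poll() with a zero timeout. The names writable_now and writability_demo are hypothetical, and the demo again relies on a local socket pair with no reader.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd is writable right now, 0 if not, -1 on error. */
int writable_now(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLOUT, .revents = 0 };
    assert(fd >= 0);
    int r = poll(&p, 1, 0);             /* timeout 0: just ask, never wait */
    if (r < 0)
        return -1;
    return (r == 1 && (p.revents & POLLOUT)) ? 1 : 0;
}

/* Hypothetical demo: returns 1 if a fresh socket is writable but stops
 * being writable once its send buffer is full, 0 otherwise, -1 if the
 * setup failed. */
int writability_demo(void)
{
    static char chunk[65536];
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int fl = fcntl(sv[0], F_GETFL);
    if (fl < 0 || fcntl(sv[0], F_SETFL, fl | O_NONBLOCK) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    int before = writable_now(sv[0]);   /* idle socket: expected writable */
    while (send(sv[0], chunk, sizeof chunk, 0) > 0)
        ;                               /* nobody reads: fill the send buffer */
    int after = writable_now(sv[0]);    /* full buffer: expected not writable */
    close(sv[0]);
    close(sv[1]);
    return (before == 1 && after == 0) ? 1 : 0;
}
```

An idle socket reports writable immediately, so an event loop that always asks for writability wakes up on every iteration. Asking only while data is pending avoids that.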
In C, I understood that if we close a socket, it means the socket will be destroyed and can be re-used later.
How about shutdown? The description said it closes half of a duplex connection to that socket. But will that socket be destroyed like close system call?
This is explained in Beej's networking guide. shutdown is a flexible way to block communication in one or both directions. When the second parameter is SHUT_RDWR, it will block both sending and receiving (like close). However, close is the way to actually destroy a socket.
With shutdown, you will still be able to receive pending data the peer already sent (thanks to Joey Adams for noting this).
None of the existing answers tell people how shutdown and close work at the TCP protocol level, so it is worth adding this.
A standard TCP connection gets terminated by 4-way finalization:
Once a participant has no more data to send, it sends a FIN packet to the other
The other party returns an ACK for the FIN.
When the other party also finished data transfer, it sends another FIN packet
The initial participant returns an ACK and finalizes transfer.
However, there is another "emergent" way to close a TCP connection:
A participant sends an RST packet and abandons the connection
The other side receives the RST and then abandons the connection as well
In my test with Wireshark, with default socket options, shutdown sends a FIN packet to the other end, but that is all it does. Until the other party sends you its FIN packet, you are still able to receive data. Once that happens, your receive call will return a zero-size result. So if you are the first one to shut down "send", you should close the socket once you have finished receiving data.
On the other hand, if you call close whilst the connection is still active (the other side is still active and you may have unsent data in the system buffer as well), an RST packet will be sent to the other side. This is good for errors. For example, if you think the other party provided wrong data or it refused to provide data (DOS attack?), you can close the socket straight away.
My opinion of rules would be:
Consider shutdown before close when possible
If you finished receiving (0 size data received) before you decided to shutdown, close the connection after the last send (if any) finished.
If you want to close the connection normally, shut down the connection (with SHUT_WR, and if you don't care about receiving data after this point, with SHUT_RD as well), wait until a receive returns zero bytes, and then close the socket.
In any case, if any other error occurred (timeout for example), simply close the socket.
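The "shutdown, wait for zero bytes, then close" rule could be sketched as follows. The names graceful_close and graceful_close_demo are invented for this example, and the sketch has no timeout, so in real code you would probably add one (for example with SO_RCVTIMEO) so that a peer that never finishes cannot block you forever.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Graceful close: announce that we are done sending, read until the peer
 * is done too, then release the descriptor. Returns the number of bytes
 * still received while waiting, or -1 on error. The descriptor is closed
 * in both cases. */
long graceful_close(int fd)
{
    char buf[4096];
    long drained = 0;
    ssize_t n = -1;

    assert(fd >= 0);
    if (shutdown(fd, SHUT_WR) == 0) {
        while ((n = recv(fd, buf, sizeof buf, 0)) != 0) {
            if (n > 0)
                drained += n;           /* late data from the peer */
            else if (errno != EINTR)
                break;                  /* error: give up waiting */
        }
    }
    close(fd);
    return (n == 0) ? drained : -1;
}

/* Hypothetical demo on a local socket pair: the peer sends its last
 * words and half-closes, then we close gracefully. Returns the number of
 * bytes drained if the peer afterwards sees end of stream, else -1. */
long graceful_close_demo(const char *last_words)
{
    int sv[2];
    char c;

    assert(last_words != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (send(sv[1], last_words, strlen(last_words), 0) < 0 ||
        shutdown(sv[1], SHUT_WR) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    long drained = graceful_close(sv[0]);
    int peer_saw_eof = (recv(sv[1], &c, 1, 0) == 0);
    close(sv[1]);
    return peer_saw_eof ? drained : -1;
}
```

On an error other than EINTR the function simply closes, which corresponds to the last rule in the list.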
Ideal implementations for SHUT_RD and SHUT_WR
The following haven't been tested, trust at your own risk. However, I believe this is a reasonable and practical way of doing things.
If the TCP stack receives a shutdown with SHUT_RD only, it shall mark this connection as no more data expected. Any pending and subsequent read requests (regardless of which thread they are in) will then return a zero-sized result. However, the connection is still active and usable -- you can still receive OOB data, for example. Also, the OS will drop any data it receives for this connection. But that is all; no packets will be sent to the other side.
If the TCP stack receives a shutdown with SHUT_WR only, it shall mark this connection as no more data can be sent. All pending write requests will be finished, but subsequent write requests will fail. Furthermore, a FIN packet will be sent to another side to inform them we don't have more data to send.
There are some limitations with close() that can be avoided if one uses shutdown() instead.
close() will terminate both directions on a TCP connection. Sometimes you want to tell the other endpoint that you are finished with sending data, but still want to receive data.
close() decrements the descriptor's reference count (maintained in the file table entry; it counts the number of descriptors currently open that refer to the file/socket) and does not close the socket/file if the count is not 0. This means that if you are forking, the cleanup happens only after the reference count drops to 0. With shutdown() one can initiate the normal TCP close sequence regardless of the reference count.
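The reference-count point can be illustrated without forking. In this sketch dup() stands in for the extra descriptor a forked child would hold (that substitution is an assumption made for brevity), and sees_eof and close_vs_shutdown_demo are made-up names.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a read on fd would report end of stream right now,
 * 0 if it would not, -1 on error. Does not consume any data. */
static int sees_eof(int fd)
{
    char c;
    assert(fd >= 0);
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0)
        return 1;
    if (n > 0 || errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;
    return -1;
}

/* Hypothetical demo: returns 1 if close() on one of two descriptors left
 * the peer without end of stream while shutdown() on the remaining one
 * delivered it, 0 otherwise, -1 if the setup failed. */
int close_vs_shutdown_demo(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int copy = dup(sv[0]);              /* second reference, as after fork() */
    if (copy < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[0]);                       /* one reference left: socket stays open */
    int eof_after_close = sees_eof(sv[1]);
    shutdown(copy, SHUT_WR);            /* acts on the socket, not the descriptor */
    int eof_after_shutdown = sees_eof(sv[1]);
    close(copy);
    close(sv[1]);
    return (eof_after_close == 0 && eof_after_shutdown == 1) ? 1 : 0;
}
```

The peer only learns that sending has stopped once shutdown() is called or the last descriptor referring to the socket is closed.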
Parameters are as follows:
int shutdown(int s, int how); // s is socket descriptor
int how can be:
SHUT_RD or 0
Further receives are disallowed
SHUT_WR or 1
Further sends are disallowed
SHUT_RDWR or 2
Further sends and receives are disallowed
This may be platform specific, I somehow doubt it, but anyway, the best explanation I've seen is here on this msdn page where they explain about shutdown, linger options, socket closure and general connection termination sequences.
In summary, use shutdown to send a shutdown sequence at the TCP level and use close to free up the resources used by the socket data structures in your process. If you haven't issued an explicit shutdown sequence by the time you call close then one is initiated for you.
I've also had success under linux using shutdown() from one pthread to force another pthread currently blocked in connect() to abort early.
Under other OSes (OSX at least), I found calling close() was enough to get connect() fail.
"shutdown() doesn't actually close the file descriptor—it just changes its usability. To free a socket descriptor, you need to use close()."1
Close
When you have finished using a socket, you can simply close its file descriptor with close; If there is still data waiting to be transmitted over the connection, normally close tries to complete this transmission. You can control this behavior using the SO_LINGER socket option to specify a timeout period; see Socket Options.
ShutDown
You can also shut down only reception or transmission on a connection by calling shutdown.
The shutdown function shuts down the connection of socket. Its argument how specifies what action to perform:
0
Stop receiving data for this socket. If further data arrives, reject it.
1
Stop trying to transmit data from this socket. Discard any data waiting to be sent. Stop looking for acknowledgement of data already sent; don’t retransmit it if it is lost.
2
Stop both reception and transmission.
The return value is 0 on success and -1 on failure.
In my test:
close will send a FIN packet and destroy the fd immediately when the socket is not shared with other processes.
shutdown SHUT_RD: the process can still recv data from the socket, but recv will return 0 if the TCP buffer is empty. After the peer sends more data, recv will return data again.
shutdown SHUT_WR will send a FIN packet to indicate that further sends are disallowed. The peer can still recv data, but recv will return 0 once its TCP buffer is empty.
shutdown SHUT_RDWR (equivalent to using both SHUT_RD and SHUT_WR) will send an RST packet if the peer sends more data.
Linux: shutdown() causes a listener thread's select() to wake up and report an error. shutdown(); close(); will lead to an endless wait.
Winsock: vice versa - shutdown() has no effect, while close() is successfully caught.
I am using write() on an open data socket in an FTP implementation to send a file out. But after writing some data it hangs for some time, and after that it returns with a Broken pipe error. Any help with this will be greatly appreciated. My process reads packets from one buffer and writes them into the socket. I noticed this problem with increased bandwidth: if I increase the number of packets to be processed, the problem appears. I am using FreeBSD.
I am using two threads: one reads packets and writes them into a buffer, and the second reads these packets from the buffer and writes them into the socket.
Thanks For your help
Alexander
SIGPIPE is sent to your process by the kernel when an attempt to write data to a broken pipe is detected. This might happen, for example, if the receiving side has closed the socket while you are writing, or if the socket is accidentally closed from another thread, etc. There are a lot of possible reasons for that. Most applications tend to ignore this signal and handle errors based on the "write" return code, because there is nothing reasonable you can do in a SIGPIPE signal handler. Basically, set the SIGPIPE handler to SIG_IGN in order to ignore it, look at the list of possible return codes from the "write" system call, and handle them accordingly.
EPIPE may be set as an error code, and/or SIGPIPE raised (depending on flags), when you attempt to write to a file descriptor that has closed. It is likely that the remote endpoint of your connection has closed, and you've not checked for the close/EOF event (typically returned via the read event when poll/selecting, or a return value of zero from read/recv).
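Checking for the close/EOF event before writing could be sketched like this. The names peer_has_closed and peer_close_demo are hypothetical, and the check is only a snapshot: the peer can still close right after it returns, so write errors must be handled anyway.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the peer has closed and no unread data is left, 0 if the
 * connection still looks open (or data is pending), -1 on error.
 * Nothing is consumed from the socket. */
int peer_has_closed(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    char c;

    assert(fd >= 0);
    if (poll(&p, 1, 0) < 0)
        return -1;
    if (!(p.revents & (POLLIN | POLLHUP | POLLERR)))
        return 0;                       /* no read event: nothing happened */
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0)
        return 1;                       /* read event with zero bytes: EOF */
    if (n > 0)
        return 0;                       /* unread data; EOF may follow later */
    return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
}

/* Hypothetical demo on a local socket pair: returns 1 if the check says
 * "open" before the peer closes and "closed" afterwards, 0 otherwise,
 * -1 if the setup failed. */
int peer_close_demo(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int before = peer_has_closed(sv[0]);
    close(sv[1]);
    int after = peer_has_closed(sv[0]);
    close(sv[0]);
    return (before == 0 && after == 1) ? 1 : 0;
}
```

In an FTP-style sender the same test would typically live in the poll/select loop, so that a zero-byte read event stops further writes to that descriptor.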