I have a very simple client-server with one blocking socket doing full-duplex communication. I've enabled SSL/TLS to the application. The model is that of a typical producer-consumer. The client produces the data, sends it to the server and the server processes them. The only catch is that, once in a while the server sends data back to the client which the client handles accordingly. Below is a very simple pseudo code of the application:
1 Client:
2 -------
3 while (true)
4 {
5 if (poll(pollin, timeout=0) || 0 < SSL_pending(ssl))
6 {
7 SSL_read();
8 // Handle WANT_READ or WANT_WRITE appropriately.
9 // If no error, handle the received control message.
10 }
11 // produce data.
12 while (!poll(pollout))
13 ; // Wait until the pipe is ready for a send().
14 SSL_write();
15 // Handle WANT_READ or WANT_WRITE appropriately.
16 if (time to renegotiate)
17 SSL_renegotiate(ssl);
18 }
19
20 Server:
21 -------
22 while (true)
23 {
24 if (poll(pollin, timeout=1s) || 0 < SSL_pending(ssl))
25 {
26 SSL_read();
27 // Handle WANT_READ or WANT_WRITE appropriately.
28 // If no error, consume data.
29 }
30 if (control message needs to be sent)
31 {
32 while (!poll(pollout))
33 ; // Wait until the pipe is ready for a send().
34 SSL_write();
35 // Handle WANT_READ or WANT_WRITE appropriately.
36 }
37 }
The trouble happens when, for testing purposes, I force SSL renegotiation (lines 16-17). The session starts nice and easy, but after a while, I get the following errors:
Client:
-------
error:140940F5:SSL routines:SSL3_READ_BYTES:unexpected record
Server:
-------
error:140943F2:SSL routines:SSL3_READ_BYTES:sslv3 alert unexpected message
Turns out, around the same time that the client initiates a renegotiation (line 14), the server ends up sending application data to the client (line 34). The client as part of the renegotiation process receives this application data and bombs with a "unexpected record" error. Similarly, when the server does the subsequent receive (line 26), it ends up receiving a renegotiation data when it was expecting application data.
What am I doing wrong? How should I handle/test SSL renegotiations with a full-duplex channel. Note that, there are no threads involved. It's a simple single threaded model with reads/writes happening on either end of the socket.
UPDATE : To verify that there is nothing wrong with the application that I have written, I could even reproduce this quite comfortably with OpenSSL's s_client and s_server implementations. I started a s_server and once the s_client got connected to the server, I programmatically send a bunch of application data from the server to the client and a bunch of 'R' (renegotiation requests) from the client to the server. Eventually, they both fail in exactly the same manner as described above.
s_client:
RENEGOTIATING
4840:error:140940F5:SSL routines:SSL3_READ_BYTES:unexpected record:s3_pkt.c:1258:
s_server:
Read BLOCK
ERROR
4838:error:140943F2:SSL routines:SSL3_READ_BYTES:sslv3 alert unexpected message:s3_pkt.c:1108:SSL alert number 10
4838:error:140940E5:SSL routines:SSL3_READ_BYTES:ssl handshake failure:s3_pkt.c:1185:
UPDATE 2:
Ok. As suggested by David, I reworked the test application to use non-blocking sockets and always do SSL_read and SSL_write first and do the select based on what they return and I still get the same errors during renegotiations (SSL_write ends up getting application data from the other side in the midst of renegotiation). The question is, at any point in time, if SSL_read returns WANT_READ, can I assume it is because there is nothing in the pipe and go ahead with SSL_write since I have something to write? If not, that's probably why I end up with errors. Either that, or I am doing the renegotiation all wrong. Note, if SSL_read returns WANT_WRITE, I always do a select and call SSL_read again.
You're trying to "look through" the SSL black box. This is a huge mistake.
if (poll(pollin, timeout=0) || 0 < SSL_pending(ssl))
{
SSL_read();
You're making the assumption that in order for SSL_read to make forward progress, it needs to read data from the socket. This is an assumption that can be false. For example, if a renegotiation is in progress, the SSL engine may need to send data next, not read data.
while (!poll(pollout))
; // Wait until the pipe is ready for a send().
SSL_write();
How do you know the SSL engine wants to write data to the pipe? Did it give you a WANT_WRITE indication? If not, maybe it needs to read renegotiation data in order to send.
To use SSL in non-blocking mode, just attempt the operation you want to do. If you want to read decrypted data, call SSL_read. If you want to send encrypted data, call SSL_write. Only call poll if the SSL engine tells you to, with a WANT_READ or WANT_WRITE indication.
Update:: You have a "half of each" hybrid between blocking and non-blocking approaches. This cannot possibly work. The problem is simple: Until you call SSL_read, you don't know whether or not it needs to read from the socket. If you call poll first, you will block even if SSL_read does not need to read from the socket. If you call SSL_read first, it will block if it does need to read from the socket. SSL_pending won't help you. If SSL_read needs to write to the socket to make forward progress, SSL_pending will return zero, but calling poll will block forever.
You have two sane choices:
Blocking. Leave the sockets set blocking. Just call SSL_read when you want to read and SSL_write when you want to write. They will block. Blocking sockets can block, that's how they work.
Non-blocking. Set the sockets non-blocking. Just call SSL_read when you want to read and SSL_write when you want to write. They will not block. If you get a WANT_READ indication, poll in the read direction. If you get a WANT_WRITE indication, poll in the write direction. Note that it is perfectly normal for SSL_read to return WANT_WRITE, and then you poll in the write direction. Similarly, SSL_write can return WANT_READ, and then you poll in the read direction.
Your code would (mostly) work if the implementation of SSL_read was basically, "read some data then decrypt it" and SSL_write was "encrypt some data and send it". The problem is, these functions actually run a sophisticated state machine that reads and writes to the socket as needed and ultimately causes the effect of giving you decrypted data or encrypting your data and sending it.
After spending time debugging my application with OpenSSL, I figured out the answer to the question I originally posted. I am sharing it here in case it helps others like me.
The question I had posted originally had to do with a clear error from OpenSSL indicating that it was receiving application data in the middle of a handshake. What I failed to understand was that OpenSSL gets confused when it receives application data in the middle of a handshake. It's fine to receive handshake data when receiving/sending application data but not the other way around (at least with OpenSSL). That's the thing that I failed to realize. This is also the reason most SSL-enabled applications run fine because most of them are half-duplex in nature (HTTPS for instance) which implicitly guarantees no application data asynchronously arriving at the time of handshake.
What this means is that if you are designing a custom client-server full-duplex protocol (which is the case I am in) and want to slap SSL onto it, it's the application's responsibility to initiate a renegotiation when neither end is sending any data. This is clearly documented in Mozilla's NSS API. Not to mention there is an open ticket in OpenSSL's bug repository regarding this issue. The moment I changed my application to initiate a handshake when there is nothing for the client/server to sayto one another, I no longer faced the above errors.
Also, I agree with David's comments about blocking sockets and I've read many of his arguments in the OpenSSL mailing list as well. But, the sad thing is that most legacy applications are built around poll and blocking sockets and they "Just Work Fine (TM)". The issue arises when dealing with SSL renegotiation. I still believe at least my application can deal with SSL renegotiation in the presence of blocking sockets since it is a very confined and custom protocol and we (as the application developer) can decide to do the renegotiation when the protocol is quiescent. If that doesn't work, I will go the non-blocking socket route.
Related
I have a socket programming situation where the client shuts down the writing end of the socket to let the server know input is finished (via receiving EOF), but keeps the reading end open to read back a result (one line of text). It would be useful for the server to know that the client has successfully read the result and closed the socket (or at least shut down the reading end). Is there a good way to check/wait for such status?
No. All you can know is whether your sends succeeded, and some of them will succeed even after the peer read shutdown, because of TCP buffering.
This is poor design. If the server needs to know that the client received the data, the client needs to acknowledge it, which means it can't shutdown its write end. The client should:
send an in-band termination message, as data.
read and acknowledge all further responses until end of stream occurs.
close the socket.
The server should detect the in-band termination message and:
stop reading requests from the socket
send all outstanding responses and read the acknowledgements
close the socket.
OR, if the objective is only to ensure that client and server end at the same time, each end should shutdown its socket for output and then read input until end of stream occurs, then close the socket. That way the final closes will occur more or less simultaneously on both ends.
getsockopt with TCP_INFO seems the most obvious choice, but it's not cross-platform.
Here's an example for Linux:
import socket
import time
import struct
import pprint
def tcp_info(s):
rv = dict(zip("""
state ca_state retransmits probes backoff options snd_rcv_wscale
rto ato snd_mss rcv_mss unacked sacked lost retrans fackets
last_data_sent last_ack_sent last_data_recv last_ack_recv
pmtu rcv_ssthresh rtt rttvar snd_ssthresh snd_cwnd advmss reordering
rcv_rtt rcv_space
total_retrans
pacing_rate max_pacing_rate bytes_acked bytes_received segs_out segs_in
notsent_bytes min_rtt data_segs_in data_segs_out""".split(),
struct.unpack("BBBBBBBIIIIIIIIIIIIIIIIIIIIIIIILLLLIIIIII",
s.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 160))))
wscale = rv.pop("snd_rcv_wscale")
# bit field layout is up to compiler
# FIXME test the order of nibbles
rv["snd_wscale"] = wscale >> 4
rv["rcv_wscale"] = wscale & 0xf
return rv
for i in range(100):
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("localhost", 7878))
s.recv(10)
pprint.pprint(tcp_info(s))
I doubt a true cross-platform alternative exists.
Fundamentally there are quite a few states:
you wrote data to socket, but it was not sent yet
data was sent, but not received
data was sent and losts (relies on timer)
data was received, but not acknowledged yet
acknowledgement not received yet
acknowledgement lost (relies on timer)
data was received by remote host but not read out by application
data was read out by application, but socket still alive
data was read out, and app crashed
data was read out, and app closed the socket
data was read out, and app called shutdown(WR) (almost same as closed)
FIN was not sent by remote yet
FIN was sent by remote but not received yet
FIN was sent and got lost
FIN received by your end
Obviously your OS can distinguish quite a few of these states, but not all of them. I can't think of an API that would be this verbose...
Some systems allow you to query remaining send buffer space. Perhaps if you did, and socket was already shut down, you'd get a neat error?
Good news is just because socket is shut down, doesn't mean you can't interrogate it. I can get all of TCP_INFO after shutdown, with state=7 (closed). In some cases report state=8 (close wait).
http://lxr.free-electrons.com/source/net/ipv4/tcp.c#L1961 has all the gory details of Linux TCP state machine.
TL;DR:
Don't rely on the socket state for this; it can cut you in many error cases. You need to bake the acknowledgement/receipt facility into your communications protocol. First character on each line used for status/ack works really well for text-based protocols.
On many, but not all, Unix-like/POSIXy systems, one can use the TIOCOUTQ (also SIOCOUTQ) ioctl to determine how much data is left in the outgoing buffer.
For TCP sockets, even if the other end has shut down its write side (and therefore will send no more data to this end), all transmissions are acknowledged. The data in the outgoing buffer is only removed when the acknowledgement from the recipient kernel is received. Thus, when there is no more data in the outgoing buffer, we know that the kernel at the other end has received the data.
Unfortunately, this does not mean that the application has received and processed the data. This same limitation applies to all methods that rely on socket state; this is also the reason why fundamentally, the acknowledgement of receipt/acceptance of the final status line must come from the other application, and cannot be automatically detected.
This, in turn, means that neither end can shut down their sending sides before the very final receipt/acknowledge message. You cannot rely on TCP -- or any other protocols' -- automatic socket state management. You must bake in the critical receipts/acknowledgements into the stream protocol itself.
In OP's case, the stream protocol seems to be simple line-based text. This is quite useful and easy to parse. One robust way to "extend" such a protocol is to reserve the first character of each line for the status code (or alternatively, reserve certain one-character lines as acknowledgements).
For large in-flight binary protocols (i.e., protocols where the sender and receiver are not really in sync), it is useful to label each data frame with an increasing (cyclic) integer, and have the other end respond, occasionally, with an update to let the sender know which frames have been completely processed, and which ones received, and whether additional frames should arrive soon/not-very-soon. This is very useful for network-based appliances that consume a lot of data, with the data provider wishing to be kept updated on the progress and desired data rate (think 3D printers, CNC machines, and so on, where the contents of the data changes the maximum acceptable data rate dynamically).
Okay so I recall pulling my hair out trying to solve this very problem back in the late 90's. I finally found an obscure doc that stated that a read call to a disconnected socket will return a 0. I use this fact to this day.
You're probably better off using ZeroMQ. That will send a whole message, or no message at all. If you set it's send buffer length to 1 (the shortest it will go) you can test to see if the send buffer is full. If not, the message was successfully transferred, probably. ZeroMQ is also really nice if you have an unreliable or intermittent network connection as part of your system.
That's still not entirely satisfactory. You're probably even better off implementing your own send acknowledge mechanism on top of ZeroMQ. That way you have absolute proof that a message was received. You don't have proof that a message was not received (something can go wrong between emitting and receiving the ack, and you cannot solve the Two Generals Problem). But that's the best that can be achieved. What you'll have done then is implement a Communicating Sequential Processes architecture on top of ZeroMQ's Actor Model which is itself implemented on top of TCP streams.. Ultimately it's a bit slower, but your application has more certainty of knowing what's gone on.
When reading from a socket using read(2) and blocking I/O, when do I know that the other side (the client) has no more data to send? (by "no more data to send" I mean that, as an example, the client is waiting for a response). At first, I thought that this point is reached when less than count bytes are returned by read (as in read(fd, *buf, count)).
But what if the client sends the data fragmented? Reading until read returns 0 would be a solution, but as far as I know 0 is only returned when the client closes the connection - otherwise, read would just block until the connection is closed. I thought of using non-blocking I/O and a timeout for select(2), but this does not seem to be a tidy solution to me.
Are there any known best practices?
The concept of "the other side has no more data to send", without either a timeout or some semantics in the transmitted data, is quite pointless. Normally, code on the client/server will be able to process data faster than the network can transmit it. So if there's no data in the receive buffer when you're trying to read() it, this just means the network has not yet transmitted everything, but you have no way to tell if the next packet will arrive within a millisecond, a second, or a day. You'd probably consider the first case as "there is more data to send", the third as "no more data to send", and the second depends on your application.
If the other side doesn't close the connection, you probably don't know when it's ready to send the next data packet either.
So unless you have specific semantics and knowledge about what the client sends, using select() and non-blocking I/O is the best you can do.
In specific cases, there might be other ways - for example, if you know the client will send and XML tag, some data, and a closing tag, every n seconds. In that case you could start reading n seconds after the last packet you received, then just read on until you receive the closing tag. But as i said, this isn't a general approach since it requires semantics on the channel.
TCP is a byte-stream protocol, not a message protocol. If you want messages you really have to implement them yourself, e.g. with a length-word prefix, lines, XML, etc. You can guess with the FIONREAD option of ioctl(), but guessing is all it is, as you can't know whether the client has paused in the middle of transmission of the message, or whether the network has done so for some reason.
The protocol needs to give you a way to know when the client is finishes sending a message.
Common approaches are to send the length of each message before it, or to send a special terminator after each message (similar to the NUL character at the end of strings in C).
I have a client-server application where each side communicate with the other via TCP socket.
I properly establish the connection and then I crash the server BEFORE any data is written on the socket by the client.
What I see is that the first write() attempt (client-side) is successful and it returns the actual number of written bytes, while the following ones return (as I expected) -1 (receiving a SIGPIPE) and errno=EPIPE.
Why the first write() is successful even if the socket is already closed?
EDIT
Sometimes also the following write() have a positive return values, as if everything goes well.
You're confused by what the return value of write() means. It doesn't mean, "the peer got the data and acknowledged it". Instead, it means, "I buffered so-many bytes to send to the peer and they're my responsibility now, so you can forget about them (and I don't have any pending errors)".
That is, if the TCP stack accepts the write and returns n bytes, that doesn't mean they've been written yet, just queued for writing. It'll take some time, perhaps 30s after it starts sending network traffic, before the stack gives up and returns an error to you. During that time, you could have done several calls to write() which were successful at queueing data for sending. (The write error will be returned in c.30s if the peer has vanished, or immediately if the peer can be contacted and sends a RST packet straight away to indicate the connection is dead.)
This has to do with how TCP/IP works, that can be roughly described as two mostly independent half-connections. When you close the socket at the server, the client is told that it will not receive further data from the C<-S half-connection, waking up read() immediatly, but not about the C->S direction. It only gets a reply resetting the connection after it tries to send some data. I recommend the TCP/IP Guide for further details.
The reason why sometimes you can write() twice is that you write faster than the round-trip time and can squeeze a second write() before the reply to the first one.
I'm using the following method to detect a disconnected server condition:
After getting the select() timeout on a socket (nothing was received, though was supposed to),
the 'system("ping -c 1 -w 1 server");' command is activated.
If the server is up and just lagging, the ping command will return in less than 0.1 seconds.
Otherwise (the server is down), the ping command will return in 1 second.
i am learning to use SO_SNDTIMEO and SO_RCVTIMEO to check the timeout.
It is easy to use with read socket. But when i want to check write timeout, it always return successful. Here is what i did:(all in blocking mode)
close the client read socket and exit before server start write
terminate the client before server start write
unplug the cable of server after accept but before write
well, it seems all these case write just return sucessfully.
I think the reason should be that port is resource managed by os, and at the client side, after program gone, the tcp connection still shows FIN_WAIT2 state.
so, is there any convenient way to simulate some cases that write can receive errors such as EPIPE, EAGAIN?
How to get the error EAGAIN?
To get the error EAGAIN, you need to be using Non-Blocking Sockets. With Non-Blocking sockets, you need to write huge amounts of data (and stop receiving data on the peer side), so that your internal TCP buffer gets filled and returns this error.
How to get the error EPIPE?
To get the error EPIPE, you need to send large amount of data after closing the socket on the peer side. You can get more info about EPIPE error from this SO Link. I had asked a question about Broken Pipe Error in the link provided and the accepted answer gives a detailed explanation. It is important to note that to get EPIPE error you should have set the flags parameter of send to MSG_NOSIGNAL. Without that, an abnormal send can generate SIGPIPE signal.
Additional Note
Please note that it is difficult to simulate a write failure, as TCP generally stores the data that you are trying to write into it's internal buffer. So, if the internal buffer has sufficient space, then you won't get an error immediately. The best way is to try to write huge amounts of data. You can also try setting a smaller buffer size for send by using setsockopt function with SO_SNDBUF option
You can simulate errors using fault injection. For example, libfiu is a fault injection library that comes with an example project that allows you to simulate errors from POSIX functions. Basically it uses LD_PRELOAD to inject a wrapper around the regular system calls (including write), and then the wrapper can be configured to either pass through to the real system call, or return whatever error you like.
You could set the receive buffer size to be really small on one side, and send a large buffer on the other. Or on the one side set the send buffer small and try to send a large message.
Otherwise the most common test (I think) is to let the server and client talk for a while, and then remove a network cable.
Hi' I'm writing a simple http port forwarder. I read data from port 80, and pass the data to my lighttpd server, on port 8080.
As long as I write() data on the socket on port 8080 (forwarding the request) there's no problem, but when I read() data from that socket (forwarding the response), the last read() hangs a lot (about 1 or 2 seconds) before realizing there's no more data and returning 0.
I tried to set the socket to non-blocking, but this doesn't work, as sometimes it returns EWOULDBLOCKING even if there's some data left (lighttpd + cgi can be quite slow).
I tried to set a timeout with select(), but, as above, a slow cgi could timeout the socket when there's actually some data to transmit.
Update: SOLVED. It was the keepalive after all. After I disabled it in my lighttpd configuration file, the whole thing runs flawlessly.
Well, for the sake of completion, and as per my comment:
It is likely that the HTTP server itself (lighttpd in your case) is maintaining a persistent connection to your proxy because your proxy relayed a header containing “Connection: keep-alive”. This header aids when the client wants to make multiple requests over the same connection. So, because lighttpd received this header, it assumed it was going to receive further requests and kept the socket open, causing read to block in your proxy.
Disabling keep-alive in your lighttpd configuration is one way to fix it, but also you could also strip the “Connection: keep-alive“ from the header before you relay it to your web server.
Using both non-blocking sockets and select is the right way to go. Returning EWLOULDBLOCK doesn't mean that the entire stream of data is finished being received, it means that, instantaneously, there is nothing to read right now. That's exactly what you want, because it means that read won't wait even half a second for more data to show up. If the data isn't immediately available it will return.
Now, obviously, this means you will need to call read multiple times to get the complete data. The general format for doing this is a select loop. In pseudocode:
do
select ( my_sockets )
if ( select error )
handle_error
else
for each ( socket in my_sockets ) do
if ( socket is ready ) then
nonblocking read from socket
if ( no data was read ) then
close socket
remove socket from my_sockets
endif
endif
loop
endif
loop
The idea is that select will tell you which sockets have data available for reading right now. If you read one of those sockets, you are guaranteed either to get data or to get a return value of 0, indicating that the remote end closed the socket.
If you use this method, you will never be stuck in a read call that is not reading data, for any length of time. The blocking operation is the select call, and you can also select over writeable sockets if you need to write, and set a timeout if you need to do things periodically.
Don't do that!
Keepalives boost performance from other clients. Instead, fix your client. Send a Connection: close header in your client and make sure your request doesn't claim HTTP/1.1 compliance. (If for no other reason than that you probably don't handle chunked encoding either.)
I guess that I would use non-blocking I/O to full extend. Instead of setting timeouts I'd rather wait for event's:
while(select(...)) {
switch(...) {
case ...: // Handle accepting new connection
case ...: // Handle reading from socket
...
}
}
Sinle-thread, blocking forwarder will cause problems anyway with multiple clients.
Sorry - I don't remember exact calls. Also it can be strange in some cases (IIRC - you need to handle write), but there are libraries which simplify the task.