C select() reading until null character

I am implementing a proxy in C and am using select() to avoid blocking on I/O. There are multiple clients connecting to the proxy, so I include the socket descriptor number in my messages so that I know which socket to forward a reply message from the server to.
However, sometimes read() will not receive the full message up to the null character, and the rest of the message arrives on the next round of select(). I would like to receive the full message at once so that I will know which socket to forward the reply to (buffering will not work, since I don't know which message belongs to which when there are multiple clients). Is there a way to do this without blocking on read while waiting for a null character to arrive?

There is no such thing as a message in TCP. It is a byte stream protocol. You write bytes, it sends bytes, you read bytes. There is no guarantee how many bytes you will receive at any one time and there is no guaranteed association between the amount of data written by a single write and read by a single read. If you want messages you must implement them yourself. Any given read may read zero, one, or more bytes, up to the length of the buffer. It might be half a message. It might be one and a half messages. What it is is entirely up to you.
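One way to "implement them yourself" for the NUL-terminated messages in the question is to keep a separate buffer per client descriptor, so partial reads from different clients never mix. The following is a minimal sketch under that assumption; `struct conn_buf`, `conn_buf_append`, `conn_buf_next` and the 4096-byte limit are hypothetical names and values, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CONN_BUF_CAP 4096  /* hypothetical per-connection limit */

/* One of these per client socket descriptor (hypothetical layout). */
struct conn_buf {
    char   data[CONN_BUF_CAP];
    size_t len;               /* bytes currently buffered */
};

/* Append bytes just returned by read(); returns -1 if they do not fit. */
static int conn_buf_append(struct conn_buf *cb, const char *src, size_t n)
{
    assert(cb != NULL && src != NULL);
    if (n > CONN_BUF_CAP - cb->len)
        return -1;
    memcpy(cb->data + cb->len, src, n);
    cb->len += n;
    return 0;
}

/*
 * If a complete NUL-terminated message is buffered, copy it (terminator
 * included) into out, drop it from the buffer and return its length
 * without the terminator. Return -1 if no complete message is buffered
 * yet or if out is too small.
 */
static long conn_buf_next(struct conn_buf *cb, char *out, size_t out_cap)
{
    char *nul = memchr(cb->data, '\0', cb->len);
    if (nul == NULL)
        return -1;
    size_t msg_len = (size_t)(nul - cb->data);
    if (msg_len + 1 > out_cap)
        return -1;
    memcpy(out, cb->data, msg_len + 1);
    memmove(cb->data, nul + 1, cb->len - (msg_len + 1));
    cb->len -= msg_len + 1;
    return (long)msg_len;
}
```

After each readable event from select(), the proxy would append whatever read() returned to that descriptor's buffer and call `conn_buf_next` in a loop until it returns -1, so read() itself never has to wait for the terminator.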

Use ZeroMQ if you're doing individual messages. It has bindings for a huge number of languages and is a great abstraction for networking. In fact, it can handle this proxy model for you.

Related

Determine if peer has closed reading end of socket

I have a socket programming situation where the client shuts down the writing end of the socket to let the server know input is finished (via receiving EOF), but keeps the reading end open to read back a result (one line of text). It would be useful for the server to know that the client has successfully read the result and closed the socket (or at least shut down the reading end). Is there a good way to check/wait for such status?
No. All you can know is whether your sends succeeded, and some of them will succeed even after the peer read shutdown, because of TCP buffering.
This is poor design. If the server needs to know that the client received the data, the client needs to acknowledge it, which means it can't shut down its write end. The client should:
send an in-band termination message, as data.
read and acknowledge all further responses until end of stream occurs.
close the socket.
The server should detect the in-band termination message and:
stop reading requests from the socket
send all outstanding responses and read the acknowledgements
close the socket.
Or, if the objective is only to ensure that client and server end at the same time, each end should shut down its socket for output, then read input until end of stream occurs, then close the socket. That way the final closes will occur more or less simultaneously on both ends.
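The "shut down for output, read until end of stream, then close" sequence could look roughly like this in C. It is a sketch that assumes a blocking descriptor; `shutdown_and_drain` is a hypothetical helper name, and the call will block for as long as the peer keeps its sending side open.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Shut down our sending side, drain input until the peer's end of
 * stream, then close. Returns the number of bytes discarded while
 * draining, or -1 on error (the descriptor is closed either way).
 */
static long shutdown_and_drain(int fd)
{
    char scratch[512];
    long discarded = 0;

    assert(fd >= 0);
    if (shutdown(fd, SHUT_WR) < 0) {   /* peer will now read end of stream */
        close(fd);
        return -1;
    }
    for (;;) {
        ssize_t n = read(fd, scratch, sizeof scratch);
        if (n == 0)                    /* peer shut down its side too */
            break;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        discarded += n;
    }
    close(fd);
    return discarded;
}
```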
getsockopt with TCP_INFO seems the most obvious choice, but it's not cross-platform.
Here's an example for Linux:
import socket
import time
import struct
import pprint

def tcp_info(s):
    # Field names follow struct tcp_info in <linux/tcp.h>.
    rv = dict(zip("""
        state ca_state retransmits probes backoff options snd_rcv_wscale
        rto ato snd_mss rcv_mss unacked sacked lost retrans fackets
        last_data_sent last_ack_sent last_data_recv last_ack_recv
        pmtu rcv_ssthresh rtt rttvar snd_ssthresh snd_cwnd advmss reordering
        rcv_rtt rcv_space
        total_retrans
        pacing_rate max_pacing_rate bytes_acked bytes_received segs_out segs_in
        notsent_bytes min_rtt data_segs_in data_segs_out""".split(),
        struct.unpack("BBBBBBBIIIIIIIIIIIIIIIIIIIIIIIILLLLIIIIII",
                      s.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 160))))
    wscale = rv.pop("snd_rcv_wscale")
    # Bit-field layout is up to the compiler. With GCC on little-endian
    # targets such as x86-64, the first-declared field (tcpi_snd_wscale)
    # occupies the low nibble.
    rv["snd_wscale"] = wscale & 0xf
    rv["rcv_wscale"] = wscale >> 4
    return rv

for i in range(100):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("localhost", 7878))
    s.recv(10)
    pprint.pprint(tcp_info(s))
I doubt a true cross-platform alternative exists.
Fundamentally there are quite a few states:
you wrote data to socket, but it was not sent yet
data was sent, but not received
data was sent and lost (relies on timer)
data was received, but not acknowledged yet
acknowledgement not received yet
acknowledgement lost (relies on timer)
data was received by remote host but not read out by application
data was read out by application, but socket still alive
data was read out, and app crashed
data was read out, and app closed the socket
data was read out, and app called shutdown(WR) (almost same as closed)
FIN was not sent by remote yet
FIN was sent by remote but not received yet
FIN was sent and got lost
FIN received by your end
Obviously your OS can distinguish quite a few of these states, but not all of them. I can't think of an API that would be this verbose...
Some systems allow you to query remaining send buffer space. Perhaps if you did, and socket was already shut down, you'd get a neat error?
The good news is that just because a socket is shut down doesn't mean you can't interrogate it. I can get all of TCP_INFO after shutdown, with state=7 (closed). In some cases it reports state=8 (close wait).
http://lxr.free-electrons.com/source/net/ipv4/tcp.c#L1961 has all the gory details of Linux TCP state machine.
TL;DR:
Don't rely on the socket state for this; it can cut you in many error cases. You need to bake the acknowledgement/receipt facility into your communications protocol. First character on each line used for status/ack works really well for text-based protocols.
On many, but not all, Unix-like/POSIXy systems, one can use the TIOCOUTQ (also SIOCOUTQ) ioctl to determine how much data is left in the outgoing buffer.
For TCP sockets, even if the other end has shut down its write side (and therefore will send no more data to this end), all transmissions are acknowledged. The data in the outgoing buffer is only removed when the acknowledgement from the recipient kernel is received. Thus, when there is no more data in the outgoing buffer, we know that the kernel at the other end has received the data.
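As a rough illustration of that query, the sketch below wraps the ioctl in a hypothetical helper, `unacked_bytes`. It assumes Linux, where TIOCOUTQ and SIOCOUTQ share the same value; other systems may not support the request on sockets at all, in which case the helper returns -1.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/*
 * Return the number of bytes still held in the socket's outgoing queue
 * (for TCP: not yet acknowledged by the peer kernel), or -1 if the
 * ioctl is not supported for this descriptor.
 */
static int unacked_bytes(int fd)
{
    int pending = 0;

    assert(fd >= 0);
    if (ioctl(fd, TIOCOUTQ, &pending) < 0)
        return -1;
    return pending;
}
```

A server could poll this after writing its result line and treat 0 as "the peer kernel has the data", keeping in mind that the value says nothing about the peer application.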
Unfortunately, this does not mean that the application has received and processed the data. This same limitation applies to all methods that rely on socket state; this is also the reason why fundamentally, the acknowledgement of receipt/acceptance of the final status line must come from the other application, and cannot be automatically detected.
This, in turn, means that neither end can shut down its sending side before the very final receipt/acknowledge message. You cannot rely on TCP's, or any other protocol's, automatic socket state management. You must bake the critical receipts/acknowledgements into the stream protocol itself.
In OP's case, the stream protocol seems to be simple line-based text. This is quite useful and easy to parse. One robust way to "extend" such a protocol is to reserve the first character of each line for the status code (or alternatively, reserve certain one-character lines as acknowledgements).
For large in-flight binary protocols (i.e., protocols where the sender and receiver are not really in sync), it is useful to label each data frame with an increasing (cyclic) integer, and have the other end respond, occasionally, with an update to let the sender know which frames have been completely processed, and which ones received, and whether additional frames should arrive soon/not-very-soon. This is very useful for network-based appliances that consume a lot of data, with the data provider wishing to be kept updated on the progress and desired data rate (think 3D printers, CNC machines, and so on, where the contents of the data changes the maximum acceptable data rate dynamically).
Okay so I recall pulling my hair out trying to solve this very problem back in the late 90's. I finally found an obscure doc that stated that a read call to a disconnected socket will return a 0. I use this fact to this day.
You're probably better off using ZeroMQ. That will send a whole message, or no message at all. If you set its send buffer length to 1 (the shortest it will go) you can test to see if the send buffer is full. If not, the message was successfully transferred, probably. ZeroMQ is also really nice if you have an unreliable or intermittent network connection as part of your system.
That's still not entirely satisfactory. You're probably even better off implementing your own send acknowledge mechanism on top of ZeroMQ. That way you have absolute proof that a message was received. You don't have proof that a message was not received (something can go wrong between emitting and receiving the ack, and you cannot solve the Two Generals Problem). But that's the best that can be achieved. What you'll have done then is implement a Communicating Sequential Processes architecture on top of ZeroMQ's Actor Model, which is itself implemented on top of TCP streams. Ultimately it's a bit slower, but your application has more certainty of knowing what's gone on.

Will read (socket) block until the buffer is full?

I wrote a simple C socket program that sends an INIT package to the server to tell it to prepare a text transfer. The server does not send any data back at that time.
After sending the INIT package the client sends a GET package and waits for chunks of data from the server.
So every time the server receives a GET package it will send a chunk of data to the client.
So far so good. The buffer has a size of 512 bytes; a chunk is 100 bytes plus a little overhead.
But my problem is that the client does not receive the second message.
So my guess is that read() will block until the buffer is full. Is that right, or what might be the reason for that?
It depends. For TCP sockets read may return before the buffer is full, and you may need to receive in a loop to get a whole message. For UDP sockets each read returns at most a single packet (datagram); it blocks only until a datagram arrives, not until the buffer is full.
The answer is no: read() on a TCP/IP socket will not block until the buffer has the amount of data you requested. read() will return immediately in all cases if any data is available, even if your socket is blocking and you've requested more data than is available.
Keep in mind that TCP/IP is a byte stream protocol and you must treat it as such. The interface is under no obligation to transmit your data together in a single packet, as long as it is presented to you in the order you placed it in the socket.
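The behaviour described in these answers can be observed with a small experiment. The sketch below uses a local AF_UNIX stream socket pair as a stand-in for a TCP connection (an assumption made so that it runs without a server); `short_read_demo` is a hypothetical name.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Write len bytes into one end of a local stream socket pair and ask
 * read() for a full 512-byte buffer on the other end. Returns what
 * read() reported, or -1 on failure.
 */
static long short_read_demo(const char *msg, size_t len)
{
    int sv[2];
    char buf[512];

    assert(len > 0 && len <= sizeof buf);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[1], msg, len) != (ssize_t)len) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    /* Blocking read: returns as soon as some data is available. */
    ssize_t n = read(sv[0], buf, sizeof buf);
    close(sv[0]);
    close(sv[1]);
    return (long)n;
}
```

Called with a 5-byte message, the read() comes back with those 5 bytes rather than waiting for 512, which is why the return value has to be checked on every call.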
The answer is no: read() does not wait until the buffer is full. The points below may help you track down the error.
Things to check:
Find out what read() returns the second time.
memset the buffer before every recv() in the while loop.
Call fflush(stdout) if the output does not appear.
Make sure all three are in place. If the problem is still not solved, please post the source code here.

Reading all available bytes via socket using blocking I/O

When reading from a socket using read(2) and blocking I/O, when do I know that the other side (the client) has no more data to send? (by "no more data to send" I mean that, as an example, the client is waiting for a response). At first, I thought that this point is reached when less than count bytes are returned by read (as in read(fd, *buf, count)).
But what if the client sends the data fragmented? Reading until read returns 0 would be a solution, but as far as I know 0 is only returned when the client closes the connection - otherwise, read would just block until the connection is closed. I thought of using non-blocking I/O and a timeout for select(2), but this does not seem to be a tidy solution to me.
Are there any known best practices?
The concept of "the other side has no more data to send", without either a timeout or some semantics in the transmitted data, is quite pointless. Normally, code on the client/server will be able to process data faster than the network can transmit it. So if there's no data in the receive buffer when you're trying to read() it, this just means the network has not yet transmitted everything, but you have no way to tell if the next packet will arrive within a millisecond, a second, or a day. You'd probably consider the first case as "there is more data to send", the third as "no more data to send", and the second depends on your application.
If the other side doesn't close the connection, you probably don't know when it's ready to send the next data packet either.
So unless you have specific semantics and knowledge about what the client sends, using select() and non-blocking I/O is the best you can do.
In specific cases, there might be other ways, for example if you know the client will send an XML tag, some data, and a closing tag every n seconds. In that case you could start reading n seconds after the last packet you received, then just read on until you receive the closing tag. But as I said, this isn't a general approach since it requires semantics on the channel.
TCP is a byte-stream protocol, not a message protocol. If you want messages you really have to implement them yourself, e.g. with a length-word prefix, lines, XML, etc. You can guess with the FIONREAD option of ioctl(), but guessing is all it is, as you can't know whether the client has paused in the middle of transmission of the message, or whether the network has done so for some reason.
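For reference, the FIONREAD query mentioned above might be wrapped as follows. This is a sketch; `bytes_readable_now` is a hypothetical helper, and as the answer says, the number is only a snapshot and cannot tell you whether a message is complete.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the kernel how many bytes could be read from fd right now without
 * blocking. Returns -1 if the ioctl fails. The value is only a snapshot:
 * more bytes may arrive immediately afterwards.
 */
static int bytes_readable_now(int fd)
{
    int avail = 0;

    assert(fd >= 0);
    if (ioctl(fd, FIONREAD, &avail) < 0)
        return -1;
    return avail;
}
```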
The protocol needs to give you a way to know when the client has finished sending a message.
Common approaches are to send the length of each message before it, or to send a special terminator after each message (similar to the NUL character at the end of strings in C).
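The length-prefix approach could be sketched like this for a blocking descriptor. The 4-byte big-endian length field and the names `read_exact` and `recv_msg` are assumptions made for illustration, not an established wire format.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Read exactly n bytes; returns 0 on success, -1 on error or early EOF. */
static int read_exact(int fd, void *buf, size_t n)
{
    char *p = buf;

    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r == 0)
            return -1;              /* peer closed in mid-message */
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/*
 * Receive one message framed as a 4-byte big-endian length followed by
 * that many payload bytes. Returns the payload length, or -1 on error
 * or if the payload would not fit in buf.
 */
static long recv_msg(int fd, void *buf, size_t cap)
{
    uint32_t net_len;

    assert(buf != NULL);
    if (read_exact(fd, &net_len, sizeof net_len) < 0)
        return -1;
    uint32_t len = ntohl(net_len);
    if (len > cap)
        return -1;                  /* larger than the caller's buffer */
    if (read_exact(fd, buf, len) < 0)
        return -1;
    return (long)len;
}
```

The sender writes `htonl(len)` followed by the payload; the receiver then knows exactly how many bytes belong to the message, however the stream happens to be split up in transit.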

writing data to a socket that is sent in 2 frames

My application sends small messages over the wire through a socket. Each message is around 200 bytes of data. I would like to see my data sent in 2 frames instead of 1. My questions are:
How to do that i.e. is there a way to cause TCP to automatically split the buffer in 2 frames?
Do I get the same if I send my buffer in 2 separate writes?
I am using Linux and C.
How to do that i.e. is there a way to cause TCP to automatically split the buffer in 2 frames?
TCP is a stream communication protocol, all data is continuous. You should split your data by delimiters.
For example, in the HTTP protocol the header section of each request ends with a blank line, i.e. two consecutive line breaks (\r\n\r\n).
Do I get the same if I send my buffer in 2 separate writes?
No, you will receive them as a one continuous data stream. Frames are meaningless.
Note: On the wire the data travels as separate packets, but the OS collects and reassembles them before your application receives any TCP data. This process is transparent to your application.
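A small experiment can illustrate why two writes do not give the receiver two separate units. The sketch below uses a local AF_UNIX stream socket pair as a stand-in for TCP (an assumption, so it runs without a network); `coalesced_recv_demo` is a hypothetical name. Over a real network the single recv() may return either one write or both.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Perform two separate send() calls of chunk bytes each on one end of a
 * local stream socket pair, then a single recv() with a 700-byte buffer
 * on the other end. Returns the byte count of that one recv(), or -1.
 */
static long coalesced_recv_demo(size_t chunk)
{
    int sv[2];
    char msg[256] = {0};
    char buf[700];
    long n = -1;

    assert(chunk > 0 && chunk <= sizeof msg);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (send(sv[1], msg, chunk, 0) == (ssize_t)chunk &&
        send(sv[1], msg, chunk, 0) == (ssize_t)chunk)
        n = (long)recv(sv[0], buf, sizeof buf, 0);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```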
Here are a few things you can consider.
TCP does have the PSH flag, that you can set in a packet, that makes TCP push out any buffered data. But this will work somewhat unreliably, because, in theory, data can get combined again on the receiving side. But in practice, you will see the data being delivered separately.
You can't really use "\n" as a delimiter, because it can occur naturally in your data. You have to come up with some kind of an escape sequence to use, and escape all the occurrences of "\n" in the data. This can be painful.
If you need message boundaries, consider a protocol that supports it. Like UDP. But with UDP you lose guaranteed delivery. You will have to roll your own confirmations, retries and what not.
Finally there is SCTP. Less used protocol, but available in the Linux stack at least. It gives you best of both worlds. Message boundaries, guaranteed delivery, guaranteed sequence.
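If you do stay with TCP and a "\n" delimiter, the escaping mentioned in the list above might look like the sketch below. The scheme (a backslash as the escape character, newline encoded as backslash followed by n) and the function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Escape src so that the result contains no raw '\n': a newline becomes
 * the two characters '\' 'n' and a backslash becomes '\' '\'. Returns
 * the escaped length, or -1 if dst is too small.
 */
static long escape_newlines(const char *src, size_t n, char *dst, size_t cap)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        char c = src[i];
        if (c == '\n' || c == '\\') {
            if (cap - out < 2)
                return -1;
            dst[out++] = '\\';
            dst[out++] = (c == '\n') ? 'n' : '\\';
        } else {
            if (cap - out < 1)
                return -1;
            dst[out++] = c;
        }
    }
    return (long)out;
}

/* Reverse of escape_newlines(); returns the decoded length or -1. */
static long unescape_newlines(const char *src, size_t n, char *dst, size_t cap)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        char c = src[i];
        if (c == '\\') {
            if (++i == n)
                return -1;          /* dangling escape character */
            if (src[i] == 'n')
                c = '\n';
            else if (src[i] != '\\')
                return -1;          /* unknown escape sequence */
        }
        if (out == cap)
            return -1;
        dst[out++] = c;
    }
    return (long)out;
}
```

The sender escapes each message and appends a raw "\n"; the receiver splits the stream on raw "\n" and unescapes each piece.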

can one call of recv() receive data from 2 consecutive send() calls?

i have a client which sends data to a server with 2 consecutive send calls:
send(_sockfd,msg,150,0);
send(_sockfd,msg,150,0);
and the server is receiving when the first send call was sent (let's say i'm using select):
recv(_sockfd,buf,700,0);
note that the buffer i'm receiving is much bigger.
my question is: is there any chance that buf will contain both msgs? or do i need 2 recv() calls to get both msgs?
thank you!
TCP is a stream oriented protocol. Not message / record / chunk oriented. That is, all that is guaranteed is that if you send a stream, the bytes will get to the other side in the order you sent them. There is no provision made by RFC 793 or any other document about the number of segments / packets involved.
This is in stark contrast with UDP. As @R.. correctly said, in UDP an entire message is sent in one operation (notice the change in terminology: message). Try to send a giant message (several times larger than the MTU) with TCP? That's fine, it will split it for you.
When running on local networks or on localhost you will usually notice that one send == one recv. Don't assume that this always holds. There are factors that change it dramatically. Among these:
Nagle
Underlying MTU
Memory usage (possibly)
Timers
Many others
Of course, not having a correspondence between a send and a recv is a nuisance and you can't rely on UDP. That is one of the reasons for SCTP. SCTP is a really really interesting protocol and it is message-oriented.
Back to TCP, this is a common nuisance. An equally common solution is this:
Establish that all messages begin with a fixed-length sequence (say 32 bytes)
These 32 bytes contain (possibly among other things) the size of the message that follows
When you read any amount of data from the socket, add the data to a buffer specific for that connection. When 32 bytes are reached, read the length you still need to read until you get the message.
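The three steps above can be sketched as a small per-connection reassembly routine that works with any amount of data per recv(). The layout is an assumption: the payload length is taken to be the first 4 bytes of the 32-byte header, big-endian, and `struct framer`, `framer_feed`, `framer_reset` and `MAX_BODY` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN  32            /* fixed header size, as in the text */
#define MAX_BODY 1024          /* hypothetical payload limit */

/* Per-connection reassembly state (hypothetical layout). */
struct framer {
    unsigned char buf[HDR_LEN + MAX_BODY];
    size_t have;               /* bytes collected for the current message */
};

/* Payload length: assumed to be the first 4 header bytes, big-endian. */
static uint32_t body_len(const unsigned char *hdr)
{
    return ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
           ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
}

/* Start collecting the next message. */
static void framer_reset(struct framer *f)
{
    f->have = 0;
}

/*
 * Feed bytes from one recv() call. Consumes only what is needed to
 * finish the current message and reports that count in *used.
 * Returns 1 when f->buf holds a complete header + payload, 0 when more
 * bytes are needed, -1 if the announced payload is too large.
 */
static int framer_feed(struct framer *f, const unsigned char *data, size_t n,
                       size_t *used)
{
    *used = 0;
    assert(f->have <= sizeof f->buf);
    for (;;) {
        size_t want = HDR_LEN;             /* still collecting the header */
        if (f->have >= HDR_LEN) {
            uint32_t len = body_len(f->buf);
            if (len > MAX_BODY)
                return -1;
            want = HDR_LEN + len;
            if (f->have == want)
                return 1;                  /* complete message */
        }
        if (*used == n)
            return 0;                      /* input exhausted, need more */
        size_t take = want - f->have;
        if (take > n - *used)
            take = n - *used;
        memcpy(f->buf + f->have, data + *used, take);
        f->have += take;
        *used += take;
    }
}
```

When `framer_feed` returns 1, the caller handles the message in `f->buf`, calls `framer_reset`, and feeds the remaining `n - *used` bytes again, since one recv() may carry the tail of one message and the start of the next.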
It is really important to notice how there are really no messages on the wire, only bytes. Once you understand it you will have made a giant leap towards writing network applications.
The answer depends on the socket type, but in general, yes it's possible. For TCP it's the norm. For UDP I believe it cannot happen, but I'm not an expert on network protocols/programming.
Yes, it can and often does. There is no way of matching up send and receive calls when using TCP/IP. Your program logic should test the return values of both send and recv calls in a loop, which terminates when everything has been sent or received.
