C sockets with epoll: what about slow clients?

Context
Debian, 64-bit.
I thought I understood the implications of sockets, but apparently not.
I'm worried about the management of slow clients.
I've read and fiddled with the edge-triggered epoll example code.
Imagine two clients:
A: very slow network
B: very fast network.
Question
In edge-triggered mode, what happens when we start reading from A, and then B also sends some data while we are still in the read loop for A?
Will B have to wait for us to read all the data coming from A?
Or should we create a buffer (not in the example given) to store A's data and concatenate the pieces until the message is complete, then return immediately?
Or is that buffer already managed by epoll?
The underlying mechanism is not clear to me.

It doesn't matter if the network is slow: the read loop will stop as soon as it has consumed all the data that has reached the server so far, at which point you go back to epoll and can service B. B never has to wait for A's entire message.
There is a bug (or at least a portability issue) in it though. The code only checks for EAGAIN when reading data. It should also check for EWOULDBLOCK, since some platforms return that error code when all data has been read.
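For illustration, here is a minimal sketch of such a drain loop in C, checking both error codes; conn_fd and the handle_bytes() callback are placeholders, not part of the original example:

#include <errno.h>
#include <unistd.h>

void handle_bytes(const char *buf, size_t n);   /* hypothetical callback */

static void drain_socket(int conn_fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(conn_fd, buf, sizeof buf);
        if (n > 0) {
            handle_bytes(buf, (size_t)n);
        } else if (n == 0) {
            close(conn_fd);   /* peer closed the connection */
            return;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return;           /* kernel buffer drained; wait for the next EPOLLIN */
        } else if (errno != EINTR) {
            close(conn_fd);   /* real error */
            return;
        }
    }
}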
Edit:
OK, so if I want to put the incoming data into a database, how should I manage it? Will I be able to read the whole message?
If you get an incomplete message, you need to store it until the entire message has been received. I would malloc a char * for each incoming connection to hold its message fragment.
When a message is complete, you can process it, store it in the database or whatever you desire.
Note that the beginning of the next message may already be available, so you can't stop reading until you get EWOULDBLOCK or EAGAIN.
Another way to treat the problem is to create a separate thread for each connection.
Which approach is best? Well, that depends on the application (number of simultaneous users, etc.).
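As a rough illustration of the per-connection fragment buffer suggested above (the struct and helper names are invented for this sketch):

#include <stdlib.h>
#include <string.h>

struct connection {
    int    fd;
    char  *buf;   /* accumulated message fragment */
    size_t len;   /* bytes accumulated so far */
};

/* Append newly read bytes; the caller decides when a complete
 * message has arrived (terminator, length prefix, ...). */
static int conn_append(struct connection *c, const char *data, size_t n)
{
    char *p = realloc(c->buf, c->len + n);
    if (p == NULL)
        return -1;
    memcpy(p + c->len, data, n);
    c->buf = p;
    c->len += n;
    return 0;
}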

Related

Determine if peer has closed reading end of socket

I have a socket programming situation where the client shuts down the writing end of the socket to let the server know input is finished (via receiving EOF), but keeps the reading end open to read back a result (one line of text). It would be useful for the server to know that the client has successfully read the result and closed the socket (or at least shut down the reading end). Is there a good way to check/wait for such status?
No. All you can know is whether your sends succeeded, and some of them will succeed even after the peer read shutdown, because of TCP buffering.
This is poor design. If the server needs to know that the client received the data, the client needs to acknowledge it, which means it can't shutdown its write end. The client should:
send an in-band termination message, as data.
read and acknowledge all further responses until end of stream occurs.
close the socket.
The server should detect the in-band termination message and:
stop reading requests from the socket
send all outstanding responses and read the acknowledgements
close the socket.
OR, if the objective is only to ensure that client and server end at the same time, each end should shutdown its socket for output and then read input until end of stream occurs, then close the socket. That way the final closes will occur more or less simultaneously on both ends.
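A minimal sketch of that simultaneous-close pattern, assuming sock is an established TCP socket in blocking mode:

#include <sys/socket.h>
#include <unistd.h>

static void graceful_close(int sock)
{
    char buf[512];

    shutdown(sock, SHUT_WR);             /* send our FIN; no more writes */
    while (read(sock, buf, sizeof buf) > 0)
        ;                                /* drain until the peer's EOF */
    close(sock);                         /* both ends close at about the same time */
}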
getsockopt with TCP_INFO seems the most obvious choice, but it's not cross-platform.
Here's an example for Linux:
import socket
import struct
import pprint

def tcp_info(s):
    rv = dict(zip("""
        state ca_state retransmits probes backoff options snd_rcv_wscale
        rto ato snd_mss rcv_mss unacked sacked lost retrans fackets
        last_data_sent last_ack_sent last_data_recv last_ack_recv
        pmtu rcv_ssthresh rtt rttvar snd_ssthresh snd_cwnd advmss reordering
        rcv_rtt rcv_space
        total_retrans
        pacing_rate max_pacing_rate bytes_acked bytes_received segs_out segs_in
        notsent_bytes min_rtt data_segs_in data_segs_out""".split(),
                  struct.unpack("BBBBBBBIIIIIIIIIIIIIIIIIIIIIIIILLLLIIIIII",
                                s.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 160))))
    wscale = rv.pop("snd_rcv_wscale")
    # bit field layout is up to the compiler
    # FIXME: test the order of nibbles
    rv["snd_wscale"] = wscale >> 4
    rv["rcv_wscale"] = wscale & 0xf
    return rv

for i in range(100):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("localhost", 7878))
    s.recv(10)
    pprint.pprint(tcp_info(s))
I doubt a true cross-platform alternative exists.
Fundamentally there are quite a few states:
you wrote data to socket, but it was not sent yet
data was sent, but not received
data was sent and lost (relies on timer)
data was received, but not acknowledged yet
acknowledgement not received yet
acknowledgement lost (relies on timer)
data was received by remote host but not read out by application
data was read out by application, but socket still alive
data was read out, and app crashed
data was read out, and app closed the socket
data was read out, and app called shutdown(WR) (almost same as closed)
FIN was not sent by remote yet
FIN was sent by remote but not received yet
FIN was sent and got lost
FIN received by your end
Obviously your OS can distinguish quite a few of these states, but not all of them. I can't think of an API that would be this verbose...
Some systems allow you to query the remaining send buffer space. Perhaps if you did that, and the socket was already shut down, you'd get a neat error?
The good news is that just because a socket is shut down doesn't mean you can't interrogate it. I can still get all of TCP_INFO after shutdown, with state=7 (closed). In some cases it reports state=8 (close wait).
http://lxr.free-electrons.com/source/net/ipv4/tcp.c#L1961 has all the gory details of Linux TCP state machine.
TL;DR:
Don't rely on the socket state for this; it can bite you in many error cases. You need to bake the acknowledgement/receipt facility into your communications protocol. The first character on each line used for status/ack works really well for text-based protocols.
On many, but not all, Unix-like/POSIXy systems, one can use the TIOCOUTQ (also SIOCOUTQ) ioctl to determine how much data is left in the outgoing buffer.
For TCP sockets, even if the other end has shut down its write side (and therefore will send no more data to this end), all transmissions are acknowledged. The data in the outgoing buffer is only removed when the acknowledgement from the recipient kernel is received. Thus, when there is no more data in the outgoing buffer, we know that the kernel at the other end has received the data.
Unfortunately, this does not mean that the application has received and processed the data. This same limitation applies to all methods that rely on socket state; this is also the reason why fundamentally, the acknowledgement of receipt/acceptance of the final status line must come from the other application, and cannot be automatically detected.
This, in turn, means that neither end can shut down its sending side before the very final receipt/acknowledge message. You cannot rely on TCP's (or any other protocol's) automatic socket state management. You must bake the critical receipts/acknowledgements into the stream protocol itself.
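For what it's worth, here is a minimal Linux-only sketch of the SIOCOUTQ query mentioned above; note it only proves receipt by the peer's kernel, not by the application:

#include <sys/ioctl.h>
#include <linux/sockios.h>   /* SIOCOUTQ (TIOCOUTQ on other Unixes) */

static int unacked_bytes(int sock)
{
    int outq = 0;
    if (ioctl(sock, SIOCOUTQ, &outq) == -1)
        return -1;
    return outq;   /* 0 means everything we sent has been ACKed */
}

A caller would typically poll this in a loop with a short sleep after the final write, until it returns 0 or a timeout expires.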
In OP's case, the stream protocol seems to be simple line-based text. This is quite useful and easy to parse. One robust way to "extend" such a protocol is to reserve the first character of each line for the status code (or alternatively, reserve certain one-character lines as acknowledgements).
For large in-flight binary protocols (i.e., protocols where the sender and receiver are not really in sync), it is useful to label each data frame with an increasing (cyclic) integer, and have the other end respond, occasionally, with an update to let the sender know which frames have been completely processed, and which ones received, and whether additional frames should arrive soon/not-very-soon. This is very useful for network-based appliances that consume a lot of data, with the data provider wishing to be kept updated on the progress and desired data rate (think 3D printers, CNC machines, and so on, where the contents of the data changes the maximum acceptable data rate dynamically).
Okay, so I recall pulling my hair out trying to solve this very problem back in the late '90s. I finally found an obscure doc stating that a read call on a disconnected socket returns 0. I use this fact to this day.
You're probably better off using ZeroMQ. That will send a whole message, or no message at all. If you set its send buffer length to 1 (the shortest it will go), you can test whether the send buffer is full. If not, the message was successfully transferred, probably. ZeroMQ is also really nice if you have an unreliable or intermittent network connection as part of your system.
That's still not entirely satisfactory. You're probably even better off implementing your own send-acknowledge mechanism on top of ZeroMQ. That way you have absolute proof that a message was received. You don't have proof that a message was not received (something can go wrong between emitting and receiving the ack, and you cannot solve the Two Generals Problem). But that's the best that can be achieved. What you'll have done then is implement a Communicating Sequential Processes architecture on top of ZeroMQ's Actor Model, which is itself implemented on top of TCP streams. Ultimately it's a bit slower, but your application has more certainty of knowing what's gone on.
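To show the shape of that, here is a hedged sketch of an application-level ack on top of ZeroMQ's REQ/REP sockets; the endpoint and the "OK" reply convention are assumptions made up for the example:

#include <zmq.h>
#include <string.h>

int send_with_ack(const char *msg, size_t len)
{
    void *ctx  = zmq_ctx_new();
    void *sock = zmq_socket(ctx, ZMQ_REQ);
    char ack[4] = {0};
    int ok = -1;

    zmq_connect(sock, "tcp://localhost:5555");        /* hypothetical endpoint */
    if (zmq_send(sock, msg, len, 0) == (int)len &&
        zmq_recv(sock, ack, sizeof ack, 0) >= 2 &&
        memcmp(ack, "OK", 2) == 0)
        ok = 0;                                       /* peer confirmed receipt */

    zmq_close(sock);
    zmq_ctx_destroy(ctx);
    return ok;
}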

Reading all available bytes via socket using blocking I/O

When reading from a socket using read(2) and blocking I/O, when do I know that the other side (the client) has no more data to send? (by "no more data to send" I mean that, as an example, the client is waiting for a response). At first, I thought that this point is reached when less than count bytes are returned by read (as in read(fd, *buf, count)).
But what if the client sends the data fragmented? Reading until read returns 0 would be a solution, but as far as I know 0 is only returned when the client closes the connection - otherwise, read would just block until the connection is closed. I thought of using non-blocking I/O and a timeout for select(2), but this does not seem to be a tidy solution to me.
Are there any known best practices?
The concept of "the other side has no more data to send", without either a timeout or some semantics in the transmitted data, is quite pointless. Normally, code on the client/server will be able to process data faster than the network can transmit it. So if there's no data in the receive buffer when you're trying to read() it, this just means the network has not yet transmitted everything, but you have no way to tell if the next packet will arrive within a millisecond, a second, or a day. You'd probably consider the first case as "there is more data to send", the third as "no more data to send", and the second depends on your application.
If the other side doesn't close the connection, you probably don't know when it's ready to send the next data packet either.
So unless you have specific semantics and knowledge about what the client sends, using select() and non-blocking I/O is the best you can do.
In specific cases there might be other ways. For example, if you know the client will send an XML tag, some data, and a closing tag every n seconds, you could start reading n seconds after the last packet you received, then just read on until you receive the closing tag. But as I said, this isn't a general approach, since it requires semantics on the channel.
TCP is a byte-stream protocol, not a message protocol. If you want messages you really have to implement them yourself, e.g. with a length-word prefix, lines, XML, etc. You can guess with the FIONREAD option of ioctl(), but guessing is all it is, as you can't know whether the client has paused in the middle of transmission of the message, or whether the network has done so for some reason.
The protocol needs to give you a way to know when the client has finished sending a message.
Common approaches are to send the length of each message before it, or to send a special terminator after each message (similar to the NUL character at the end of strings in C).
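A sketch of the length-prefix approach in C; the 4-byte big-endian header is an assumed convention, not something from the question:

#include <arpa/inet.h>   /* ntohl */
#include <stdint.h>
#include <unistd.h>

/* TCP is a byte stream, so a single read() may return less than
 * asked for; loop until exactly len bytes have arrived. */
static int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;   /* EOF or error mid-message */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

static ssize_t read_message(int fd, char *out, size_t max)
{
    uint32_t netlen;
    if (read_exact(fd, &netlen, sizeof netlen) == -1)
        return -1;
    uint32_t len = ntohl(netlen);
    if (len > max)
        return -1;       /* too large for the caller's buffer */
    if (read_exact(fd, out, len) == -1)
        return -1;
    return (ssize_t)len;
}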

Can I use select to send data on multiple interfaces as fast as each interface can process it?

I am an experienced network programmer and am faced with a situation where I need some advice.
I am required to distribute data over several outgoing interfaces (via different TCP socket connections, one per interface). However, the important part is that I should be able to send MORE/most of the data over the interface with the better bandwidth, i.e. the one that can send faster.
My idea was to use the select API (on both Unix and Windows) for this purpose. I have used select, poll, and even epoll in the past, but always for READING from multiple sockets whenever data was available.
Here I intend to write successive packets to several interfaces in sequence, then monitor each of them for write readiness (via select), and whichever becomes available first (meaning it finished sending its packet first) would get the next packets.
Will I be able to achieve my intention here? I.e. if I have one interface on a 10 Mbps link and another on a 1 Mbps link, I hope to get most of the packets out via the faster interface.
Update 1: I was wondering what select's behavior would be in this case. When you call select on read descriptors, the ones with data available are returned. But in my scenario, where we are writing to the descriptors and waiting for select to return the one that finished writing first, does select guarantee to return only when an entire packet has been written? Say I try to write 1200 bytes in one go: will select only indicate the descriptor once all 1200 bytes are written, or on a permanent error? I am not sure how select behaves here and failed to find any documentation describing it.
I'd adopt the producer/consumer pattern. In this case: one producer and several consumers.
Let the main thread handle your source (be the producer) and spawn one thread per connection (the consumers).
The threads pull chunks from the source in parallel and send them over their connections one by one.
The thread holding the fastest connection can be expected to send the most chunks in this setup.
Using poll/epoll/select for writing is rather tricky. The reason is that sockets are mostly ready for writing unless their socket send buffer is full. So, polling for 'writable' is apt to just spin without ever waiting.
You need to proceed as follows:
1. When you have something to write to a socket, write it, in a loop that terminates when all the data has been written or write() returns -1 with errno == EAGAIN/EWOULDBLOCK.
2. At that point you have a full socket send buffer, so register this socket with the selector/poll/epoll for writability.
3. When you have nothing else to do, select/poll/epoll, and repeat the writes that caused the associated sockets to be polled for writability.
4. Do those writes the same way as at (1), but this time, if the write completes, deregister the socket for writability.
In other words, you must only select/poll for writability if you already know the socket's send buffer is full, and you must stop doing so as soon as you know it isn't.
How you fit all this into your application is another question.
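To make steps 1 to 4 concrete, here is a hedged epoll-based sketch; epfd, the out_conn bookkeeping struct, and the exact event registration are illustrative assumptions:

#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

struct out_conn { int fd; const char *pend; size_t pend_len; };

static void try_write(int epfd, struct out_conn *c)
{
    while (c->pend_len > 0) {
        ssize_t n = write(c->fd, c->pend, c->pend_len);
        if (n > 0) {
            c->pend += n;
            c->pend_len -= (size_t)n;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* send buffer full: now (and only now) poll for writability */
            struct epoll_event ev = { .events = EPOLLIN | EPOLLOUT, .data.ptr = c };
            epoll_ctl(epfd, EPOLL_CTL_MOD, c->fd, &ev);
            return;
        } else {
            return;   /* real error: handle/close elsewhere */
        }
    }
    /* everything written: stop polling for writability immediately */
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = c };
    epoll_ctl(epfd, EPOLL_CTL_MOD, c->fd, &ev);
}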

Efficient way of validating UDP communication protocol having single Server and multiple Clients

I have developed a single-server/multiple-client UDP application, where the server can handle x clients at a time. The server has x threads, each thread dedicated to one client.
The code works perfectly fine. Now I want to check my application for all possible scenarios, i.e. validate my application. For this purpose, I need to design a test bed.
Initial Design:
The test bed I initially designed has the following functionality:
The Server GUI has a button on it. When the button is clicked, each thread in the Server reads a text file, picks up a few bytes of the text file, and sends those chunks to its respective client. The thread then picks the next chunk of bytes from the text file, sends those chunks to the client, and so on until EOF is found.
The Client on the other side keeps receiving these chunks of bytes, creates a text file, and keeps storing these chunks of bytes in its text file.
When EOF from the Server is received, the Client starts sending the completely received text file back to the Server over its socket.
When the file is completely received back (echoed), the Server then compares the two text files, the sent file and the echoed one. If both files are the same, the communication has occurred without any fault and the communication protocol is validated.
The above mentioned validation technique (sending the text file, receiving the echoed file and then comparing both) checks the following things:
The number of bytes sent = number of bytes received.
No data is corrupted.
The data is received in proper order.
If any of the above mentioned three conditions is not fulfilled, that means that there is some error in communication.
Now I have been asked to make changes to this test bed and add more functionality to it. Can the procedure I am using actually check the above mentioned 3 conditions in all scenarios?
Are there some other conditions that must be checked besides the above mentioned 3 conditions?
What could be other methods of checking the communication protocol besides the one I designed, i.e. sending a text file, getting it echoed, and then comparing?
I have to implement more functionality in this test bed to make the validation system more efficient, or completely replace it with some better option.
Please help me with your suggestions.
Thanks in advance :)
The first two of your conditions are guaranteed by UDP. Picking "a few bytes", i.e. anything less than 65535 bytes (64kiB isn't really a "few" bytes) will result in a single datagram being sent, and anything larger than that will fail. Though you will not want to max out the largest possible datagram size, as it will incur IP fragmentation (staying below 1280 bytes is a good idea).
You will be able to receive exactly the amount you sent or nothing at all, never more or less. UDP does not guarantee that any datagram that is sent out arrives (it cannot guarantee that, since IP does not), but it does guarantee that the entire datagram arrives as-is -- or nothing. Never anything in between.
It further guarantees that the data inside the datagram matches its checksum (the underlying protocols including IP/ethernet/ATM further do their own checksumming) and thus arrives in the same binary representation as it was sent. In other words, data arrives in order (inside the datagram) and is not corrupted.
It is of course in theory possible that a bit error passes all 3 layers of checksums, but this is extremely unlikely and will not happen in practice. Unless you need to guard against someone maliciously tampering with packets, you do not need to worry. The kinds of bit errors that happen accidentally are reliably picked up by the checksums used in the protocols.
If, on the other hand, you do need to guard against malicious modification of your data, you must add a MAC (or a checksum and encrypt the entire packet -- adding a checksum alone is useless).
To ensure that data spanning several datagrams arrives in order, you must add sequence numbers to your packets (in the same manner TCP does). And with that, you can as well use TCP, which is likely more efficient and less error-prone. One of the main reasons why one would want to use UDP is normally because in-order delivery and reliability are not needed, or sometimes reliability is needed, but not in-order delivery.
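As an illustration, here is a tiny sketch of prepending a sequence number to each datagram; the 4-byte header and the 1280-byte cap are assumptions made for the example:

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

static uint32_t next_seq;   /* per-destination in a real design */

static ssize_t send_seq(int sock, const void *payload, size_t len,
                        const struct sockaddr *dst, socklen_t dstlen)
{
    char pkt[1280];                        /* stay below IP fragmentation */
    uint32_t seq = htonl(next_seq++);
    if (len + sizeof seq > sizeof pkt)
        return -1;
    memcpy(pkt, &seq, sizeof seq);
    memcpy(pkt + sizeof seq, payload, len);
    return sendto(sock, pkt, len + sizeof seq, 0, dst, dstlen);
}

The receiver reads the first four bytes with ntohl() and compares against the last sequence number seen to detect gaps and reordering.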
In-order delivery is the main cause of TCP's latency during packet loss (in absence of packet loss, TCP is exactly as "fast" as UDP), so if this is needed, there is no sane reason not to use TCP in the first place. It is a protocol that has been fine-tuned and worked reliably for literally billions of people for 4 decades.
Also, using one socket and one thread per client is possibly not the best approach. The disk won't read any faster, and the network card won't send any faster either. UDP doesn't need a socket per client either. When using TCP, you'll have no other choice but to use one socket per client, but still multiplexing using a readiness notification system will give you much better performance and fewer opportunities for threading errors.
Also, sending back a checksum such as one of the SHA family (or a MAC, if it needs to be secure) may be more efficient than echoing back the whole lot of data. The likelihood that the checksum matches while the data accidentally doesn't is negligible.
Entire revision control systems that manage millions of lines of code for millions of people (such as git) rely on the fact that this just doesn't happen to identify files (well, it does happen of course, you just won't live to see it).
I have a question here: why UDP and not TCP, especially when you are worried about packet order and data corruption? As I understand it (I may be wrong), UDP is good only when the data is time-sensitive, like a video stream.
Secondly, yes, there are other methods of checking the integrity of transmitted data. The simplest may be checking an MD5 or SHA-1 checksum.
Can the procedure I am using actually check the above mentioned 3 conditions in all scenarios?
Yes.
What could be other methods of checking the communication protocol besides the one I designed, i.e. sending a text file, getting it echoed, and then comparing?
It doesn't have to be a file, but it has to be something you can check once you get the response. You could just generate some random data and hold on to it until you get the response.
You'd have to tell us what you really want to test. If you are trying to make sure that UDP doesn't give you bad data or out of order data, you're using the wrong protocol. You're not testing anything by seeing if you get the exact data in the exact order you send it over UDP except for the networking infrastructure you have in place.
You say you want to test your application for "all possible scenarios", but that doesn't even mean anything. You're testing to see if a behavior that is part of the UDP specification exists and trying to see that it doesn't? Well, it does. Even if you never see it.

Socket client/server input/output polling vs. read/write in Linux

Basically I set up a test to see which method is the fastest way to get data from another computer on my network, for a server with only a few clients (10 at max, 1 at min).
I tried two methods, both done in a thread-per-client fashion, and looped the read 10000 times. I timed the loop from the creation of the threads to the joining of the threads right after. In my threads I used these two methods, both with standard read(2)/write(2) calls and SOCK_STREAM/AF_INET:
In one, my client polled for data (non-blocking reads) whenever data was available, and my server sent data instantly whenever it got a connection. My thread returned after a read of the correct number of bytes (which happened every time).
In the other, my client sent a message to the server on connect and my server sent a message to my client on a read (both sides blocked here to make this more turn-based and synchronous). My thread returned after my client's read.
I was pretty sure polling would be faster. I made a histogram of times to complete threads, and, as expected, polling was faster by a slight margin, but two things were unexpected about the read/write method. Firstly, the read/write method gave me two distinct time spikes, i.e. some event sometimes occurred which would slow the read/write down by about .01 microseconds. I ran this test on a switch initially, and thought this might be a collision of packets, but then I ran the server and client on the same computer and still got these two different time spikes. Does anyone know what event may be occurring?
The other: my read function sometimes returned too many bytes, and some bytes were garbage. I know streams don't guarantee you'll get all the data correctly, but why would the read function return extra garbage bytes?
Seems you are confusing the purpose of these two alternatives:
The connection-per-thread approach does not need polling (unless your protocol allows a random sequence of messages either way, which would be very confusing to implement). Blocking reads and writes will always be faster here, since you skip an extra system call to select(2)/poll(2)/epoll(7).
The polling approach lets you multiplex I/O on many sockets/files in a single-threaded or fixed-number-of-threads setup. This is how web servers like nginx handle thousands of client connections in very few threads. The idea is that waiting on any given file descriptor does not block the others: you wait on all of them at once.
So I would say you are comparing apples and goblins :) Take a look here:
High Performance Server Architecture
The C10K problem
libevent
As for the spikes: check whether TCP gets into retransmission mode, i.e. one of the sides is not reading fast enough to drain the receive buffers; play with the SO_RCVBUF and SO_SNDBUF socket options.
Too many bytes is definitely wrong; it looks like API misuse. Check whether you are comparing signed and unsigned numbers, and compile at a high warning level.
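To illustrate the signed/unsigned point: read(2) returns ssize_t, and storing the result in an unsigned variable is a classic way to end up treating an error as a huge byte count:

#include <unistd.h>

void example(int fd, char *buf, size_t cap)
{
    size_t bad = read(fd, buf, cap);   /* BUG: -1 becomes SIZE_MAX */
    (void)bad;

    ssize_t n = read(fd, buf, cap);    /* correct: check for -1 and 0 */
    if (n > 0) {
        /* exactly n bytes of buf are valid; anything past them is stale garbage */
    }
}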
Edit:
It looks like you are dealing with two separate issues: data corruption and data transfer performance. I would strongly recommend focusing on the first one before tackling the second. Reduce the test to a minimum and try to figure out what you are doing wrong with the sockets, i.e. where that garbage data comes from. Do you check the return values of the read(2) and write(2) calls? Do you share buffers between threads? Paste the reduced code sample into the question (or provide a link to it) if you're really stuck.
Hope this helps.
I know streams don't guarantee you'll get all the data correctly, but why would the read function return extra garbage bytes?
Actually, streams do guarantee you will get all the data correctly, and in order. Datagrams (UDP) are what you were thinking of, SOCK_DGRAM, which is not what you are using. Within AF_INET, SOCK_STREAM means TCP and TCP means reliable.
