Buffering of stream data - c

I'm trying to develop a simple IRC bot. First I want to think out a proper design for this project. One of the things I'm wondering about right now is the read mechanism. I develop this bot on a Linux system. (Fedora 12) To read from a socket I use the system call "read()". I plan to use the reading functionality in the following way (code just an example. Not something from the final product):
while (uBytesRead = read(iServerSocket, caBuffer, MAX_MESSAGE_SIZE))
{
//1. Parse the buffer and place it into a Message structure.
//2. Add the message structure to a linked list that will act as a queue of message that are to be processed.
}
This code will be run in it's own thread. I choose for this option because I wanted there to be as small of a delay between reads as possible. (writes will be implemented in the same way) This is all slightly based on assumptions, that I would like to clear up. My question is: what if you receive so much data at such a quick rate, that the reading and processing the data (in this case just parsing it) goes slower than the rate at which data comes in. I made the assumption that this data will be buffered by the system. is this a right assumption? And if so:
How big is this buffer?
What happens with incomming data when this buffer gets full?
To make my application protected against spam, how could I best deal with it?
I hope I've explained my issue clear enough.
Thanks in advance.

IRC uses TCP sockets for networking. Linux/Posix TCP sockets have a data buffer for sending and another one for receiving. You can resize the buffers with setsockopt() and SO_SNDBUF/SO_RCVBUF.
TCP has flow control so when a receive buffer is getting full the OS will send a congestion notice. Received packets that didn't fit in the buffer will not be acknowledged by the receiver and will be eventually retransmitted by the sender.
So that's not to worry. What matters is what does the sender program when its socket's send buffer gets full. Some programs will close the socket, others would just discard written data and try again, while others might buffer internally.

Related

Determine if peer has closed reading end of socket

I have a socket programming situation where the client shuts down the writing end of the socket to let the server know input is finished (via receiving EOF), but keeps the reading end open to read back a result (one line of text). It would be useful for the server to know that the client has successfully read the result and closed the socket (or at least shut down the reading end). Is there a good way to check/wait for such status?
No. All you can know is whether your sends succeeded, and some of them will succeed even after the peer read shutdown, because of TCP buffering.
This is poor design. If the server needs to know that the client received the data, the client needs to acknowledge it, which means it can't shutdown its write end. The client should:
send an in-band termination message, as data.
read and acknowledge all further responses until end of stream occurs.
close the socket.
The server should detect the in-band termination message and:
stop reading requests from the socket
send all outstanding responses and read the acknowledgements
close the socket.
OR, if the objective is only to ensure that client and server end at the same time, each end should shutdown its socket for output and then read input until end of stream occurs, then close the socket. That way the final closes will occur more or less simultaneously on both ends.
getsockopt with TCP_INFO seems the most obvious choice, but it's not cross-platform.
Here's an example for Linux:
import socket
import time
import struct
import pprint
def tcp_info(s):
rv = dict(zip("""
state ca_state retransmits probes backoff options snd_rcv_wscale
rto ato snd_mss rcv_mss unacked sacked lost retrans fackets
last_data_sent last_ack_sent last_data_recv last_ack_recv
pmtu rcv_ssthresh rtt rttvar snd_ssthresh snd_cwnd advmss reordering
rcv_rtt rcv_space
total_retrans
pacing_rate max_pacing_rate bytes_acked bytes_received segs_out segs_in
notsent_bytes min_rtt data_segs_in data_segs_out""".split(),
struct.unpack("BBBBBBBIIIIIIIIIIIIIIIIIIIIIIIILLLLIIIIII",
s.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 160))))
wscale = rv.pop("snd_rcv_wscale")
# bit field layout is up to compiler
# FIXME test the order of nibbles
rv["snd_wscale"] = wscale >> 4
rv["rcv_wscale"] = wscale & 0xf
return rv
for i in range(100):
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("localhost", 7878))
s.recv(10)
pprint.pprint(tcp_info(s))
I doubt a true cross-platform alternative exists.
Fundamentally there are quite a few states:
you wrote data to socket, but it was not sent yet
data was sent, but not received
data was sent and losts (relies on timer)
data was received, but not acknowledged yet
acknowledgement not received yet
acknowledgement lost (relies on timer)
data was received by remote host but not read out by application
data was read out by application, but socket still alive
data was read out, and app crashed
data was read out, and app closed the socket
data was read out, and app called shutdown(WR) (almost same as closed)
FIN was not sent by remote yet
FIN was sent by remote but not received yet
FIN was sent and got lost
FIN received by your end
Obviously your OS can distinguish quite a few of these states, but not all of them. I can't think of an API that would be this verbose...
Some systems allow you to query remaining send buffer space. Perhaps if you did, and socket was already shut down, you'd get a neat error?
Good news is just because socket is shut down, doesn't mean you can't interrogate it. I can get all of TCP_INFO after shutdown, with state=7 (closed). In some cases report state=8 (close wait).
http://lxr.free-electrons.com/source/net/ipv4/tcp.c#L1961 has all the gory details of Linux TCP state machine.
TL;DR:
Don't rely on the socket state for this; it can cut you in many error cases. You need to bake the acknowledgement/receipt facility into your communications protocol. First character on each line used for status/ack works really well for text-based protocols.
On many, but not all, Unix-like/POSIXy systems, one can use the TIOCOUTQ (also SIOCOUTQ) ioctl to determine how much data is left in the outgoing buffer.
For TCP sockets, even if the other end has shut down its write side (and therefore will send no more data to this end), all transmissions are acknowledged. The data in the outgoing buffer is only removed when the acknowledgement from the recipient kernel is received. Thus, when there is no more data in the outgoing buffer, we know that the kernel at the other end has received the data.
Unfortunately, this does not mean that the application has received and processed the data. This same limitation applies to all methods that rely on socket state; this is also the reason why fundamentally, the acknowledgement of receipt/acceptance of the final status line must come from the other application, and cannot be automatically detected.
This, in turn, means that neither end can shut down their sending sides before the very final receipt/acknowledge message. You cannot rely on TCP -- or any other protocols' -- automatic socket state management. You must bake in the critical receipts/acknowledgements into the stream protocol itself.
In OP's case, the stream protocol seems to be simple line-based text. This is quite useful and easy to parse. One robust way to "extend" such a protocol is to reserve the first character of each line for the status code (or alternatively, reserve certain one-character lines as acknowledgements).
For large in-flight binary protocols (i.e., protocols where the sender and receiver are not really in sync), it is useful to label each data frame with an increasing (cyclic) integer, and have the other end respond, occasionally, with an update to let the sender know which frames have been completely processed, and which ones received, and whether additional frames should arrive soon/not-very-soon. This is very useful for network-based appliances that consume a lot of data, with the data provider wishing to be kept updated on the progress and desired data rate (think 3D printers, CNC machines, and so on, where the contents of the data changes the maximum acceptable data rate dynamically).
Okay so I recall pulling my hair out trying to solve this very problem back in the late 90's. I finally found an obscure doc that stated that a read call to a disconnected socket will return a 0. I use this fact to this day.
You're probably better off using ZeroMQ. That will send a whole message, or no message at all. If you set it's send buffer length to 1 (the shortest it will go) you can test to see if the send buffer is full. If not, the message was successfully transferred, probably. ZeroMQ is also really nice if you have an unreliable or intermittent network connection as part of your system.
That's still not entirely satisfactory. You're probably even better off implementing your own send acknowledge mechanism on top of ZeroMQ. That way you have absolute proof that a message was received. You don't have proof that a message was not received (something can go wrong between emitting and receiving the ack, and you cannot solve the Two Generals Problem). But that's the best that can be achieved. What you'll have done then is implement a Communicating Sequential Processes architecture on top of ZeroMQ's Actor Model which is itself implemented on top of TCP streams.. Ultimately it's a bit slower, but your application has more certainty of knowing what's gone on.

Reading all available bytes via socket using blocking I/O

When reading from a socket using read(2) and blocking I/O, when do I know that the other side (the client) has no more data to send? (by "no more data to send" I mean that, as an example, the client is waiting for a response). At first, I thought that this point is reached when less than count bytes are returned by read (as in read(fd, *buf, count)).
But what if the client sends the data fragmented? Reading until read returns 0 would be a solution, but as far as I know 0 is only returned when the client closes the connection - otherwise, read would just block until the connection is closed. I thought of using non-blocking I/O and a timeout for select(2), but this does not seem to be a tidy solution to me.
Are there any known best practices?
The concept of "the other side has no more data to send", without either a timeout or some semantics in the transmitted data, is quite pointless. Normally, code on the client/server will be able to process data faster than the network can transmit it. So if there's no data in the receive buffer when you're trying to read() it, this just means the network has not yet transmitted everything, but you have no way to tell if the next packet will arrive within a millisecond, a second, or a day. You'd probably consider the first case as "there is more data to send", the third as "no more data to send", and the second depends on your application.
If the other side doesn't close the connection, you probably don't know when it's ready to send the next data packet either.
So unless you have specific semantics and knowledge about what the client sends, using select() and non-blocking I/O is the best you can do.
In specific cases, there might be other ways - for example, if you know the client will send and XML tag, some data, and a closing tag, every n seconds. In that case you could start reading n seconds after the last packet you received, then just read on until you receive the closing tag. But as i said, this isn't a general approach since it requires semantics on the channel.
TCP is a byte-stream protocol, not a message protocol. If you want messages you really have to implement them yourself, e.g. with a length-word prefix, lines, XML, etc. You can guess with the FIONREAD option of ioctl(), but guessing is all it is, as you can't know whether the client has paused in the middle of transmission of the message, or whether the network has done so for some reason.
The protocol needs to give you a way to know when the client is finishes sending a message.
Common approaches are to send the length of each message before it, or to send a special terminator after each message (similar to the NUL character at the end of strings in C).

Receive message of undefined size in UART in C

I'm writing my own drivers for LPC2148 and a question came to mind.
How do I receive a message of unspecified size in UART?
The only 2 things that come to mind are: 1 - Configure a watchdog and end the receiving when the time runs out. 2- make it so that whenever a meswsage is sent to it there must be an end of message character.
The first choice seems better in my opinion, but I'd like to know if anybody has a better answer, and I know there must be.
Thank you very much
Just give the caller whatever bytes you have received so far. The UART driver shouldn't try to implement the application protocol, the application should do that.
It looks like a wrong use for a watchdog. I ended up with three solutions for this problem:
Use fixed-size packets and DMA; so, you receive one packet per transaction. Apparently, it is not possible in your case.
Receive message char-by-char until the end-of-message character is received. Kind of error-prone, since the EOM char may appear in the data, probably.
Use a fixed-size header before every packet. In the header, store payload size and/or message type ID.
The third approach is probably the best one. You may combine it with the first one, i.e. use DMA to receive header and then data (in the second transaction, after the data size is known from the header). It is also one of the most flexible approaches.
One more thing to worry about is to keep bytestream in sync. There may be rubbish laying in the UART input buffers, which may get read as data, or you can get only a part of a packet after your MCU is powered (i.e. the beginning of the packet had already been sent by that time). To avoid that, you can add magic bytes in your packet header, and probably CRC.
EDIT
OK, one more option :) Just store everything you receive in a growing buffer for later use. That is basically what PC drivers do.
Real embedded uart drivers usually use a ring buffer. Bytes are stored in order and the clients promise to read from the buffer before it's full.
A state machine can then process the message in multiple passes with no need for a watchdog to tell it reception is over
better to go for option 2) append end of transmission character to the transmission string.
but i suggest to add start of transmission also to validate that you are receiving actual transmission.
Watchdog timer is used to reset system when there is a unexpected behavior of device. I think it is better to use a buffer which can store size of data that your application requires.

Reading from the serial port in a multi-threaded program on Linux

I'm writing a program in linux to interface, through serial, with a piece of hardware. The device sends packets of approximately 30-40 bytes at about 10Hz. This software module will interface with others and communicate via IPC so it must perform a specific IPC sleep to allow it to receive messages that it's subscribed to when it isn't doing anything useful.
Currently my code looks something like:
while(1){
IPC_sleep(some_time);
read_serial();
process_serial_data();
}
The problem with this is that sometimes the read will be performed while only a fraction of the next packet is available at the serial port, which means that it isn't all read until next time around the loop. For the specific application it is preferable that the data is read as soon as it's available, and that the program doesn't block while reading.
What's the best solution to this problem?
The best solution is not to sleep ! What I mean is a good solution is probably to mix
the IPC event and the serial event. select is a good tool to do this. Then you have to find and IPC mechanism that is select compatible.
socket based IPC is select() able
pipe based IPC is select() able
posix message queue are also selectable
And then your loop looks like this
while(1) {
select(serial_fd | ipc_fd); //of course this is pseudo code
if(FD_ISSET(fd_set, serial_fd)) {
parse_serial(serial_fd, serial_context);
if(complete_serial_message)
process_serial_data(serial_context)
}
if(FD_ISSET(ipc_fd)) {
do_ipc();
}
}
read_serial is replaced with parse_serial, because if you spend all your time waiting for complete serial packet, then all the benefit of the select is lost. But from your question, it seems you are already doing that, since you mention getting serial data in two different loop.
With the proposed architecture you have good reactivity on both the IPC and the serial side. You read serial data as soon as they are available, but without stopping to process IPC.
Of course it assumes you can change the IPC mechanism. If you can't, perhaps you can make a "bridge process" that interface on one side with whatever IPC you are stuck with, and on the other side uses a select()able IPC to communicate with your serial code.
Store away what you got so far of the message in a buffer of some sort.
If you don't want to block while waiting for new data, use something like select() on the serial port to check that more data is available. If not, you can continue doing some processing or whatever needs to be done instead of blocking until there is data to fetch.
When the rest of the data arrives, add to the buffer and check if there is enough to comprise a complete message. If there is, process it and remove it from the buffer.
You must cache enough of a message to know whether or not it is a complete message or if you will have a complete valid message.
If it is not valid or won't be in an acceptable timeframe, then you toss it. Otherwise, you keep it and process it.
This is typically called implementing a parser for the device's protocol.
This is the algorithm (blocking) that is needed:
while(! complete_packet(p) && time_taken < timeout)
{
p += reading_device.read(); //only blocks for t << 1sec.
time_taken.update();
}
//now you have a complete packet or a timeout.
You can intersperse a callback if you like, or inject relevant portions in your processing loops.

Winsock UDP packets being dropped?

We have a client/server communication system over UDP setup in windows. The problem we are facing is that when the throughput grows, packets are getting dropped. We suspect that this is due to the UDP receive buffer which is continuously being polled causing the buffer to be blocked and dropping any incoming packets. Is it possible that reading this buffer will cause incoming packets to be dropped? If so, what are the options to correct this? The system is written in C. Please let me know if this is too vague and I can try to provide more info. Thanks!
The default socket buffer size in Windows sockets is 8k, or 8192 bytes. Use the setsockopt Windows function to increase the size of the buffer (refer to the SO_RCVBUF option).
But beyond that, increasing the size of your receive buffer will only delay the time until packets get dropped again if you are not reading the packets fast enough.
Typically, you want two threads for this kind of situation.
The first thread exists solely to service the socket. In other words, the thread's sole purpose is to read a packet from the socket, add it to some kind of properly-synchronized shared data structure, signal that a packet has been received, and then read the next packet.
The second thread exists to process the received packets. It sits idle until the first thread signals a packet has been received. It then pulls the packet from the properly-synchronized shared data structure and processes it. It then waits to be signaled again.
As a test, try short-circuiting the full processing of your packets and just write a message to the console (or a file) each time a packet has been received. If you can successfully do this without dropping packets, then breaking your functionality into a "receiving" thread and a "processing" thread will help.
Yes, the stack is allowed to drop packets — silently, even — when its buffers get too full. This is part of the nature of UDP, one of the bits of reliability you give up when you switch from TCP. You can either reinvent TCP — poorly — by adding retry logic, ACK packets, and such, or you can switch to something in-between like SCTP.
There are ways to increase the stack's buffer size, but that's largely missing the point. If you aren't reading fast enough to keep buffer space available already, making the buffers larger is only going to put off the time it takes you to run out of buffer space. The proper solution is to make larger buffers within your own code, and move data from the stack's buffers into your program's buffer ASAP, where it can wait to be processed for arbitrarily long times.
Is it possible that reading this buffer will cause incoming packets to be dropped?
Packets can be dropped if they're arriving faster than you read them.
If so, what are the options to correct this?
One option is to change the network protocol: use TCP, or implement some acknowledgement + 'flow control' using UDP.
Otherwise you need to see why you're not reading fast/often enough.
If the CPU is 100% utilitized then you need to do less work per packet or get a faster CPU (or use multithreading and more CPUs if you aren't already).
If the CPU is not 100%, then perhaps what's happening is:
You read a packet
You do some work, which takes x msec of real-time, some of which is spent blocked on some other I/O (so the CPU isn't busy, but it's not being used to read another packet)
During those x msec, a flood of packets arrive and some are dropped
A cure for this would be to change the threading.
Another possibility is to do several simultaneous reads from the socket (each of your reads provides a buffer into which a UDP packet can be received).
Another possibility is to see whether there's a (O/S-specific) configuration option to increase the number of received UDP packets which the network stack is willing to buffer until you try to read them.
First step, increase the receiver buffer size, Windows pretty much grants all reasonable size requests.
If that doesn't help, your consume code seems to have some fairly slow areas. I would use threading, e.g. with pthreads and utilize a producer consumer pattern to put the incoming datagram in a queue on another thread and then consume from there, so your receive calls don't block and the buffer does not run full
3rd step, modify your application level protocol, allow for batched packets and batch packets at the sender to reduce UDP header overhead from sending a lot of small packets.
4th step check your network gear, switches, etc. can give you detailed output about their traffic statistics, buffer overflows, etc. - if that is in issue get faster switches or possibly switch out a faulty one
... just fyi, I'm running UDP multicast traffic on our backend continuously at avg. ~30Mbit/sec with peaks a 70Mbit/s and my drop rate is bare nil
Not sure about this, but on windows, its not possible to poll the socket and cause a packet to drop. Windows collects the packets separately from your polling and it shouldn't cause any drops.
i am assuming your using select() to poll the socket ? As far as i know , cant cause a drop.
The packets could be lost due to an increase in unrelated network traffic anywhere along the route, or full receive buffers. To mitigate this, you could increase the receive buffer size in Winsock.
Essentially, UDP is an unreliable protocol in the sense that packet delivery is not guaranteed and no error is returned to the sender on delivery failure. If you are worried about packet loss, it would be best to implement acknowledgment packets into your communication protocol, or to port it to a more reliable protocol like TCP. There really aren't any other truly reliable ways to prevent UDP packet loss.

Resources