Can I set SO_RCVBUF to 1 on a UDP socket?

I have a system where a single-byte message is sent via a UDP socket from one process to another when something happens. On the receiving end, it matters not if this event has happened once or a dozen or even a million times.
Rather than making many useless recvfrom() calls, I thought maybe I could just use setsockopt() to set SO_RCVBUF to 1, which I hope would cause the system to simply ignore all but the first message (until it is read, thus allowing another). However, googling turns up some evidence that this might not be 100% portable (it looks like some systems might quietly enforce a minimum size?).
So before I go to the work of trying this, I was wondering how likely it is to work. This is on Red Hat Linux, if that matters in any way.

On Linux, the actual size is clamped between SOCK_MIN_RCVBUF and sysctl_rmem_max, but the call to setsockopt() itself will never fail (see the kernel's sock_setsockopt() implementation). The value includes overhead, among other things, for a struct sk_buff.
The value of SOCK_MIN_RCVBUF is:
#define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff))
As to what you want to do: AFAIK, there is no way to clear the receive buffer without reading (or closing the socket).
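As a quick illustration, the clamping is easy to observe by asking for a tiny buffer and reading the value back with getsockopt(). This is a minimal sketch, not a recommendation; the exact value granted depends on the kernel version and sysctls, and on Linux the kernel doubles whatever you request before clamping.

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    int requested = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof requested) < 0)
        perror("setsockopt");               /* on Linux this never fails */

    int actual = 0;
    socklen_t len = sizeof actual;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == 0)
        printf("asked for %d, kernel granted %d bytes\n", requested, actual);

    close(fd);
    return 0;
}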

You are assuming SO_RCVBUF is measured in bytes of application data. For UDP, it isn't on any platform that I know of.

Related

Client side program won't move ahead

I'm writing a program where a client tries to download a file from the server.
The following is the code for the client side.
FILE *file_to_download;
file_to_download = fopen("CLIENT_file_downloaded", "ab");
long int total_bytes = 0;
int bytes_received = 0;
char received_buffer[2048];
printf(">>> Downloading file...\n");
while ((bytes_received = read(connector, received_buffer, sizeof received_buffer)) > 0) { // keep receiving until data stops sending
    printf("=");
    total_bytes += bytes_received;
    fwrite(received_buffer, 1, bytes_received, file_to_download);
}
if (bytes_received < 0) {
    printf("\n>>> Read error\n");
    exit(1);
}
printf("\n>>> File downloaded (Bytes received: %ld)\n\n", total_bytes);
This works perfectly when I close the socket connection immediately afterwards. However, if I leave it open for some other functionality (say, sending a message to the server), it halts at ">>> Downloading file...", even though I can see the downloaded file in the folder. Also, once I terminate the server-side program, it prints out everything else.
From other SO threads, I think this has something to do with the socket getting "blocked".
But then why is my file downloaded (I can see it in the folder)? The fwrite call responsible for that is inside the while loop, so if execution is entering the loop, why doesn't it print anything?
How can I have the server tell the client that it's done sending it data and how can I have the client side program move forward?
You've already gotten a lot of comments, but I'll try to summarize at least a few of the bits and pieces into one place.
First of all, the problem: as you're doing things right now, there are basically three ways your call to read can return:
It returns a strictly positive value (i.e., at least 1) that tells you how many bytes you read.
It returns 0 to indicate that the socket was closed.
It returns a negative value to indicate an error.
But there's not a defined way for it to return and tell you: "the socket's still open, there was no error, but there's no data waiting to be read."
So, as others have said, if you're going to transfer a file (or some other defined chunk of data) you generally need to define some application-level protocol on top of TCP to support that. The most obvious starting point is that you send the size of the file first (typically as a single fixed-size chunk, such as 4 or 8 bytes), followed by that many bytes of data.
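As a hedged sketch of what the receiving side of such a size-prefixed transfer could look like (read_exact() and recv_file() are made-up helpers, and a 4-byte network-order length prefix is assumed; this is not the only possible framing):

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>

/* Read exactly n bytes, looping over short reads.
 * Returns 0 on success, -1 on error or premature EOF. */
static int read_exact(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t got = read(fd, p, n);
        if (got <= 0)
            return -1;                 /* error or peer closed too early */
        p += got;
        n -= (size_t)got;
    }
    return 0;
}

static int recv_file(int fd, FILE *out)
{
    uint32_t netlen;
    if (read_exact(fd, &netlen, sizeof netlen) < 0)
        return -1;
    uint32_t remaining = ntohl(netlen);    /* file size announced by the server */

    char chunk[2048];
    while (remaining > 0) {
        size_t want = remaining < sizeof chunk ? remaining : sizeof chunk;
        if (read_exact(fd, chunk, want) < 0)
            return -1;
        fwrite(chunk, 1, want, out);
        remaining -= (uint32_t)want;
    }
    return 0;                              /* exactly the announced size read */
}

The key point is that the loop terminates because the client knows in advance how many bytes to expect, not because the server closes the connection.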
If you do just that, you can define something that at least works. There are all sorts of possible errors it can miss, but if things all go well, it will be fine.
The next step beyond that is typically to add something like some sort of checksum/CRC so when you think a transfer is complete, you can verify the data to get at least a reasonable assurance that it worked (i.e., that the data you received matches what was sent).
Another general direction to consider is how you're doing your reading. There are a couple of choices here. One is to avoid calling read until you're sure it can/will succeed. If you're dealing only with one (or a few) sockets at a time, you can call select, which will tell you when your socket is ready to read, so issuing a read is guaranteed to succeed quickly. It might read less than you've asked, but it will return rather than waiting indefinitely for data. If you have to deal with a lot of sockets, you might prefer to look up epoll, which does roughly the same thing, but reduces overhead when you have to deal with many handles.
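For example, a minimal sketch of gating a read() behind select() for a single socket (no timeout shown; the helper name is illustrative):

#include <sys/select.h>
#include <unistd.h>

ssize_t read_when_ready(int fd, void *buf, size_t len)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    /* Block here (optionally with a timeout) instead of inside read(). */
    if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    return read(fd, buf, len);   /* will not block: data or EOF is pending */
}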
Another possible way to deal with this problem is to set the O_NONBLOCK option for your socket. In this case, attempting to read when no data is available will be treated as an error, so it'll return immediately with an error of EAGAIN or EWOULDBLOCK (you have to be prepared for either). This gives you a fairly easy way to at least proceed when you have no more data available, but does nothing about any of the other difficulties in transferring data effectively.
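A rough sketch of that, assuming fcntl() is used to flip the flag; the helper names and the -2 convention are illustrative only:

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns bytes read, 0 on orderly close, -1 on a real error,
 * or -2 when the socket is open but nothing is available yet. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t got = read(fd, buf, len);
    if (got < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;          /* no data right now; come back later */
    return got;
}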
As others have noted, there are quite a few existing protocols for doing things like this and reinventing it may not be the best use of your time. On the other hand, some protocols can be somewhat painful (e.g., ftp's normal mode requires that you open/use two separate sockets). Others are complex enough that you probably don't want to try to implement them on your own, but libraries to support them well can be difficult to find.
Personally, I've found that websockets work pretty reasonably for quite a few tasks like this. They include framing (so what was sent as a single websocket write will be received with a single websocket read). They also use CRC to do error checking. So, for quite a few cases like this, it'll take care of most of the details more or less automatically. It also includes (and in most cases uses) what they call a ping/pong protocol to detect loss of connection much faster than TCP normally does on its own.
But as noted above, there are lots of alternatives, some of them designed much more specifically for transferring files (so what you receive isn't just the content of the file, but things like the name and other metadata attached to that content).

Can I get socket buffer remainder size?

I have a real-time system, so I'm using a non-blocking socket to send my data.
But sometimes the socket send buffer is full,
so the send function's return value is less than my data length.
If I save the returned length and re-send the rest, is that not the same as a blocking socket?
So, can I get the remaining size of the socket send buffer? Then I could check it first,
and only call send if there is enough room, otherwise skip the send.
Thank you, all.
Well there is a difference between blocking and non-blocking - if you experience a short write you don't block. That's the whole point of non-blocking. It gives you an opportunity to do something more pressing while waiting for some buffer space to free up.
Your concern seems to be the repeated attempts to write a full message, that is, a form of polling. But checking the bytes free in the buffer is the same thing: you are just substituting a call that checks availability for the call to write. You really don't gain anything efficiency-wise.
The commonplace solution to this is to use something like select or poll to monitor the socket descriptor for the ability to write (at least some) bytes. This allows you to stop polling and foist some of the work off on the kernel, which monitors the space availability for you.
That said, if you really want to check how much space is available, there are usually workarounds that tend to be somewhat platform-specific, mostly ioctl calls with various platform-specific parameters like FIONWRITE, SIOCOUTQ, etc. You would need to investigate exactly what your platform provides. But, again, it is better to consider whether this is really something you need in the first place.
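For example, on Linux a rough estimate can be pieced together from SIOCOUTQ (unsent bytes in the send queue) and SO_SNDBUF (total buffer size). This is only a sketch and the result is approximate, because SO_SNDBUF includes kernel bookkeeping overhead; treat it as a hint, not a guarantee that the next send() won't block or return short.

#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>   /* SIOCOUTQ (Linux-specific) */

long send_buffer_headroom(int fd)
{
    int queued = 0;
    if (ioctl(fd, SIOCOUTQ, &queued) < 0)
        return -1;                       /* not supported, or error */

    int sndbuf = 0;
    socklen_t len = sizeof sndbuf;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) < 0)
        return -1;

    return (long)sndbuf - queued;        /* approximate free space */
}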
If the asynchronous send fails with EWOULDBLOCK/EAGAIN, no data is sent. You could then try to send something else, or wait until the buffer is free again.
Also see https://stackoverflow.com/questions/19391208/when-a-non-blocking-send-only-transfers-partial-data-can-we-assume-it-would-r - a related issue is discussed there.

Strategy for estimating / calculating buffer space needed by writer function on embedded system

This isn't a show-stopping programming problem as such, but perhaps more of a design pattern issue. I'd have thought it'd be a common design issue on embedded resource-limited systems, but none of the questions I found so far on SO seem relevant (but please point out anything relevant that I could have missed).
Essentially, I'm trying to work out the best strategy of estimating the largest buffer size required by some writer function, when that writer function's output isn't fixed, particularly because some of the data are text strings of variable length.
This is a C application that runs on a small ARM micro. The application needs to send various message types via TCP socket. When I want to send a TCP packet, the TCP stack (Keil RL) provides me with a buffer (which the library allocates from its own pool) into which I may write the packet data payload. That buffer size depends of course on the MSS; so let's assume it's 1460 at most, but it could be smaller.
Once I have this buffer, I pass this buffer and its length to a writer function, which in turn may call various nested writer functions in order to build the complete message. The reason for this structure is because I'm actually generating a small XML document, where each writer function typically generates a specific XML element. Each writer function wants to write a number of bytes to my allocated TCP packet buffer. I only know exactly how many bytes a given writer function writes at run-time, because some of the encapsulated content depends on user-defined text strings of variable length.
Some messages need to be around (say) 2K in size, meaning they're likely to be split across at least two TCP packet send operations. Those messages will be constructed by calling a series of writer functions that produce, say, a hundred bytes at a time.
Prior to making a call to each writer function, or perhaps within the writer function itself, I initially need to compare the buffer space available with how much that writer function requires; and if there isn't enough space available, then transmit that packet and continue writing into a fresh packet later.
Possible solutions I am considering are:
Use another much larger buffer to write everything into initially. This isn't preferred because of resource constraints. Furthermore, I would still wish for a means to algorithmically work out how much space my message writer functions need.
At compile time, produce a 'worst case size' constant for each writer function. Each writer function typically generates an XML element such as <START_TAG>[string]</START_TAG>, so I could have something like: #define SPACE_NEEDED ( START_TAG_LENGTH + START_TAG_LENGTH + MAX_STRING_LENGTH + SOME_MARGIN ). All of my content writer functions are picked out of a table of function pointers anyway, so the worst-case size estimate constant for each writer function could live as a new column in that table (see the sketch after this list). At run-time, I check the buffer room against that estimate constant. This is probably my favourite solution at the moment. The only downside is that it relies on correct maintenance to make it work.
My writer functions provide a special 'dummy run' mode where they run through and calculate how many bytes they want to write but don't write anything. This could be achieved by simply passing NULL in place of the buffer pointer, in which case the function's return value (which usually states the amount written to the buffer) just states how much it wants to write. The only thing I don't like about this is that, between the 'dummy' and 'real' call, the underlying data could - at least in theory - change. A possible solution for that could be to statically capture the underlying data first.
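To make option 2 concrete, the dispatch table might carry a worst-case-size column like this. All names and sizes here are illustrative placeholders, not the actual code:

#include <stddef.h>

#define START_TAG_LENGTH   16   /* e.g. "<USER_NAME>" plus slack */
#define END_TAG_LENGTH     17
#define MAX_STRING_LENGTH  64
#define SOME_MARGIN        8
#define ELEMENT_WORST_CASE (START_TAG_LENGTH + END_TAG_LENGTH + \
                            MAX_STRING_LENGTH + SOME_MARGIN)

typedef size_t (*writer_fn)(char *buf, size_t room);

struct writer_entry {
    writer_fn  write;       /* emits one XML element into buf   */
    size_t     worst_case;  /* maximum bytes it could ever emit */
};

/* Before calling table[i].write(), the sender checks the remaining room: */
static int enough_room(const struct writer_entry *e, size_t room_left)
{
    return room_left >= e->worst_case;
}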
Thanks in advance for any thoughts and comments.
Solution
Something I had actually already started doing since posting the question was to make each content writer function accept a state, or 'iteration' parameter, which allows the writer to be called many times over by the TCP send function. The writer is called until it flags that it has no more to write. If the TCP send function decides after a certain iteration that the buffer is now nearing full, it sends the packet and then the process continues later with a new packet buffer. This technique is very similar I think to Max's answer, which I've therefore accepted.
A key thing is that on each iteration, a content writer must be designed so that it won't write more than LENGTH bytes to the buffer; and after each call to the writer, the TCP send function will check that it has LENGTH room left in the packet buffer before calling the writer again. If not, it continues in a new packet.
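A minimal sketch of what such a resumable writer interface might look like; the names, the state layout, and the single-content simplification are illustrative only:

#include <stddef.h>
#include <string.h>

struct writer_state {
    size_t next;            /* offset into the element's full content */
};

/* Appends at most `room` bytes, starting where the previous call stopped.
 * Returns bytes written this call; sets *done when the element is complete. */
size_t write_element(struct writer_state *st, const char *content,
                     size_t content_len, char *buf, size_t room, int *done)
{
    size_t left  = content_len - st->next;
    size_t chunk = left < room ? left : room;

    memcpy(buf, content + st->next, chunk);
    st->next += chunk;
    *done = (st->next == content_len);
    return chunk;
}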
Another step I took was to have a serious think about how I structure my message headers. It became apparent that, like I suppose with almost all protocols that use TCP, it is essential to build into the application protocol some means of indicating the total message length. The reason is that TCP is a stream-based protocol, not a packet-based protocol. This again became a bit of a headache, because I needed some upfront means of knowing the total message length for insertion into the start header. The simple solution was to insert a message header at the start of every sent TCP packet, rather than only at the start of the application protocol message (which may of course span several TCP sends), and basically implement fragmentation. So, in the header, I implemented two flags: a fragment flag and a last-fragment flag. The length field in each header therefore only needs to state the size of the payload in that particular packet. At the receiving end, individual header+payload chunks are read out of the stream and then reassembled into a complete protocol message.
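For illustration, the per-packet header might be laid out something like this; the field widths and flag values are assumptions, not the actual wire format used:

#include <stdint.h>

#define FLAG_FRAGMENT       0x01   /* more fragments follow           */
#define FLAG_LAST_FRAGMENT  0x02   /* final fragment of this message  */

struct packet_header {
    uint8_t  flags;          /* FLAG_FRAGMENT and/or FLAG_LAST_FRAGMENT */
    uint8_t  message_type;
    uint16_t payload_length; /* bytes of payload in THIS packet only,
                                sent in network byte order */
};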
This is no doubt, very simplistically, how HTTP and so many other protocols work over TCP. It's just quite interesting that only once I'd attempted to write a robust protocol that works over TCP did I start to realise the importance of really thinking through your message structure in terms of headers, framing, and so forth, so that it works over a stream protocol.
I had a related problem in a much smaller embedded system, running on a PIC 16 micro-controller (and written in assembly language, rather than C). My 'buffer size' was always going to be the two byte UART transmit queue, and I had only one 'writer' function, which was walking a DOM and emitting its XML serialisation.
The solution I came up with was to turn the problem 'inside out'. The writer function becomes a task: each time it is called it writes as many bytes as it can (which may be >2 depending on the serial data transmission rate) until the transmit buffer is full, then it returns. However, it remembers, in a state variable, how far it had got through the DOM. The next time it is called, it carries on from the point previously reached. If there is no free buffer space, it returns immediately without changing its state. The writer task is called repeatedly from an infinite loop, which acts as a round-robin scheduler for this task and the others in the system. Each time round the loop there is a delay which waits for the TMR0 timer to overflow, so each task gets called exactly once in a fixed time slice.
In my implementation, the data is transmitted by a TxEmpty interrupt routine, but it could also be sent by another task.
I guess the 'pattern' here is that one role of the program counter is to hold the current state of the flow of control, and that this role can be abstracted away from the PC to another data structure.
Obviously, this isn't immediately applicable to your larger, higher-level system. But it is a different way of looking at the problem, which may spark your own particular insight.
Good luck!

Are repeated recv() calls expensive?

I have a question about a situation that I face quite often. From time to time I have to implement various TCP-based protocols. Most of them define variable-length data packets that begin with a common header ([packet ID, length, payload] or something really similar). Obviously, there can be two approaches to reading these packets:
Read header (since header length is usually fixed), extract the payload length, read the payload
Read all available data and store it in a buffer; parse the buffer afterwards
Obviously, the first approach is simple, but requires two calls to read() (or probably more). The second one is slightly more complicated, but requires fewer calls.
The question is: does the first approach affect the performance badly enough to worry about it?
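For concreteness, approach 1 might look roughly like this, assuming a fixed 8-byte header of [uint32 packet ID][uint32 payload length], both in network byte order; recv_all() is a hypothetical helper that loops over short reads:

#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <arpa/inet.h>

static int recv_all(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t got = recv(fd, p, n, 0);
        if (got <= 0)
            return -1;                 /* error or connection closed */
        p += got;
        n -= (size_t)got;
    }
    return 0;
}

int read_packet(int fd)
{
    uint32_t header[2];
    if (recv_all(fd, header, sizeof header) < 0)      /* syscall #1 (or more) */
        return -1;

    uint32_t id  = ntohl(header[0]);
    uint32_t len = ntohl(header[1]);

    char *payload = malloc(len ? len : 1);
    if (!payload || recv_all(fd, payload, len) < 0) { /* syscall #2 (or more) */
        free(payload);
        return -1;
    }

    /* ... handle packet (id, payload, len) ... */
    free(payload);
    return 0;
}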
Yes, system calls are generally expensive compared to memory copies. IMHO this is particularly true on the x86 architecture, and arguable on RISC machines (ARM, MIPS, ...).
To be honest, unless you must handle hundreds or thousands of requests per second, you will hardly notice the difference.
Depending on exactly what the protocol is, a hybrid approach could be best. When the protocol uses a lot of small packets and fewer big ones, you can read the header plus a partial amount of data in one call. When it is a small packet, you win by avoiding a large memcpy; when the packet is big, you only pay for a second syscall in that case.
If your application is a server capable of handling multiple clients simultaneously and non-blocking sockets are used to handle multiple clients in one thread, you have little choice but to only ever issue one recv() syscall when a socket becomes ready for read.
The reason is that if you keep calling recv() in a loop and the client sends a large volume of data, your recv() loop may block the thread from doing anything else for a long time. E.g., recv() reads some amount of data from the socket, determines that there is now a complete message in the buffer and forwards that message to the callback. The callback processes the message somehow and returns. If you call recv() once more, there can be more messages that have arrived while the callback was processing the previous one. This leads to a busy recv() loop on one socket preventing the thread from processing any other pending events.
This issue is exacerbated if the socket read buffer in your application is smaller than the kernel socket receive buffer, i.e. the whole contents of the kernel receive buffer cannot be read in one recv() call. Anecdotal evidence: I hit this issue on a busy production system when there was a 16 KB user-space buffer for a 2 MB kernel socket receive buffer. A client sending many messages in succession would block the thread in that recv() loop for minutes, because more messages kept arriving while the just-read messages were being processed, leading to disruption of the service.
In such event-driven architectures it is best to have the user-space read buffer equal to the size of the kernel socket receive buffer (or the maximum message size, whichever is bigger), so that all the data available in the kernel buffer can be read in one recv() call. This works by doing one recv() call, processing all complete messages in the user-space read buffer and then returning control to the event loop. This way a connection with a lot of incoming data does not block the thread from processing other events and connections; rather, processing round-robins over all connections with incoming data available.
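Roughly, that read path might look like the sketch below; parse_one_message(), the buffer size, and the ownership of buf/used are illustrative assumptions, not a drop-in implementation:

#include <string.h>
#include <sys/socket.h>

#define READ_BUF_SIZE (2 * 1024 * 1024)   /* roughly match the kernel rcvbuf */

/* Hypothetical parser: returns bytes consumed for one complete message,
 * or 0 if the data so far is still incomplete. */
size_t parse_one_message(const char *data, size_t len);

void on_socket_readable(int fd, char *buf, size_t *used)
{
    if (*used >= READ_BUF_SIZE)
        return;                            /* buffer full: wait for the parser */

    ssize_t got = recv(fd, buf + *used, READ_BUF_SIZE - *used, 0);
    if (got <= 0)
        return;                            /* closed, error, or would-block */
    *used += (size_t)got;

    size_t off = 0, n;
    while ((n = parse_one_message(buf + off, *used - off)) > 0)
        off += n;                          /* consume every complete message */

    memmove(buf, buf + off, *used - off);  /* keep any trailing partial message */
    *used -= off;
    /* return to the event loop; do NOT keep looping on recv() here */
}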
The best way to get your answer is to measure. The strace program is decent for the purpose of measuring system call times. Using it adds a lot of overhead in itself, but if you merely compare the cost of one recv for this purpose versus the cost of two, it should be reasonably meaningful. Use the -tt option to get times. Or you can use the -c option to get an overview of time spent separated by which syscall it was spent on.
A better way to measure, albeit with more of a learning curve, is oprofile.
Also note that if you do decide buffering is worthwhile, you may be able to use fdopen and the stdio functions to take care of it for you. This is extremely easy and will work well if you're only dealing with a single connection or if you have a thread/process per connection, but won't work at all if you want to use a select/poll-based model.
Note that you generally have to "read all the available data into a buffer and process it afterwards" anyway, to account for the (unlikely, but possible) scenario where a recv() call returns only part of your header - so you might as well go the whole hog and use option 2.
Yes, depending upon the scenario the read/recv calls may be expensive. For example, if you are issuing a huge number of recv() calls to read very small amounts of data at short intervals, it would be a performance hit. In such a scenario you could issue a recv() with a reasonably large buffer, say 4 KB, and then parse that 4 KB buffer. It may contain multiple header+data combos. By reading the header first you can find the data and its length. And to avoid copying the data into a new buffer, you can just use the offset at which the actual data starts, and store that pointer.

How can I buffer non-blocking IO?

When I need buffered I/O on a blocking file descriptor I use stdio. But if I turn the file descriptor into non-blocking mode, then according to the manual, stdio buffering is unusable. After some research I see that BIO can be usable for buffering non-blocking I/O.
But maybe there are other alternatives?
I need this to avoid using threads in a multi-connection environment.
I think what you are talking about is the Reactor Pattern. This is a pretty standard way of processing lots of network connections without threads, and is very common in multiplayer game server engines. Another implementation (in Python) is Twisted Matrix.
The basic algorithm is (a C sketch follows the list):
have a buffer for each socket
check which sockets are ready to read (select(), poll(), or just iterate)
for each socket:
call recv() and accumulate the data into the socket's buffer until recv() returns 0 (connection closed) or fails with EWOULDBLOCK (no more data for now)
call application level data handler for the socket with the contents of the buffer
clear the socket's buffer
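A minimal C sketch of one iteration of that loop, assuming select() and fixed-size per-connection buffers; conn_t, handle_data(), and the limits are illustrative, not part of any library:

#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>

#define MAX_CONNS 64
#define BUF_SIZE  4096

typedef struct {
    int    fd;                 /* -1 when the slot is unused      */
    char   buf[BUF_SIZE];      /* per-socket accumulation buffer  */
    size_t used;
} conn_t;

void handle_data(conn_t *c);   /* application-level handler: processes
                                  c->buf and resets c->used (elsewhere) */

void reactor_iteration(conn_t conns[MAX_CONNS])
{
    fd_set rfds;
    FD_ZERO(&rfds);
    int maxfd = -1;

    for (int i = 0; i < MAX_CONNS; i++)
        if (conns[i].fd >= 0) {
            FD_SET(conns[i].fd, &rfds);
            if (conns[i].fd > maxfd) maxfd = conns[i].fd;
        }

    if (select(maxfd + 1, &rfds, NULL, NULL, NULL) <= 0)
        return;

    for (int i = 0; i < MAX_CONNS; i++) {
        conn_t *c = &conns[i];
        if (c->fd < 0 || !FD_ISSET(c->fd, &rfds))
            continue;

        for (;;) {   /* drain this socket without blocking */
            ssize_t got = recv(c->fd, c->buf + c->used,
                               BUF_SIZE - c->used, MSG_DONTWAIT);
            if (got > 0) {
                c->used += (size_t)got;
                if (c->used == BUF_SIZE)
                    break;                     /* buffer full: hand it off */
            } else if (got == 0 ||
                       (errno != EWOULDBLOCK && errno != EAGAIN)) {
                c->fd = -1;                    /* closed or real error */
                break;
            } else {
                break;                         /* EWOULDBLOCK: nothing more now */
            }
        }
        handle_data(c);                        /* process and clear the buffer */
    }
}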
I see the question has been edited now, and is at least more understandable than before.
Anyway, isn't this a contradiction?
You make I/O non-blocking because you want to be able to read small amounts quickly, typically sacrificing throughput for latency.
You make it buffered because you don't care that much about latency, but want to make efficient use of the I/O subsystem by trading latency for throughput.
Doing them both at the same time seems like a contradiction, and is hard to imagine.
What are the semantics you're after? If you do this:
int fd;
char buf[1024];
ssize_t got;
fd = setup_non_blocking_io(...);
got = read(fd, buf, sizeof buf);
What behavior do you expect if there are 3 bytes available? Blocking/buffered I/O might block until it is able to read more to satisfy your request; non-blocking I/O would return the 3 available bytes immediately.
Of course, if you have some protocol on top, that defines some kind of message structure so that you can know that "this I/O is incomplete, I can't parse it until I have more data", you can buffer it yourself at that level, and not pass data on upwards until a full message has been received.
Depending on the protocol, it is certainly possible that you will need to buffer your reads for a non-blocking network node (client or server).
Typically, these buffers provide multiple indexes (offsets) that record both the position of the last byte processed and the last byte read (which is either the same as or greater than the processed offset). They also (should) provide richer semantics such as compacting the buffer, transparent buffer size management, etc.
In Java (at least) the non-blocking network io (NIO) packages also provide a set of data structures (ByteBuffer, etc.) that are geared towards providing a general data structure.
Either such data structures exist for C, or you must roll your own. Once you have one, simply read as much data as is available and let the buffer manage issues such as overflow (e.g. reading bytes across message frame boundaries), and use the marker offset to mark off the bytes that you have processed.
As Android pointed out, you will (very likely) need to create matched buffers for each open connection.
You could create a struct with a buffer for each open file descriptor, then accumulate data into these buffers until recv() returns 0 or you have enough data to process.
If I understand your question correctly, you can't buffer because with non-blocking you're writing to the same buffer with multiple connections (if global) or just writing small pieces of data (if local).
In any case, your program has to be able to identify where the data is coming from (possibly by file descriptor) and buffer it accordingly.
Threading is also an option; it's not as scary as many make it out to be.
Ryan Dahl's evcom library does exactly what you want.
I use it in my job and it works great. Be aware, though, that it doesn't (yet, but coming soon) have async DNS resolving. Ryan suggests udns by Michael Tokarev for that. I'm trying to adopt udns instead of blocking getaddrinfo() now.
