I need to "wake up" a process that is waiting on epoll() from another process.
I've created a UDS (AF_UNIX) type SOCK_DGRAM where:
The client every few ms might send one char to the server
The server is waiting with epoll() on the socket for read
I don't need the data from the client, only to "wake up" from it
How can I do this most efficiently?
Do I have to read() the data?
Can the server somehow ignore the data without overloading the socket's memory?
Do I have to read() the data? Can the server somehow ignore the data without overloading the socket's memory?
If you're receiving data on a socket on an ongoing basis then yes, you need to read that data, else the socket buffer will eventually fill. After it does, you will not receive any more data. You don't need to do anything with the data you read, and you can consume many bytes at a time if you wish, but reading the data is how you remove them from the socket buffer.
You will also find that epoll_wait() does not behave as you want if you do not read the data. If you are watching the socket fd in level-triggered mode, and there are already data available to read, then epoll_wait() will not block. If you are watching the socket fd in edge-triggered mode, and there are already data ready to read, then receiving more data will not cause epoll_wait() to unblock.
How can I do this most efficiently?
Are you really worried about single-byte read() calls at a rate not exceeding one every few milliseconds? Is this for some low-power embedded system?
I don't really see a lot of room for improvement if you've settled on using epoll for this. If it turns out not to perform well enough for you, then you could consider alternatives such as process-shared semaphores or signals, though it is by no means clear that either of these would be superior. This is what performance testing is for.
From the manpage for select():
those in writefds will be watched to see if a write will not block
For a file descriptor that is associated with a TCP/IP connection, how does the select() function determine when the connection can be written to without blocking? An alternative way of phrasing my question would also be, what are the conditions when select() will return indicating the file descriptor can be written to without blocking?
I'd assume that select() will not return the fd in the fd_set if the send buffer is full. If true, is this the only consideration? I can imagine many possible criteria for determining whether a write would block, so I'm interested in knowing specifically how this works on Linux.
It will indicate the FD is writable when there is space in the send buffer. There are no other considerations.
When connecting, this includes the case when the connection completes, as the send buffer is effectively allocated at that time.
If you write data, it is not transmitted immediately to the peer; it is first stored in the socket buffer. The kernel then takes the data out of the socket buffer and transmits it. Depending on the protocol, the data might be transmitted as fast as possible (UDP), or flow control may keep data on the sender's side until the receiver has acknowledged it (TCP). If there is too much unacknowledged data, the kernel stops draining the socket buffer, which then starts to fill up. Once there is no more space in the buffer, writing will block. And once there is again enough space in the socket buffer, writing becomes possible again, and that is what select() signals.
I have to make simple IRC client/server programs for my IT school. The subject asks us to use select(2) for socket polling but forbids us to use O_NONBLOCK sockets.
Your server will accept multiple simultaneous connections.
Attention, the use of fork is prohibited. So you must use select
Your server must not be blocking.
This has nothing to do with non-blocking sockets, which are prohibited (so do not use fcntl(s, O_NONBLOCK))
I’m wondering if it is even possible to design a non-blocking server (which does not fork) with blocking sockets even using select(2).
Here is a simple example: let's say we have a simple text protocol with one command per line. Each client has a buffer. When select(2) tells us a client is ready for read(2), we read until we find a \n in the client buffer, then we process the command. With non-blocking sockets, we would read until EAGAIN.
Let's imagine now that we are using blocking sockets and a malicious client sends text with no line break. select(2) tells us data is available, so we then read(2) from the client. But we will never read the expected \n. Instead of returning EAGAIN, the syscall will block indefinitely. This is a denial-of-service attack.
Is it really possible to design a non-blocking server with blocking-sockets and select(2) (no fork(2))?
Yes, you read once from the socket that select tells you is ready. If the read contains the \n, then process that line. Otherwise, store any data that was received, and immediately go back to the select.
This means of course, that for every open socket, you must maintain state information, and a buffer of data read so far. This allows the code to process each read independently, without the need to finish a full line before going back to the select.
It's impossible.
select() blocks, and therefore so does any program that calls it.
The behaviour defined by POSIX for send() in blocking mode is that it blocks until all the data supplied has been transferred to the socket send buffer. Unless you're going to delve into low-water marks and so on, it is impossible to know in advance whether there is enough room in the socket send buffer for any given send() to complete without blocking, and therefore impossible for any program that calls send() not to block.
Note that select() doesn't help you with this. It can tell you when there is some room, but not when there is enough.
I'm writing a network application based on an epoll loop and thread pool for handling requests. In each thread I take special care not to block on client reads, by using non-blocking sockets and returning as soon as read returns EAGAIN (or EWOULDBLOCK to be POSIX compliant...).
Should I take special care in socket writes too? I don't see myself sending enough data to fill the system TCP buffers, and blocking for a while shouldn't be too harmful. Is this the only case where a write on a socket blocks? Not enough buffer size?
Also, can a socket be declared non-blocking for reads and blocking for writes? Or should I use fcntl all the time to switch between these behaviours?
Thanks!
The only case where a write on a socket blocks is when the data written won't fit in the buffer. Ideally, you would handle writes much the way you handle reads. You don't need to switch before each write. If you want to block on write, use this logic (a sketch follows after the list):
1. Perform the write.
2. If it completes, you're done.
3. Otherwise, switch the socket to blocking.
4. Perform the (rest of) the write.
5. Switch the socket back to non-blocking.
The correct way to handle this is to use non-blocking I/O throughout, given that you're already using non-blocking reads. When you do a write, if you get -1/EWOULDBLOCK, start selecting on that socket for writability, and when it becomes writable, retry the write. If that succeeds, stop selecting on writability for that socket. Don't switch that socket to blocking mode; it defeats the whole purpose.
I've seen a few write-ups comparing select() with poll() or epoll(), and I've seen many guides discussing the actual usage of select() with multiple sockets.
However, what I can't seem to find is a comparison to a non-blocking recv() call without select(). In the event of only having 1 socket to read from and 1 socket to write to, is there any justification for using the select() call? The recv() method can be set up to not block and return an error (WSAEWOULDBLOCK) when there is no data available, so why bother to call select() when you have no other sockets to examine? Is the non-blocking recv() call much slower?
You wouldn't want a non-blocking call to recv() without some other means of waiting for data on the socket, as you would be polling in an infinite loop, eating up CPU time.
If you have no other sockets to examine and nothing else to do in the same thread, a blocking call to read is likely to be the most efficient solution. Although in such a situation, worrying about the efficiency of this is likely to be premature optimisation.
These kinds of considerations only tend to come into play as the socket count increases.
Nonblocking calls are only faster in the context of handling multiple sockets on a single thread.
If there is no data available, and you use non-blocking IO, recv() will return immediately.
Then what should the program do? You would need to call recv() in a loop until data becomes available - this just uses CPU for pretty much no reason.
Spinning on recv() and burning CPU in that manner is very undesirable; you'd rather want the process to wait until data becomes available and get woken up; that's what select()/poll() and similar calls do.
And adding a sleep() to the loop in order not to burn CPU is not a good solution either. You'd introduce high latency in the processing, as the program will not be able to process data as soon as it becomes available.
select() and friends let you design the workflow in such a way that slowness of one socket does not impede the speed at which you can serve another. Imagine that data arrives fast from the receiving socket and you want to accept it as fast as possible and store in memory buffers. But the sending socket is slow. When you've filled up the sending buffers of the OS and send() gave you EWOULDBLOCK, you can issue select() to wait on both receiving and sending sockets. select() will fall through if either new data on the receiving socket arrived, or some buffers are freed and you can write more data to the sending socket, whichever happens first.
Of course a more realistic use case for select() is when you have multiple sockets to read from and/or to write to, or when you must pass the data between your two sockets in both directions.
In fact, select() tells you when the next read or write operation on a socket is known to succeed, so if you only read and write when select() allows you to, your program will almost work even if you didn't make the sockets non-blocking! It is still unwise to do, because there are edge cases where the next operation may still block even though select() reported the socket as "ready".
On the other hand, making the sockets non-blocking and not using select() is almost never advisable because of the reason explained by #Troy.
My application has ONLY 1 Unix TCP socket that it uses to recv() and send(). The socket is non-blocking. Given this, is there an advantage in doing a select() before a send()/recv()?
If the underlying TCP pipe is not ready for I/O, the send()/recv() should immediately return with EWOULDBLOCK or EAGAIN. So, what's the point of doing a select()? It seems like it would only add system-call overhead in this case. Am I missing anything?
EDIT: Forgot to mention: The application is single-threaded.
If your socket is non-blocking, then you need select (or preferably poll, which does not have the broken FD_SETSIZE limit and associated dangers) to block for you in place of the blocking that would otherwise take place (if the socket were not non-blocking) in send and recv. Otherwise you will spin, using 100% CPU time to do nothing. In most cases, you could just as easily make the socket blocking and do away with select/poll. However, there is one interesting case to consider: blocking I/O could deadlock if your program is blocked in send and the program at the other end of the socket is also blocked in send (or the opposite). With non-blocking I/O and select/poll, you naturally detect this situation and process the pending input when writing your output is not possible.
You could just do recv() in a loop, but then you'll be consuming a lot of CPU time.
Better to wait on select() to avoid the extra CPU overhead. If you need to be doing background tasks, add a timeout to select() so you can wake periodically, even with no network traffic.
If your application is latency sensitive, then it may be justified to spin in a tight recv() loop without select() and give it a dedicated CPU (otherwise the scheduler will punish it and you end up with massive latency). If your app cannot afford that but still dedicates a thread to serving this socket, then just make the socket blocking on the read side and let the scheduler wake your thread when data is available. On the sending side it again depends on what you need: either make the socket blocking or spin.
Only if your application is single-threaded and the logic is "receive-process-reply" do you absolutely need a non-blocking read/write socket, a selector, and a write queue, so that you receive when data is there, process it, put the response on the queue, register for writability, flush the queue to the socket when writable, and unregister from writability. Readability should stay registered all the time.