According to the libuv tutorial, making subsequent calls to uv_write should not cause one write to block another (my understanding was that they were supposed to occur on separate threads).
However, I've run the example code under strace and it seems that this isn't the case. Having run similar examples using uv_fs_write, I can see that those calls to write occur on separate threads and don't block.
Can someone explain what the expected behaviour is for uv_write, and whether it is supposed to differ from uv_fs_write when the underlying stream is a file handle?
cat Makefile | strace ./uvtee/uvtee ~/out.txt
open("/home/james/out.txt", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 11
ioctl(11, FIONBIO, [1]) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLIN, {u32=9, u64=9}}) = 0
epoll_ctl(6, EPOLL_CTL_ADD, 0, {EPOLLIN, {u32=0, u64=0}}) = 0
epoll_wait(6, [{EPOLLIN|EPOLLHUP, {u32=0, u64=0}}], 1024, -1) = 1
brk(0xb3e000) = 0xb3e000
read(0, "examples=\\\n\thelloworld\\\n\tidle-ba"..., 65536) = 1965
write(1, "examples=\\\n\thelloworld\\\n\tidle-ba"..., 1965) = 1965
write(11, "examples=\\\n\thelloworld\\\n\tidle-ba"..., 1965) = 1965
Full code can be found here.
The fs operations run on a thread pool, because there is no good and portable way to do non-blocking I/O for files. Since we use a thread pool, write operations can indeed run in parallel. That's why uv_fs_write takes an offset parameter: multiple threads can write without stepping on top of each other.
A notable exception to this is macOS, where a global lock is used to serialize uv_fs_write operations.
Now, network I/O is totally different. We use an event loop (as you know) and write operations are queued, so they will be written in the order they were sent, whenever the underlying socket is writable.
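As a hedged illustration of that offset parameter (my sketch, not code from the tutorial; signatures assume libuv 1.x): two uv_fs_write() requests queued at non-overlapping offsets, so the thread-pool workers cannot step on each other even if they run in parallel.

#include <uv.h>
#include <string.h>

static void on_write(uv_fs_t *req) {
    uv_fs_req_cleanup(req);
}

/* Queue two writes into the same file at disjoint offsets. */
void write_two_chunks(uv_loop_t *loop, uv_file file) {
    static uv_fs_t req1, req2;
    static char a[] = "first chunk ";
    static char b[] = "second chunk\n";
    uv_buf_t buf1 = uv_buf_init(a, sizeof(a) - 1);
    uv_buf_t buf2 = uv_buf_init(b, sizeof(b) - 1);

    /* Each request carries its own absolute offset, so neither write
     * depends on a shared file position. */
    uv_fs_write(loop, &req1, file, &buf1, 1, 0, on_write);
    uv_fs_write(loop, &req2, file, &buf2, 1, sizeof(a) - 1, on_write);
}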
The following little C program (let's call it pointless):
/* pointless.c */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    write(STDOUT_FILENO, "", 0); /* pointless write() of 0 bytes */
    sleep(1);
    write(STDOUT_FILENO, "still there!\n", 13);
    return 0;
}
will print "still there!" after a small delay, as expected. However,
rlwrap ./pointless prints nothing under AIX and exits immediately.
Apparently, rlwrap reads 0 bytes after the first write() and
(incorrectly) decides that pointless has called it quits.
When running pointless without rlwrap, and with rlwrap on all
other systems I could lay my hands on (Linux, OSX, FreeBSD), the "still
there!" gets printed, as expected.
The relevant rlwrap (pseudo-)code is this:
/* master is the file descriptor of the master end of a pty, while the slave is 'pointless's stdout */
/* master was opened with O_NDELAY */
while (pselect(nfds, &readfds, .....)) {
    if (FD_ISSET(master, &readfds)) {            /* master is "ready" for reading */
        nread = read(master, buf, BUFFSIZE - 1); /* so try to read a buffer's worth */
        if (nread == 0)                          /* 0 bytes read... */
            cleanup_and_exit();                  /* ... usually means EOF, doesn't it? */
Apparently, on all systems except AIX, writing 0 bytes on the
slave end of a pty is a no-op, while on AIX it wakes up the
select() on the master end. Writing 0 bytes seems pointless, but one
of my test programs writes random-length chunks of text, which may
actually happen to have length 0.
On Linux, man 2 read states "on success, the number of bytes read is returned (zero indicates end of file)" (italics are mine). This question has come up before, without mention of this scenario.
This begs the question: how can I portably determine whether the
slave end has been closed? (In this case I can probably just wait for
a SIGCHLD and then close shop, but that might open another can of
worms I'd rather avoid)
Edit: POSIX states:
Writing a zero-length buffer (nbyte is 0) to a STREAMS device sends 0 bytes with 0 returned. However, writing a zero-length buffer to a STREAMS-based pipe or FIFO sends no message and 0 is returned. The process may issue I_SWROPT ioctl() to enable zero-length messages to be sent across the pipe or FIFO.
On AIX, a pty is indeed a STREAMS device, and moreover not a pipe or FIFO. ioctl(STDOUT_FILENO, I_SWROPT, 0) seems to make the pty conform to the rest of the Unix world. The sad thing is that this has to be called from the slave side, and so is outside rlwrap's sphere of influence (even though we could call the ioctl() between fork() and exec(), that would not guarantee that the executed command won't change it back).
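A hedged sketch of that workaround (my illustration, not rlwrap code), to be run in the child between fork() and exec(); <stropts.h> and I_SWROPT exist on STREAMS systems such as AIX, but not on e.g. Linux with glibc, and the exec'ed command could still change the setting back:

#include <stropts.h>   /* I_SWROPT; STREAMS-only header */
#include <unistd.h>

static void make_zero_writes_noops(void) {
    /* arg 0: send no message for zero-length writes on this stream */
    if (ioctl(STDOUT_FILENO, I_SWROPT, 0) < 0) {
        /* not a STREAMS device, or ioctl unsupported: nothing to do */
    }
}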
Per POSIX:
When attempting to read from an empty pipe or FIFO:
If no process has the pipe open for writing, read() shall return 0 to indicate end-of-file.
So the "read of zero bytes means EOF" is POSIX-compliant.
On the write() side (bolding mine):
Before any action described below is taken, and if nbyte is zero and the file is a regular file, the write() function may detect and return errors as described below. In the absence of errors, or if error detection is not performed, the write() function shall return zero and have no other results. If nbyte is zero and the file is not a regular file, the results are unspecified.
Unfortunately, that means you can't portably depend on a write() of zero bytes to have no effect because AIX is compliant with the POSIX standard for write() here.
You probably have to rely on SIGCHLD.
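For what it's worth, a minimal sketch of the SIGCHLD route (my illustration; error handling elided): record the child's death in a handler, and have the pselect() loop treat "child gone and no more data" as EOF rather than trusting a 0-byte read alone.

#include <signal.h>
#include <sys/wait.h>

static volatile sig_atomic_t child_gone = 0;

static void on_sigchld(int sig) {
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)   /* reap all dead children */
        child_gone = 1;
}

static void install_sigchld_handler(void) {
    struct sigaction sa;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, NULL);
}

/* In the main loop: after pselect() returns, keep draining the pty, and
 * only cleanup_and_exit() once child_gone is set and read() yields no data. */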
From the Linux man page:
If count is zero and fd refers to a regular file, then write()
may return a failure status if one of the errors below is
detected. If no errors are detected, or error detection is not
performed, 0 will be returned without causing any other effect.
If count is zero and fd refers to a file other than a regular
file, the results are not specified.
So, since it is unspecified, it can do whatever it likes in your case.
So my end goal is to allow multiple threads to read the same file from start to finish. For example, if the file was 200 bytes:
Thread A: 0 -> 200 bytes
Thread B: 0 -> 200 bytes
Thread C: 0 -> 200 bytes
etc.
Basically have each thread read the entire file. The software is only reading that file, no writing.
So I open the file:
fd = open(filename, O_RDWR|O_SYNC, 0);
and then each thread simply loops over the file. Because I only create one file descriptor, I also create a clone of the file descriptor in each thread using dup.
Here is a minimal example of a thread function:
void ThreadFunction() {
    int file_desc = dup(fd);
    uint32_t nReadBuffer[1000];
    int numBytes = -1;

    while (numBytes != 0) {
        numBytes = read(file_desc, nReadBuffer, sizeof(nReadBuffer));
        //processing on the bytes goes here
    }
}
However, I'm not sure this is correctly looping through the entire file in each thread; the threads instead seem to be daisy-chaining through it.
Is this approach correct? I inherited this software for a project I am working on; the file descriptor also gets used in an mmap call, so I am not entirely sure whether O_RDWR or O_SYNC matter.
As other folks have mentioned, it isn't possible to use a duplicated file descriptor here. However, there is a thread-safe alternative, which is to use pread. pread reads a file at an offset and doesn't change the implicit offset in the file description.
This does mean that you have to manually manage the offset in each thread, but that shouldn't be too much of a problem with your proposed function.
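Here's a minimal sketch of what that might look like (my illustration, reusing the fd and buffer size from the question): each thread keeps its own offset, and pread() never touches the shared file position, so no dup() is needed.

#include <stdint.h>
#include <unistd.h>

extern int fd;   /* the descriptor opened once, as in the question */

void ThreadFunction(void) {
    uint32_t nReadBuffer[1000];
    off_t offset = 0;
    ssize_t numBytes;

    /* pread() reads at an explicit offset, so each thread walks the
     * whole file independently of the others. */
    while ((numBytes = pread(fd, nReadBuffer, sizeof(nReadBuffer), offset)) > 0) {
        offset += numBytes;
        /* processing on the bytes goes here */
    }
}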
This must be a stupid question because this should be a very common and simple problem, but I haven't been able to find an answer anywhere, so I'll bite the bullet and ask.
How on earth should I go about reading from the standard input when there is no way of determining the size of the data? Obviously if the data ends in some kind of terminator like a NUL or EOF then this is quite trivial, but my data does not. This is simple IPC: the two programs need to talk back and forth and ending the file streams with EOF would break everything.
I thought this should be fairly simple. Clearly programs talk to each other over pipes all the time without needing any arcane tricks, so I hope there is a simple answer that I'm too stupid to have thought of. Nothing I've tried has worked.
Something obvious like (ignoring necessary realloc's for brevity):
int size = 0, max = 8192;
unsigned char *buf = malloc(max);
while (fread((buf + size), 1, 1, stdin) == 1)
    ++size;
won't work since fread() blocks and waits for data, so this loop won't terminate. As far as I know nothing in stdio allows nonblocking input, so I didn't even try any such function. Something like this is the best I could come up with:
struct mydata {
    unsigned char *data;
    int slen; /* size of data */
    int mlen; /* maximum allocated size */
};
...
struct mydata *buf = xmalloc(sizeof *buf);
buf->data = xmalloc((buf->mlen = 8192));
buf->slen = 0;

int nread = read(0, buf->data, 1);
if (nread == -1)
    err(1, "read error");
buf->slen += nread;

fcntl(0, F_SETFL, oflags | O_NONBLOCK);
do {
    if (buf->slen >= (buf->mlen - 32))
        buf->data = xrealloc(buf->data, (buf->mlen *= 2));
    nread = read(0, (buf->data + buf->slen), 1);
    if (nread > 0)
        buf->slen += nread;
} while (nread == 1);
fcntl(0, F_SETFL, oflags);
where oflags is a global variable containing the original flags for stdin (cached at the start of the program, just in case). This dumb way of doing it works as long as all of the data is present immediately, but fails otherwise. Because this sets read() to be non-blocking, it just returns -1 if there is no data. The program communicating with mine generally sends responses whenever it feels like it, and not all at once, so if the data is at all large this exits too early and fails.
How on earth should I go about reading from the standard input when there is no way of determining the size of the data?
There always has to be a way to determine the size. Otherwise, the program would require infinite memory, and would thus be impossible to run on a physical computer.
Think about it this way: even in the case of a never-ending stream of data, there must be some chunks or points where you have to process it. For instance, a live-streamed video has to decode a portion of it (e.g. a frame). Or a video game which processes messages one by one, even if the game has undetermined length.
This holds true regardless of the type of I/O you decide to use (blocking/non-blocking, synchronous/asynchronous...). For instance, if you want to use typical blocking synchronous I/O, what you have to do is process the data in a loop: each iteration, you read as much data as is available, and process as much as you can. Whatever you can not process (because you have not received enough yet), you keep for the next iteration. Then, the rest of the loop is the rest of the logic of the program.
In the end, regardless of what you do, you (or someone else, e.g. a library, the operating system, the hardware buffers...) have to buffer incoming data until it can be processed.
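As a concrete (if hedged) sketch of that read-and-buffer loop, with a hypothetical try_process() that consumes as many complete records as it can from the front of the buffer and returns how many bytes it used:

#include <string.h>
#include <unistd.h>

size_t try_process(const char *buf, size_t len);   /* hypothetical */

void read_loop(void) {
    char buf[8192];
    size_t used = 0;
    ssize_t n;

    /* Blocking read: each iteration grabs whatever is available. */
    while ((n = read(STDIN_FILENO, buf + used, sizeof(buf) - used)) > 0) {
        used += (size_t)n;
        size_t consumed = try_process(buf, used);  /* may be 0 */
        memmove(buf, buf + consumed, used - consumed);
        used -= consumed;                          /* keep the partial tail */
    }
}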
Basically, you have two choices -- synchronous or asynchronous -- and both have their advantages and disadvantages.
For synchronous, you need either delimiters or a length field embedded in the record (or fixed-length records, but that is pretty inflexible). This works best for synchronous protocols like synchronous RPC or simplex client-server interactions, where only one side talks at a time while the other side waits. For ASCII/text-based protocols, it is common to use a control-character delimiter like NL/EOL or NUL or ETX to mark the end of messages. Binary protocols more commonly use an embedded length field -- the receiver first reads the length and then reads the full amount of (expected) data.
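For the length-field style, a hedged sketch (read_exact() and read_record() are made-up helper names): read a fixed 4-byte big-endian length header, then exactly that many payload bytes.

#include <arpa/inet.h>   /* ntohl */
#include <stdint.h>
#include <unistd.h>

/* Loop until exactly 'count' bytes arrive, or EOF/error. */
static int read_exact(int fd, void *buf, size_t count) {
    char *p = buf;
    while (count > 0) {
        ssize_t n = read(fd, p, count);
        if (n <= 0)
            return -1;            /* EOF or error mid-record */
        p += n;
        count -= (size_t)n;
    }
    return 0;
}

/* Read one length-prefixed record; returns payload length or -1. */
static ssize_t read_record(int fd, char *payload, size_t max) {
    uint32_t len_net;
    if (read_exact(fd, &len_net, sizeof len_net) < 0)
        return -1;
    uint32_t len = ntohl(len_net);
    if (len > max)
        return -1;                /* record too large for caller's buffer */
    if (read_exact(fd, payload, len) < 0)
        return -1;
    return (ssize_t)len;
}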
For asynchronous, you use non-blocking mode. It IS possible to use non-blocking mode with stdio streams, it just requires some care: out-of-data conditions show up to stdio like error conditions, so you need to use ferror and clearerr on the FILE * as appropriate.
It's possible for both to be used -- for example in client-server interactions, the clients may use synchronous (they send a request and wait for a reply) while the server uses asynchronous (to be robust in the presence of misbehaving clients).
The read API on Linux, or the ReadFile API on Windows, will return immediately rather than waiting for the specified number of bytes to fill the buffer (when reading from a pipe or socket). read then returns the number of bytes read.
This means that when reading from a pipe, you set a buffer size, read as much as is returned, and then process it. You then read the next bit. The only time you are blocked is when there is no data available at all.
This differs from fread, which only returns once the desired number of bytes have been read or the stream determines that doing so is impossible (like EOF).
I have been working for two weeks on JamVM, a small but powerful Java Virtual Machine.
Now I am trying to figure out how the memory is implemented, and I am stuck on two stupid C problems:
char *mem = (char*)mmap(0, args->max_heap, PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON, -1, 0);
--> The -1 parameter stands for a file descriptor, what does that mean? (I have already read the mmap man page, but haven't found it; maybe I misunderstood...).
heapbase = (char*)(((uintptr_t)mem+HEADER_SIZE+OBJECT_GRAIN-1&)~(OBJECT_GRAIN-1)) HEADER_SIZE;
--> What is 1& ? I don't find it in the C specification...
Thanks,
Yann
In answer to your first question, from the man page:
fd should be a valid file descriptor, unless MAP_ANONYMOUS is set. If MAP_ANONYMOUS is set, then fd is ignored on Linux. However, some implementations require fd to be -1 if MAP_ANONYMOUS (or MAP_ANON) is specified, and portable applications should ensure this.
So it's -1 because MAP_ANONYMOUS is being used.
You use the file descriptor when you have an open file that you want to map into memory. In this case, you're creating an anonymous map (one not backed by a file) so the file descriptor isn't needed. Some implementations ignore fd for anonymous maps, some require it to be -1.
The second question is a syntax error (probably a typo). It probably should be something like:
heapbase = (char*)(((uintptr_t)mem+HEADER_SIZE+OBJECT_GRAIN-1)
&~(OBJECT_GRAIN-1)) - HEADER_SIZE;
In that case, OBJECT_GRAIN will be a power of two and it's a way to get alignment to that power. For example, if it were 8, then ~(OBJECT_GRAIN-1) would be ~7 (~00...001112, which is ~11...110002) which, when ANDed with a value, could be used to force that value to the multiple-of-8 less than or equal to it.
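A tiny demonstration of the masking arithmetic (my example, assuming a grain of 8): masking alone rounds down, while adding grain-1 first rounds up.

#include <inttypes.h>
#include <stdio.h>

int main(void) {
    uintptr_t grain = 8;
    for (uintptr_t x = 1; x <= 17; x += 8) {
        uintptr_t down = x & ~(grain - 1);               /* round down */
        uintptr_t up = (x + grain - 1) & ~(grain - 1);   /* round up */
        printf("%" PRIuPTR " -> down %" PRIuPTR ", up %" PRIuPTR "\n",
               x, down, up);
    }
    return 0;   /* prints 1 -> 0/8, 9 -> 8/16, 17 -> 16/24 */
}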
In fact, it's definitely a transcription error somewhere (not necessarily you) because, when I download the JamVM from here and look in src/alloc.c, I get:
void initialiseAlloc(InitArgs *args) {
char *mem = (char*)mmap(0, args->max_heap, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANON, -1, 0);
:
<< a couple of irrelevant lines >>
:
/* Align heapbase so that start of heap + HEADER_SIZE is object aligned */
heapbase = (char*)(((uintptr_t)mem+HEADER_SIZE+OBJECT_GRAIN-1)&
~(OBJECT_GRAIN-1))-HEADER_SIZE;
(note that your version is also missing the - immediately before HEADER_SIZE, something else that points to transcription problems).
I'm implementing a game server where I need to both read and write. So I accept the incoming connection and start reading from it using aio_read(), but when I need to send something, I stop reading using aio_cancel() and then use aio_write(). Within the write's callback I resume reading. So I do read all the time, but when I need to send something I pause reading.
It works ~20% of the time - in the other cases the call to aio_cancel() fails with "Operation now in progress" and I cannot cancel it (even within a permanent while cycle). So my added write operation never happens.
How should these functions be used properly? What did I miss?
EDIT:
Used under Linux 2.6.35. Ubuntu 10 - 32 bit.
Example code:
void handle_read(union sigval sigev_value) { /* handle data or disconnection */ }
void handle_write(union sigval sigev_value) { /* free writing buffer memory */ }
void start()
{
const int acceptorSocket = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in addr;
memset(&addr, 0, sizeof(struct sockaddr_in));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_port = htons(port);
bind(acceptorSocket, (struct sockaddr*)&addr, sizeof(struct sockaddr_in));
listen(acceptorSocket, SOMAXCONN);
struct sockaddr_in address;
socklen_t addressLen = sizeof(struct sockaddr_in);
for(;;)
{
const int incomingSocket = accept(acceptorSocket, (struct sockaddr*)&address, &addressLen);
if(incomingSocket == -1)
{ /* handle error ... */}
else
{
//tell the socket to append outgoing messages when writing:
const int currentFlags = fcntl(incomingSocket, F_GETFL, 0);
if(currentFlags < 0) { /* handle error ... */ }
if(fcntl(incomingSocket, F_SETFL, currentFlags | O_APPEND) == -1) { /* handle another error ... */ }
//start reading:
struct aiocb* readingAiocb = new struct aiocb;
memset(readingAiocb, 0, sizeof(struct aiocb));
readingAiocb->aio_nbytes = MY_SOME_BUFFER_SIZE;
readingAiocb->aio_fildes = incomingSocket;
readingAiocb->aio_buf = mySomeReadBuffer;
readingAiocb->aio_sigevent.sigev_notify = SIGEV_THREAD;
readingAiocb->aio_sigevent.sigev_value.sival_ptr = (void*)mySomeData;
readingAiocb->aio_sigevent.sigev_notify_function = handle_read;
if(aio_read(readingAiocb) != 0) { /* handle error ... */ }
}
}
}
//called at any time from server side:
void send(void* data, const size_t dataLength)
{
//... some thread-safety precautions not needed here ...
const int cancellingResult = aio_cancel(socketDesc, readingAiocb);
if(cancellingResult != AIO_CANCELED)
{
//this happens ~80% of the time - wrapping the previous call in an endless while loop does not help:
if(cancellingResult == AIO_NOTCANCELED)
{
puts(strerror(aio_return(readingAiocb))); // "Operation now in progress"
/* don't know what to do... */
}
}
//otherwise it's okay to send:
else
{
aio_write(...);
}
}
If you wish to have separate AIO queues for reads and writes, so that a write issued later can execute before a read issued earlier, then you can use dup() to create a duplicate of the socket, and use one to issue reads and the other to issue writes.
However, I second the recommendations to avoid AIO entirely and simply use an epoll()-driven event loop with non-blocking sockets. This technique has been shown to scale to high numbers of clients - if you are getting high CPU usage, profile it and find out where that's happening, because the chances are that it's not your event loop that's the culprit.
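For illustration, a bare-bones sketch of such a loop (my code, not from the question; error handling and the per-client logic are elided):

#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

void event_loop(int acceptor) {   /* acceptor: non-blocking listening socket */
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = acceptor };
    epoll_ctl(ep, EPOLL_CTL_ADD, acceptor, &ev);

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == acceptor) {
                int client = accept(acceptor, NULL, NULL);  /* make it non-blocking too */
                ev.events = EPOLLIN;
                ev.data.fd = client;
                epoll_ctl(ep, EPOLL_CTL_ADD, client, &ev);
            } else {
                /* read()/write() the ready socket until EAGAIN, then loop;
                 * writes that would block are retried when the socket is
                 * reported writable again */
            }
        }
    }
}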
First of all, consider dumping aio. There are lots of other ways to do asynchronous I/O that are not as braindead (yes, aio is braindead). Lots of alternatives; if you're on Linux you can use libaio (io_submit and friends). aio(7) mentions this.
Back to your question.
I haven't used aio in a long time but here's what I remember. aio_read and aio_write both put requests (aiocb) on some queue. They return immediately even if the requests will complete some time later. It's entirely possible to queue multiple requests without caring what happened to the earlier ones. So, in a nutshell: stop cancelling read requests and keep adding them.
/* populate read_aiocb */
rc = aio_read(&read_aiocb);
/* time passes ... */
/* populate write_aiocb */
rc = aio_write(&write_aiocb);
Later you're free to wait using aio_suspend, poll using aio_error, wait for signals etc.
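For instance, a hedged continuation of the snippet above, blocking with aio_suspend() until one of the two requests completes and then collecting the result:

const struct aiocb *list[2] = { &read_aiocb, &write_aiocb };
if (aio_suspend(list, 2, NULL) == 0) {            /* wait for any completion */
    if (aio_error(&read_aiocb) == 0) {            /* 0 = done, EINPROGRESS = not yet */
        ssize_t nread = aio_return(&read_aiocb);  /* bytes read, like read() */
        /* consume nread bytes, then re-issue aio_read() */
    }
    if (aio_error(&write_aiocb) == 0)
        (void)aio_return(&write_aiocb);           /* reap the write's result */
}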
I see you mention epoll in your comment. You should definitely go for libaio.
Unless I'm mistaken, POSIX AIO (that is, aio_read(), aio_write() and so on) is guaranteed to work only on seekable file descriptors. From the aio_read() manpage:
The data is read starting at the absolute file offset aiocbp->aio_offset, regardless of the current file position. After this request, the value of the current file position is unspecified.
For devices which do not have an associated file position such as network sockets, AFAICS, POSIX AIO is undefined. Perhaps it happens to work on your current setup, but that seems more by accident than by design.
Also, on Linux, POSIX AIO is implemented in glibc with the help of userspace threads.
That is, where possible use non-blocking IO and epoll(). However, epoll() does not work for seekable file descriptors such as regular files (same goes for the classical select()/poll() as well); in that case POSIX AIO is an alternative to rolling your own thread pool.
There should be no reason to stop or cancel an aio read or write request just because you need to make another read or write. If that were the case, it would defeat the whole point of asynchronous reading and writing, since their main purpose is to allow you to set up a reading or writing operation and then move on. Since multiple requests can be queued, it would be much better to set up a couple of asynchronous reader/writer pools, where you can grab a set of pre-initialized aiocb structures from an "available" pool that have been set up for asynchronous operations whenever you need them, and then return them to a "finished" pool when they're done so you can access the buffers they point to. While they're in the middle of an asynchronous read or write, they would be in a "busy" pool and wouldn't be touched. That way you won't have to keep creating aiocb structures on the heap dynamically every time you need to make a read or write operation, although that's okay to do ... it's just not very efficient if you never plan on going over a certain limit, or plan to have only a certain number of "in-flight" requests.
BTW, keep in mind with a couple of different in-flight asynchronous requests that your asynchronous read/write handler can actually be interrupted by another read/write event. So you really don't want to be doing a whole lot in your handler. In the above scenario, your handler would basically move the aiocb struct that triggered it from one pool to the next in the listed "available" -> "busy" -> "finished" stages. Your main code, after reading from the buffer pointed to by the aiocb structures in the "finished" pool, would then move the structure back to the "available" pool.
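A rough sketch of that pool idea (my illustration; the "finished" bookkeeping, locking, and signal-safety are elided, and the sizes are made up):

#include <aio.h>
#include <string.h>

enum { POOL_SIZE = 16, BUF_SIZE = 4096 };

struct slot {
    struct aiocb cb;
    char buf[BUF_SIZE];
    struct slot *next;
};

static struct slot slots[POOL_SIZE];
static struct slot *available;   /* "busy" slots are simply off-list */

/* Called once at startup: chain every slot into the available pool. */
static void pools_init(void) {
    for (int i = 0; i < POOL_SIZE - 1; i++)
        slots[i].next = &slots[i + 1];
    available = &slots[0];
}

/* Take a pre-initialized slot from the "available" pool and issue a read;
 * the completion handler would later file the slot under "finished". */
static int issue_read(int fd) {
    struct slot *s = available;
    if (s == NULL)
        return -1;               /* pool exhausted: too many in flight */
    available = s->next;

    memset(&s->cb, 0, sizeof s->cb);
    s->cb.aio_fildes = fd;
    s->cb.aio_buf = s->buf;
    s->cb.aio_nbytes = sizeof s->buf;
    return aio_read(&s->cb);
}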