Synchronize with sigev_notify_function() - c

I would like to asynchronously read BLOCK_SIZE bytes of one file and BLOCK_SIZE bytes of a second file, printing what has been read as soon as the respective buffer has been filled. Let me illustrate what I mean:
// in main()
int infile_fd = open(infile_name, O_RDONLY); // add error checking
int maskfile_fd = open(maskfile_name, O_RDONLY); // add error checking
char* buffer_infile = malloc(BLOCK_SIZE); // add error checking
char* buffer_maskfile = malloc(BLOCK_SIZE); // add error checking
struct aiocb cb_infile;
struct aiocb cb_maskfile;
// set AIO control blocks
memset(&cb_infile, 0, sizeof(struct aiocb));
cb_infile.aio_fildes = infile_fd;
cb_infile.aio_buf = buffer_infile;
cb_infile.aio_nbytes = BLOCK_SIZE;
cb_infile.aio_sigevent.sigev_notify = SIGEV_THREAD;
cb_infile.aio_sigevent.sigev_notify_function = print_buffer;
cb_infile.aio_sigevent.sigev_value.sival_ptr = buffer_infile;
memset(&cb_maskfile, 0, sizeof(struct aiocb));
cb_maskfile.aio_fildes = maskfile_fd;
cb_maskfile.aio_buf = buffer_maskfile;
cb_maskfile.aio_nbytes = BLOCK_SIZE;
cb_maskfile.aio_sigevent.sigev_notify = SIGEV_THREAD;
cb_maskfile.aio_sigevent.sigev_notify_function = print_buffer;
cb_maskfile.aio_sigevent.sigev_value.sival_ptr = buffer_maskfile;
// start both asynchronous reads
aio_read(&cb_infile); // add error checking
aio_read(&cb_maskfile); // add error checking
and the print_buffer() function is defined as follows:
void print_buffer(union sigval sv)
{
    printf("%s\n", __func__);
    printf("buffer address: %p\n", sv.sival_ptr);
    printf("buffer: %.128s\n", (char*)sv.sival_ptr);
}
By the end of the program I do the usual clean up, i.e.
// clean up
close(infile_fd); // add error checking
close(maskfile_fd); // add error checking
free(buffer_infile);
printf("buffer_inline freed\n");
free(buffer_maskfile);
printf("buffer_maskfile freed\n");
The problem is, every once in a while buffer_infile gets freed before print_buffer() manages to print its contents to the console. In the usual case I would employ some kind of pthread_join(), but as far as I know that is impossible, since POSIX does not specify that sigev_notify_function must be implemented using threads, and besides, how would I get the TID of such a thread to call pthread_join() on?

Don't do it this way, if you can avoid it. If you can, just let process termination take care of it all.
Otherwise, the answer indicated in Andrew Henle's comment above is right on. You need to be sure that no more sigev_notify_functions will improperly reference the buffers.
The easiest way to do this is simply to count down the number of expected notifications before freeing the buffers.
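For example, here is a minimal sketch using a counting semaphore, assuming exactly the two reads shown in the question (the name done_sem is mine, not from your code):
#include <semaphore.h>
#include <signal.h>
#include <stdio.h>

static sem_t done_sem; // sem_init(&done_sem, 0, 0) in main() before any aio_read()

void print_buffer(union sigval sv)
{
    printf("buffer: %.128s\n", (char*)sv.sival_ptr);
    sem_post(&done_sem); // count one completed notification
}

// in main(), before close()/free():
// for (int i = 0; i < 2; i++)
//     sem_wait(&done_sem); // block until both notifications have run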
Note: your SIGEV_THREAD function is executed in a separate thread, though not necessarily a new thread each time (POSIX.1-2017 System Interfaces §2.4.2). Importantly, you are not meant to manage this thread's lifecycle: it is created detached by default, and asking for PTHREAD_CREATE_JOINABLE is explicitly noted as undefined behavior.
As an aside, I'd suggest never using SIGEV_THREAD in robust code. Per spec, the signal mask of the sigev_notify_function thread is implementation-defined. Yikes. For me, that makes it per se unreliable. In my view, SIGEV_SIGNAL and a dedicated signal-handling thread are much safer.
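If you do go the SIGEV_SIGNAL route, a rough sketch of the dedicated signal-handling thread might look like this (SIGRTMIN is an arbitrary choice here; block it in every thread with pthread_sigmask() before issuing any aio_read(), otherwise sigwaitinfo() never sees it):
#include <aio.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>

static void* aio_signal_thread(void* arg)
{
    (void)arg;
    sigset_t set;
    siginfo_t info;
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    for (;;) {
        if (sigwaitinfo(&set, &info) == SIGRTMIN) {
            // si_value carries whatever was stored in aio_sigevent.sigev_value
            char* buffer = info.si_value.sival_ptr;
            printf("buffer: %.128s\n", buffer);
        }
    }
    return NULL;
}

// in main(), instead of SIGEV_THREAD:
// cb_infile.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
// cb_infile.aio_sigevent.sigev_signo = SIGRTMIN;
// cb_infile.aio_sigevent.sigev_value.sival_ptr = buffer_infile;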

Related

copy_to_user returns an error in a char device read function

I've implemented a char device for my kernel module and implemented a read function for it. The read function calls copy_to_user to return data to the caller. I originally implemented the read function in a blocking manner (with wait_event_interruptible), but the problem reproduces even when I implement read in a non-blocking manner. My code is running on a MIPS processor.
The user space program opens the char device and reads into a buffer allocated on the stack.
What I've found is that occasionally copy_to_user will fail to copy any bytes. Moreover, even if I replace copy_to_user with a call to memcpy (only for the purposes of checking... I know this isn't the right thing to do), and print out the destination buffer immediately afterwards, I see that memcpy has failed to copy any bytes.
I'm not really sure how to further debug this - how can I determine why memory is not being copied? Is it possible that the process context is wrong?
EDIT: Here's some pseudo-code outlining what the code currently looks like:
User mode (runs repeatedly):
char buf[BUF_LEN];
FILE *f = fopen(char_device_file, "rb");
fread(buf, 1, BUF_LEN, f);
fclose(f);
Kernel mode:
char_device = create_char_device(char_device_name,
                                 NULL,
                                 read_func,
                                 NULL,
                                 NULL);
int read_func(char *output_buffer, int output_buffer_length, loff_t *offset)
{
    int rc;
    if (*offset == 0)
    {
        spin_lock_irqsave(&lock, flags);
        while (get_available_bytes_to_read() == 0)
        {
            spin_unlock_irqrestore(&lock, flags);
            if (wait_event_interruptible(self->wait_queue, get_available_bytes_to_read() != 0))
            {
                // Got a signal; retry the read
                return -ERESTARTSYS;
            }
            spin_lock_irqsave(&lock, flags);
        }
        rc = copy_to_user(output_buffer, internal_buffer, bytes_to_copy);
        spin_unlock_irqrestore(&lock, flags);
    }
    else rc = 0;
    return rc;
}
It took quite a bit of debugging, but in the end Tsyvarev's hint (the comment about not calling copy_to_user with a spinlock held) turned out to point at the cause.
Our process had a background thread which occasionally launched a new process (fork + exec). When we disabled this thread, everything worked well. The best theory we have is that the fork made all of our memory pages copy-on-write, so when we tried to copy to them, the kernel had to do some work which could not be done with the spinlock taken. Hopefully it at least makes some sense (although I'd have guessed that this would apply only to the child process, and the parent's process pages would simply remain writable, but who knows...).
We rewrote our code to be lockless and the problem disappeared.
Now we just need to verify that our lockless code is indeed safe on different architectures. Easy as pie.
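For what it's worth, the more conventional fix (rather than going fully lockless) is to copy into a temporary kernel buffer while holding the spinlock and to call copy_to_user() only after dropping it. A rough sketch in the style of the pseudo-code above (the wait/offset handling is omitted, TMP_BUF_LEN is a placeholder, and min() is the kernel helper):
int read_func(char *output_buffer, int output_buffer_length, loff_t *offset)
{
    char tmp[TMP_BUF_LEN];
    size_t n;
    unsigned long flags;

    spin_lock_irqsave(&lock, flags);
    n = min((size_t)output_buffer_length, (size_t)get_available_bytes_to_read());
    memcpy(tmp, internal_buffer, n);         // copy while holding the lock
    spin_unlock_irqrestore(&lock, flags);

    if (copy_to_user(output_buffer, tmp, n)) // user copy with no lock held
        return -EFAULT;
    return n;
}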

read on many real file descriptors

Working on a Linux (Ubuntu) application. I need to read many files in a non-blocking fashion. Unfortunately epoll doesn't support real file descriptors (descriptors for regular files); it only supports descriptors such as network sockets. select does work on real file descriptors, but it has two drawbacks: 1) it's slow, since it goes linearly through all the file descriptors that are set, and 2) it's limited, since it typically won't allow more than 1024 file descriptors.
I can change each file descriptor to be non-blocking and use non-blocking read to poll, but that's very expensive, especially when there is a large number of file descriptors.
What are the options here?
Thanks.
Update 1
The use case here is to create some sort of file server, with many clients requesting files, and to serve them in a non-blocking fashion. Due to the network-side implementation (not a standard TCP/IP stack), sendfile() can't be used.
You could use multiple select calls combined with either threading or forking. This would reduce the number of FD_ISSET calls per select set.
Perhaps you can provide more details about your use case. It sounds like you are using select to monitor file changes, which doesn't work as you would expect with regular files. Perhaps you are simply looking for flock.
You could use Asynchronous IO on Linux. The relevant AIO manpages (all in section 3) appear to have quite a bit of information. I think that aio_read() would probably be the most useful for you.
Here's some code that I believe you should be able to adapt for your usage:
...
#define _GNU_SOURCE
#include <aio.h>
#include <stdint.h>   // uint8_t
#include <stdlib.h>   // calloc, malloc, free
#include <unistd.h>   // sysconf
typedef struct {
    struct aiocb *aio;
    connection_data *conn;
} cb_data;

void callback (union sigval u) {
    // recover file related data prior to freeing
    cb_data *data = u.sival_ptr;
    int fd = data->aio->aio_fildes;
    uint8_t *buffer = (uint8_t *) data->aio->aio_buf;
    size_t len = data->aio->aio_nbytes;
    free (data->aio);
    // recover connection data pointer then free
    connection_data *conn = data->conn;
    free (data);
    ...
    // finish handling request
    ...
    return;
}
...
int main (int argc, char **argv) {
    // initial setup
    ...
    // setup aio for optimal performance
    struct aioinit ainit = { 0 };
    // number of background threads to keep online
    ainit.aio_threads = sysconf (_SC_NPROCESSORS_ONLN) * 4;
    // use the defaults on systems with few cores
    ainit.aio_threads = (ainit.aio_threads > 20 ? ainit.aio_threads : 20);
    // set num to the maximum number of likely simultaneous requests
    ainit.aio_num = 4096;
    ainit.aio_idle_time = 5;
    aio_init (&ainit);
...
    // handle incoming requests
    int exit = 0;
    while (!exit) {
        ...
        // the [asynchronous] fun begins
        struct aiocb *cb = calloc (1, sizeof (struct aiocb));
        if (!cb) {
            // handle OOM error
        }
        cb->aio_fildes = file_fd;
        cb->aio_offset = 0; // assuming you want to send the entire file
        cb->aio_buf = malloc (file_len);
        if (!cb->aio_buf) {
            // handle OOM error
        }
        cb->aio_nbytes = file_len;
        // execute the callback in a separate thread
        cb->aio_sigevent.sigev_notify = SIGEV_THREAD;
        cb_data *data = malloc (sizeof (cb_data));
        if (!data) {
            // handle OOM error
        }
        data->aio = cb; // so we can free() later
        // whatever you need to finish handling the request
        data->conn = connection_data;
        cb->aio_sigevent.sigev_value.sival_ptr = data; // passed to callback
        cb->aio_sigevent.sigev_notify_function = callback;
        if (aio_read (cb) != 0) { // and you're done!
            // handle aio error
            // move on to next connection
        }
    }
    ...
    return 0;
}
This means you no longer have to wait on files being read in your main thread. Of course, you can build more performant systems using AIO, but those are naturally likely to be more complex, and this should work for a basic use case.

pthread POSIX C library detachstate

I was asked how we know that when NULL is passed as the second argument to pthread_create(), the thread is made joinable.
I mean, I know that the man pages state so, but a justification in the code was demanded.
I know that when NULL is passed in, default attributes are used:
const struct pthread_attr *iattr = (struct pthread_attr *) attr;
if (iattr == NULL)
    /* Is this the best idea?  On NUMA machines this could mean
       accessing far-away memory.  */
    iattr = &default_attr;
I know that it should be somewhere in the code of pthread library, but I don't know where exactly.
I know that the definition of default_attr is in pthread_create.c:
static const struct pthread_attr default_attr =
  {
    /* Just some value > 0 which gets rounded to the nearest page size.  */
    .guardsize = 1,
  };
http://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_create.c;h=4fe0755079e5491ad360c3b4f26c182543a0bd6e;hb=HEAD#l457
but I do not know where exactly it is stated in the code that this results in a joinable thread.
Thanks in advance.
First off, from the code you pasted you can see that default_attr contains zeroes in almost all fields (there's no such thing as a half-initialized variable in C: if you only initialize some fields, the others are set to 0).
Second, pthread_create contains this code:
/* Initialize the field for the ID of the thread which is waiting
   for us.  This is a self-reference in case the thread is created
   detached.  */
pd->joinid = iattr->flags & ATTR_FLAG_DETACHSTATE ? pd : NULL;
This line checks whether iattr->flags has the ATTR_FLAG_DETACHSTATE bit set, which (for default_attr) it doesn't because default_attr.flags is 0. Thus it sets pd->joinid to NULL and not to pd as for detached threads.
(Note that this answer only applies to GNU glibc and not to POSIX pthreads in general.)
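As a side note, if an API-level check (rather than glibc internals) is ever enough, POSIX requires that an initialized attribute object defaults to PTHREAD_CREATE_JOINABLE, which you can verify directly; a minimal, illustrative program:
#include <assert.h>
#include <pthread.h>

int main(void)
{
    pthread_attr_t attr;
    int detachstate;
    pthread_attr_init(&attr);                       // default attributes
    pthread_attr_getdetachstate(&attr, &detachstate);
    assert(detachstate == PTHREAD_CREATE_JOINABLE); // the POSIX default
    pthread_attr_destroy(&attr);
    return 0;
}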

Looking for the right ring buffer implementation in C

I am looking for a ring buffer implementation (or pseudocode) in C with the following characteristics:
multiple producer single consumer pattern (MPSC)
consumer blocks on empty
producers block on full
lock-free (I expect high contention)
So far I've been working only with SPSC buffers - one per producer - but I would like to avoid the continuous spinning of the consumer to check for new data over all its input buffers (and maybe to get rid of some marshaling threads in my system).
I develop for Linux on Intel machines.
See liblfds, which has a lock-free MPMC ringbuffer. It won't block at all; lock-free data structures tend not to, because the point of being lock-free is to avoid blocking, so you need to handle the case where the data structure comes back to you with NULL. It returns NULL if you try to read on empty, but it doesn't match your requirement when writing on full: it will throw away the oldest element and give you that one for your write.
However, it would only take a small modification to obtain that behaviour.
But there may be a better solution. The tricky part of a ringbuffer is, when it is full, getting hold of the oldest element and re-using it. You don't need this. I think you could take the SPSC memory-barrier-only circular buffer and rewrite it using atomic operations. That will be a lot more performant than the MPMC ringbuffer in liblfds (which is a combination of a queue and a stack).
I think I have what you are looking for. It is a lock-free ring buffer implementation that blocks the producer/consumer. You only need access to atomic primitives - in this example I will use gcc's __sync functions.
It has a known bug: if you overflow the buffer by more than 100%, it is not guaranteed that the queue remains FIFO (it will still process every item eventually).
This implementation relies on reads and writes of the buffer elements being atomic operations (which is pretty much guaranteed for pointers).
#include <semaphore.h> // sem_t, sem_init, sem_post, sem_wait
#include <stdint.h>    // uint64_t
#include <stdlib.h>    // calloc, malloc

struct ringBuffer
{
    void** buffer;
    uint64_t writePosition;
    size_t size;
    sem_t* semaphore;
};

//create the ring buffer
struct ringBuffer* buf = calloc(1, sizeof(struct ringBuffer));
buf->buffer = calloc(bufferSize, sizeof(void*));
buf->size = bufferSize;
buf->semaphore = malloc(sizeof(sem_t));
sem_init(buf->semaphore, 0, 0);

//producer
void addToBuffer(void* newValue, struct ringBuffer* buf)
{
    uint64_t writePos = __sync_fetch_and_add(&buf->writePosition, 1) % buf->size;
    //spin until the claimed slot is free
    while(!__sync_bool_compare_and_swap(&(buf->buffer[writePos]), NULL, newValue));
    sem_post(buf->semaphore);
}

//consumer
void processBuffer(struct ringBuffer* buf)
{
    uint64_t readPos = 0;
    while(1)
    {
        sem_wait(buf->semaphore);
        //process buf->buffer[readPos % buf->size]
        buf->buffer[readPos % buf->size] = NULL;
        readPos++;
    }
}

C - How to use both aio_read() and aio_write()

I am implementing a game server where I need to both read and write. So I accept an incoming connection and start reading from it using aio_read(), but when I need to send something, I stop reading using aio_cancel() and then use aio_write(). Within the write's callback I resume reading. So, I read all the time, but when I need to send something I pause reading.
It works for ~20% of the time; in the other cases the call to aio_cancel() fails with "Operation now in progress", and I cannot cancel it (even inside a permanent while loop). So my write operation never gets issued.
How do I use these functions properly? What did I miss?
EDIT:
Used under Linux 2.6.35. Ubuntu 10 - 32 bit.
Example code:
void handle_read(union sigval sigev_value) { /* handle data or disconnection */ }
void handle_write(union sigval sigev_value) { /* free writing buffer memory */ }
void start()
{
    const int acceptorSocket = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(struct sockaddr_in));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(port);
    bind(acceptorSocket, (struct sockaddr*)&addr, sizeof(struct sockaddr_in));
    listen(acceptorSocket, SOMAXCONN);
    struct sockaddr_in address;
    socklen_t addressLen = sizeof(struct sockaddr_in);
    for(;;)
    {
        const int incomingSocket = accept(acceptorSocket, (struct sockaddr*)&address, &addressLen);
        if(incomingSocket == -1)
        { /* handle error ... */ }
        else
        {
            //tell the socket to append outgoing messages when writing:
            const int currentFlags = fcntl(incomingSocket, F_GETFL, 0);
            if(currentFlags < 0) { /* handle error ... */ }
            if(fcntl(incomingSocket, F_SETFL, currentFlags | O_APPEND) == -1) { /* handle another error ... */ }
            //start reading:
            struct aiocb* readingAiocb = new struct aiocb;
            memset(readingAiocb, 0, sizeof(struct aiocb));
            readingAiocb->aio_nbytes = MY_SOME_BUFFER_SIZE;
            readingAiocb->aio_fildes = socketDesc;
            readingAiocb->aio_buf = mySomeReadBuffer;
            readingAiocb->aio_sigevent.sigev_notify = SIGEV_THREAD;
            readingAiocb->aio_sigevent.sigev_value.sival_ptr = (void*)mySomeData;
            readingAiocb->aio_sigevent.sigev_notify_function = handle_read;
            if(aio_read(readingAiocb) != 0) { /* handle error ... */ }
        }
    }
}
//called at any time from server side:
void send(void* data, const size_t dataLength)
{
    //... some thread-safety precautions not needed here ...
    const int cancellingResult = aio_cancel(socketDesc, readingAiocb);
    if(cancellingResult != AIO_CANCELED)
    {
        //this one happens ~80% of the time - wrapping the previous call in a permanent while loop does not help:
        if(cancellingResult == AIO_NOTCANCELED)
        {
            puts(strerror(aio_return(readingAiocb))); // "Operation now in progress"
            /* don't know what to do... */
        }
    }
    //otherwise it's okay to send:
    else
    {
        aio_write(...);
    }
}
If you wish to have separate AIO queues for reads and writes, so that a write issued later can execute before a read issued earlier, then you can use dup() to create a duplicate of the socket, and use one to issue reads and the other to issue writes.
However, I second the recommendations to avoid AIO entirely and simply use an epoll()-driven event loop with non-blocking sockets. This technique has been shown to scale to high numbers of clients - if you are getting high CPU usage, profile it and find out where that's happening, because the chances are that it's not your event loop that's the culprit.
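As a rough illustration of that approach (listen_fd is assumed to be set up elsewhere, and error handling is omitted):
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

void event_loop(int listen_fd)
{
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(epfd, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                // new client: make it non-blocking and watch it
                int client = accept(listen_fd, NULL, NULL);
                fcntl(client, F_SETFL, fcntl(client, F_GETFL, 0) | O_NONBLOCK);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
            } else {
                // readable client socket: non-blocking read
                char buf[4096];
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0) {
                    close(fd); // error or peer closed
                    continue;
                }
                // ... handle r bytes ...
            }
        }
    }
}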
First of all, consider dumping aio. There are lots of other ways to do asynchronous I/O that are not as braindead (yes, aio is braindead). Lots of alternatives; if you're on Linux you can use libaio (io_submit and friends). aio(7) mentions this.
Back to your question.
I haven't used aio in a long time but here's what I remember. aio_read and aio_write both put requests (aiocb) on some queue. They return immediately even if the requests will complete some time later. It's entirely possible to queue multiple requests without caring what happened to the earlier ones. So, in a nutshell: stop cancelling read requests and keep adding them.
/* populate read_aiocb */
rc = aio_read(&read_aiocb);
/* time passes ... */
/* populate write_aiocb */
rc = aio_write(&write_aiocb);
Later you're free to wait using aio_suspend, poll using aio_error, wait for signals etc.
I see you mention epoll in your comment. You should definitely go for libaio.
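For reference, a bare-bones libaio sketch, just to show what that API looks like (an illustrative starting point, not drop-in code; link with -laio, and note it is mainly useful for regular files, often with O_DIRECT):
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <unistd.h>

int read_with_libaio(const char *path)
{
    static char buf[4096];
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event events[1];

    int fd = open(path, O_RDONLY);
    if (fd < 0 || io_setup(8, &ctx) < 0)
        return -1;

    io_prep_pread(&cb, fd, buf, sizeof(buf), 0); // read 4096 bytes at offset 0
    if (io_submit(ctx, 1, cbs) != 1)
        return -1;

    io_getevents(ctx, 1, 1, events, NULL);       // block until the read completes
    printf("read %ld bytes\n", (long)events[0].res);

    io_destroy(ctx);
    close(fd);
    return 0;
}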
Unless I'm mistaken, POSIX AIO (that is, aio_read(), aio_write() and so on) is guaranteed to work only on seekable file descriptors. From the aio_read() manpage:
The data is read starting at the absolute file offset aiocbp->aio_offset, regardless of the current file position. After this request, the value of the current file position is unspecified.
For devices which do not have an associated file position, such as network sockets, POSIX AIO is, AFAICS, undefined. Perhaps it happens to work on your current setup, but that seems more by accident than by design.
Also, on Linux, POSIX AIO is implemented in glibc with the help of userspace threads.
So, where possible, use non-blocking I/O and epoll(). However, epoll() does not work for seekable file descriptors such as regular files (the same goes for the classical select()/poll()); in that case POSIX AIO is an alternative to rolling your own thread pool.
There should be no reason to stop or cancel an aio read or write request just because you need to make another read or write. If that were the case, it would defeat the whole point of asynchronous reading and writing, since its main purpose is to allow you to set up a reading or writing operation and then move on. Since multiple requests can be queued, it would be much better to set up a couple of asynchronous reader/writer pools where you grab a set of pre-initialized aiocb structures from an "available" pool whenever you need them, and return them to a "finished" pool when they're done and you can access the buffers they point to. While they're in the middle of an asynchronous read or write, they would sit in a "busy" pool and wouldn't be touched. That way you won't have to keep creating aiocb structures on the heap dynamically every time you need to make a read or write operation, although that's okay to do ... it's just not very efficient if you never plan on going over a certain limit, or plan to have only a certain number of "in-flight" requests.
BTW, keep in mind with a couple of different in-flight asynchronous requests that your asynchronous read/write handler can actually be interrupted by another read/write event. So you really don't want to be doing a whole lot in your handler. In the scenario I described above, your handler would basically move the aiocb struct that triggered the notification from one pool to the next, through the listed "available" -> "busy" -> "finished" stages. Your main code, after reading from the buffers pointed to by the aiocb structures in the "finished" pool, would then move the structures back to the "available" pool.
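A minimal sketch of that pool idea, purely illustrative (the names, sizes, and read-only use here are mine; a real implementation would also need atomics or a lock around the state field, since the notification function runs in another thread):
#include <aio.h>
#include <string.h>

#define POOL_SIZE 64
#define SLOT_BUF_SIZE 4096

enum slot_state { SLOT_AVAILABLE, SLOT_BUSY, SLOT_FINISHED };

struct aio_slot {
    struct aiocb cb;
    char buf[SLOT_BUF_SIZE];
    enum slot_state state;
};

static struct aio_slot pool[POOL_SIZE];

// completion callback: just move the slot to the "finished" stage
static void on_complete(union sigval sv)
{
    struct aio_slot *slot = sv.sival_ptr;
    slot->state = SLOT_FINISHED; // main code collects it later
}

// grab an available slot and start an asynchronous read on fd
static struct aio_slot *start_read(int fd)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        struct aio_slot *slot = &pool[i];
        if (slot->state != SLOT_AVAILABLE)
            continue;
        memset(&slot->cb, 0, sizeof(slot->cb));
        slot->cb.aio_fildes = fd;
        slot->cb.aio_buf = slot->buf;
        slot->cb.aio_nbytes = SLOT_BUF_SIZE;
        slot->cb.aio_sigevent.sigev_notify = SIGEV_THREAD;
        slot->cb.aio_sigevent.sigev_notify_function = on_complete;
        slot->cb.aio_sigevent.sigev_value.sival_ptr = slot;
        slot->state = SLOT_BUSY;
        if (aio_read(&slot->cb) != 0) { // failed to queue: put the slot back
            slot->state = SLOT_AVAILABLE;
            return NULL;
        }
        return slot;
    }
    return NULL; // no free slot; caller decides whether to wait
}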
