C-library invoking a kdb function from a BG-thread - c

I have an external C-library for asynchronously consuming data via callback functions on a background thread; I want to receive the data and process it in a q process. Following code.kx.com's Interfacing with C documentation, I made a small C-library of glue code converting the inbound source data into k structures and dispatching it to my q process via sd1/sd0 calls so that the q function gets invoked on q's thread context. The program successfully invokes the initial callback then hangs.
I've stripped down the program to what I think is the bare minimum to simply demonstrate a C BG-thread callback into a q function, but I'm not sure if I've stripped away too much. For example, sd1 accepts a FD and a C-callback. My bare minimum FD is created via eventfd(), which is used for subsequent sd1/sd0 calls. I have tried invoking read and write on the FD, and not doing any IO over the FD, either way the program hangs.
Here's my bare-bones C-library:
/* testlib.c */
#define KXVER 3
#include "k.h"
#include <pthread.h>
#include <sys/eventfd.h>
I d;
pthread_t tid;
K qdisp(I d)
{
K ignored = k(0, (S)"onCB", kj(54321), (K)0);
sd0(d);
return (K)0;
}
void* loop(void* vargs)
{
while(1) {
sleep(1);
sd1(d, qdisp);
}
return NULL;
}
K init(K ignore)
{
d = eventfd(1, 0);
int err = pthread_create(&tid, NULL, &loop, NULL);
return (K)0;
}
And here's the q script that invokes it:
/ testlib.q
init:`testlib 2:(`init;1)
onCB:{ 0N!x }
init[`blah]
Any tips or comments appreciated.

For those interested, it looks like sd1 schedules a function to be invoked every time there is data available to be read on a file descriptor, and sd0 removes the scheduled function from invocation.
So the idea is to write a function that attempts to read from the FD; if successful, invoke your q function via k() and return the result, if 0 just return 0, and if error call sd0.
#define KXVER 3
#include "k.h"
#include <pthread.h>
#include <sys/eventfd.h>
#include <stdio.h>
#include <unistd.h> /* read, write, sleep */
I d;
pthread_t tid;
K qdisp(I d)
{
J v;
if (-1 != read(d, &v, sizeof(J)) ) {
return k(0, (S)"onCB", kj(v), (K)0); /* v is a J (64-bit), so build a long atom with kj */
}
sd0(d);
return (K)0;
}
void* loop(void* vargs)
{
J j = 0;
sd1(d, qdisp);
while(++j) { /* pre-increment: j++ would test 0 first and skip the loop entirely */
sleep(1);
write(d, &j, sizeof(J));
}
return NULL;
}
K init(K cb)
{
d = eventfd(1, 0);
int err = pthread_create(&tid, NULL, &loop, NULL);
return (K)0;
}
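The answer above relies on how an eventfd behaves as a wake-up channel: each write adds to a kernel-side counter, and a read returns the accumulated value and resets it. The following is a minimal sketch of that behaviour only, assuming Linux; it does not touch the q API, and the helper name eventfd_post_and_drain is hypothetical. Note that the code above creates its descriptor with eventfd(1, 0), so the counter starts at 1 and the descriptor is readable immediately.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: posts each value to a fresh eventfd, then drains it.
 * Returns the drained counter, or -1 on error. *second_read_errno receives
 * the errno of a second read made while the counter is empty. */
int64_t eventfd_post_and_drain(const uint64_t *values, int n, int *second_read_errno)
{
    assert(values != NULL && second_read_errno != NULL);

    int fd = eventfd(0, EFD_NONBLOCK);  /* counter starts at 0: nothing to read */
    if (fd == -1) return -1;

    for (int i = 0; i < n; i++) {
        /* every write adds its 8-byte value to the counter */
        if (write(fd, &values[i], sizeof values[i]) != (ssize_t)sizeof values[i]) {
            close(fd);
            return -1;
        }
    }

    uint64_t total = 0;                 /* one read returns the sum and resets it */
    if (read(fd, &total, sizeof total) != (ssize_t)sizeof total) {
        close(fd);
        return -1;
    }

    uint64_t again;
    errno = 0;
    ssize_t r = read(fd, &again, sizeof again);  /* non-blocking: fails when empty */
    *second_read_errno = (r == -1) ? errno : 0;

    close(fd);
    return (int64_t)total;
}
```

Because several writes collapse into a single readable event, a dispatch callback built this way should not assume one wake-up per message.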

Are you sure the other side provides enough data? As I see from the documentation you referenced, communication is done using blocking pipes. This means that if there is not enough data, or the buffer is not flushed by the producer, your application should block, and this is its intended behavior.
You could try to use sd0()/sd1() skipping the d parameter to verify that the observed behavior is caused by blocking pipe and not by something else.

Related

Callbacks in AIO asynchronous I/O

I have found discussion on using callbacks in AIO asynchronous I/O on the internet. However, what I have found has left me confused. An example code is listed below from a site on Linux AIO. In this code, AIO is being used to read in the contents of a file. My problem is that it seems to me that a code that actually processes the contents of that file must have some point where some kind of block is made to the execution until the read is completed. This code here has no block like that at all. I was expecting to see some kind of call analogous to pthread_mutex_lock in pthread programming. I suppose I could put in a dummy loop after the aio_read() call that would block execution until the read is completed. But that puts me right back to the simplest way of blocking the execution, and then I don't see what is gained by all the coding overhead that goes into establishing a callback. I am obviously missing something. Could someone tell me what it is?
Here is the code. (BTW, the original is in C++; I have adapted it to C.)
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <aio.h>
//#include <bits/stdc++.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <signal.h>
const int BUFSIZE = 1024;
void aio_completion_handler(sigval_t sigval)
{
struct aiocb *req;
req = (struct aiocb *)sigval.sival_ptr; //Pay attention here.
/*Check again if the asynchrony is complete?*/
if (aio_error(req) == 0)
{
int ret = aio_return(req);
printf("ret == %d\n", ret);
printf("%s\n", (char *)req->aio_buf);
}
close(req->aio_fildes);
free((void *)req->aio_buf);
while (1)
{
printf("The callback function is being executed...\n");
sleep(1);
}
}
int main(void)
{
struct aiocb my_aiocb;
int fd = open("file.txt", O_RDONLY);
if (fd < 0)
perror("open");
bzero((char *)&my_aiocb, sizeof(my_aiocb));
my_aiocb.aio_buf = malloc(BUFSIZE);
if (!my_aiocb.aio_buf)
perror("my_aiocb.aio_buf");
my_aiocb.aio_fildes = fd;
my_aiocb.aio_nbytes = BUFSIZE;
my_aiocb.aio_offset = 0;
//Fill in callback information
/*
Using SIGEV_THREAD to request a thread callback function as a notification method
*/
my_aiocb.aio_sigevent.sigev_notify = SIGEV_THREAD;
my_aiocb.aio_sigevent.sigev_notify_function = aio_completion_handler;
my_aiocb.aio_sigevent.sigev_notify_attributes = NULL;
/*
The context to be transmitted is loaded into the handler (in this case, a reference to the aiocb request itself).
In this handler, we simply refer to the arrived sigval pointer and use the AIO function to verify that the request has been completed.
*/
my_aiocb.aio_sigevent.sigev_value.sival_ptr = &my_aiocb;
int ret = aio_read(&my_aiocb);
if (ret < 0)
perror("aio_read");
/* <---- A real code would process the data read from the file.
* So execution needs to be blocked until it is clear that the
* read is complete. Right here I could put in:
* while (aio_error(&my_aiocb) == EINPROGRESS) {}
* But is there some other way involving a callback?
* If not, what has creating a callback done for me?
*/
//The calling process continues to execute
while (1)
{
printf("The main thread continues to execute...\n");
sleep(1);
}
return 0;
}

libuv simple send udp

I'm doing a multiplatform shared library in C, which sends UDP messages using libuv, however I don't know much about libuv and I don't know if my implementation is good, or if there is another solution besides libuv.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <uv.h>
#define IP "0.0.0.0"
#define PORT 8090
#define STR_BUFFER 256
void on_send(uv_udp_send_t *req, int status) {
if (status) {
fprintf(stderr, "Send error %s\n", uv_strerror(status));
return;
}
}
int send_udp(char *msg){
uv_loop_t *loop = malloc(sizeof(uv_loop_t));
uv_loop_init(loop);
uv_udp_t send_socket;
uv_udp_init(loop, &send_socket);
struct sockaddr_in send_addr;
uv_ip4_addr(IP, PORT, &send_addr);
uv_udp_bind(&send_socket, (const struct sockaddr*)&send_addr, 0);
char buff[STR_BUFFER];
memset(buff,0,STR_BUFFER);
strcpy(buff,msg);
uv_buf_t buffer = uv_buf_init(buff,STR_BUFFER);
uv_udp_send_t send_req;
uv_udp_send(&send_req, &send_socket, &buffer, 1, (const struct sockaddr*)&send_addr, on_send);
uv_run(loop, UV_RUN_ONCE);
uv_loop_close(loop);
free(loop);
return 0;
}
int main() {
send_udp("test 123\n");
return 0;
}
Your implementation has multiple issues to date:
I'm not sure a single loop iteration is enough to send a UDP message on every platform. You can check this easily with the value returned by uv_run; see the documentation for uv_run when using the UV_RUN_ONCE mode:
UV_RUN_ONCE: Poll for i/o once. Note that this function blocks if there are no pending callbacks. Returns zero when done (no active handles or requests left), or non-zero if more callbacks are expected (meaning you should run the event loop again sometime in the future).
If you keep your code as is, I would suggest doing at least this:
int done;
do {
done = uv_run(loop, UV_RUN_ONCE);
} while (done != 0);
But keep on reading, you can do even better ! :)
It's quite costly in terms of performance, uv_loops are supposed to be long lasting, not to be created for each message sent.
Incomplete error handling: uv_udp_bind, uv_udp_send, ... they can fail !
How to improve
I would suggest you to change your code for one of the two following solutions:
Your library is used in a libuv context (i.e., you don't try to hide the libuv implementation detail, but require everyone who wishes to use your library to use libuv explicitly).
You could then change your function signature to something like int send_udp(uv_loop_t *loop, char *msg) and let the library users manage the event loop and run it.
Your library uses libuv as an implementation detail: you don't want to bother your library users with libuv, therefore it's your responsibility to provide robust and performant code. This is how I would do it:
mylib_init: starts a thread and runs a uv_loop on it
send_udp: pushes the message onto a queue (beware of thread-safety) and notifies your loop that it has a message to send (you can use uv_async for this); the loop can then send the message with approximately the same code you are already using.
mylib_shutdown: stops the loop and the thread (again, you can use a uv_async to call uv_stop from the right thread)
It would look like this (I don't have a compiler to test, but you'll have most of the work done):
static uv_thread_t thread; // our network thread
static uv_loop_t loop; // the loop running on the thread
static uv_async_t notify_send; // to notify the thread it has messages to send
static uv_async_t notify_shutdown; // to notify the thread it must shutdown
static queue_t buffer_queue; // a queue of messages to send
static uv_mutex_t buffer_queue_mutex; // to sync access to the queue from the various threads
static void thread_entry(void *arg);
static void on_send_messages(uv_async_t *handle);
static void on_shutdown(uv_async_t *handle);
int mylib_init() {
// will call thread_entry on a new thread, our network thread
return uv_thread_create(&thread, thread_entry, NULL);
}
int send_udp(char *msg) {
uv_mutex_lock(&buffer_queue_mutex);
queue_enqueue(&buffer_queue, strdup(msg)); // don't forget to free() after sending the message
uv_async_send(&notify_send);
uv_mutex_unlock(&buffer_queue_mutex);
return 0;
}
int mylib_shutdown() {
// will call on_shutdown on the loop thread
uv_async_send(&notify_shutdown);
// wait for the thread to stop
return uv_thread_join(&thread);
}
static void thread_entry(void *arg) {
uv_loop_init(&loop);
uv_mutex_init_recursive(&buffer_queue_mutex);
uv_async_init(&loop, &notify_send, on_send_messages);
uv_async_init(&loop, &notify_shutdown, on_shutdown);
uv_run(&loop, UV_RUN_DEFAULT); // this code will not return until uv_stop is called
uv_mutex_destroy(&buffer_queue_mutex);
uv_loop_close(&loop);
}
static void on_send_messages(uv_async_t *handle) {
uv_mutex_lock(&buffer_queue_mutex);
char *msg = NULL;
// for each member of the queue ...
while (queue_dequeue(&buffer_queue, &msg) == 0) {
// create a uv_udp_t, send the message
}
uv_mutex_unlock(&buffer_queue_mutex);
}
static void on_shutdown(uv_async_t *handle) {
uv_stop(&loop);
}
It's up to you to develop or find a queue implementation ;)
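For illustration, here is a minimal sketch of a queue that would fit the calls used above (queue_enqueue taking an owned char *, queue_dequeue returning 0 on success). It is an assumption about what such a queue could look like, not part of libuv: a singly linked FIFO with no locking of its own, since the code above already guards it with buffer_queue_mutex. queue_init and queue_node_t are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* One pending message; the queue owns the node, the caller owns msg. */
typedef struct queue_node {
    char *msg;
    struct queue_node *next;
} queue_node_t;

/* FIFO of messages. A zero-initialized queue_t (e.g. a static) is empty. */
typedef struct {
    queue_node_t *head;
    queue_node_t *tail;
} queue_t;

void queue_init(queue_t *q)
{
    assert(q != NULL);
    q->head = q->tail = NULL;
}

/* Appends msg at the tail. Returns 0 on success, -1 if allocation fails. */
int queue_enqueue(queue_t *q, char *msg)
{
    assert(q != NULL);
    queue_node_t *node = malloc(sizeof *node);
    if (!node) return -1;
    node->msg = msg;
    node->next = NULL;
    if (q->tail) q->tail->next = node;
    else q->head = node;
    q->tail = node;
    return 0;
}

/* Removes the oldest message into *msg. Returns 0, or -1 if the queue is empty. */
int queue_dequeue(queue_t *q, char **msg)
{
    assert(q != NULL && msg != NULL);
    queue_node_t *node = q->head;
    if (!node) return -1;
    *msg = node->msg;
    q->head = node->next;
    if (!q->head) q->tail = NULL;
    free(node);
    return 0;
}
```

The dequeued pointer is the one passed to queue_enqueue, so a message created with strdup should be freed by whoever sends it.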
Usage
int main() {
mylib_init();
send_udp("my super message");
mylib_shutdown();
}

shm_open: Differences between Mac and Linux

I have a queue in shared memory. It does work on Linux (kernel 4.3.4), but not on Mac OS X. Are there any differences between how Mac OS X handles shared memory and how linux does, which may explain this?
I get the shared memory via:
int sh_fd = shm_open(shmName, O_RDWR | O_CREAT,
S_IROTH | S_IWOTH // others have read/write permission
| S_IRUSR | S_IWUSR // I have read/write permission
);
// bring the shared memory to the desired size
ftruncate(sh_fd, getpagesize());
The queue is very simple as well. Here is the basic struct:
typedef struct {
// this is to check whether the queue is initialized.
// on linux, this will be 0 initially
bool isInitialized;
// mutex to protect concurrent access
pthread_mutex_t access;
// condition for the reader, readers should wait here
pthread_cond_t reader;
// condition for the writer, writers should wait here
pthread_cond_t writer;
// whether the queue can still be used.
bool isOpen;
// maximum capacity of the queue.
int32_t capacity;
// current position of the reader and number of items.
int32_t readPos, items;
// entries in the queue. The array actually is longer, which means it uses the space behind the struct.
entry entries[1];
} shared_queue;
Basically everyone who wants access acquires the mutex, readPos indicates where the next value should be read (incrementing readPos afterwards), (readPos+items) % capacity is where new items go. The only somewhat fancy trick is the isInitialized byte. ftruncate fills the shared memory with zeros if it had length 0 before, so I rely on isInitialized to be zero on a fresh shared memory page and write a 1 there as soon as I initialize the struct.
As I said, it works on Linux, so I don't think it is a simple implementation bug. Is there any subtle difference between shm_open on Mac vs. Linux which I may not be aware of? The bug I see looks like the reader tries to read from an empty queue, so, maybe the pthread mutex/condition does not work on shared memory in a Mac?
The problem is that PTHREAD_PROCESS_SHARED is not supported on mac.
http://alesteska.blogspot.de/2012/08/pthreadprocessshared-not-supported-on.html
You must set PTHREAD_PROCESS_SHARED on both the mutex and condition variables.
So for a mutex:
pthread_mutexattr_t mutex_attr;
pthread_mutex_t the_mutex;
pthread_mutexattr_init(&mutex_attr);
pthread_mutexattr_setpshared(&mutex_attr, PTHREAD_PROCESS_SHARED);
pthread_mutex_init(&the_mutex, &mutex_attr);
Basically the same steps for the condition variables, but replace mutexattr with condattr.
If the pthread_*attr_setpshared functions don't exist or return an error, then it may not be supported on your platform.
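Putting the mutex and condition variable steps together, a minimal sketch might look like the following. The helper name init_shared_sync is hypothetical, and in real use both objects would live inside the shared memory mapping. Any non-zero return value is the pthread error code, which is how an unsupported platform would show up.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: initializes a mutex and a condition variable as
 * process-shared. Returns 0 on success, or the first pthread error code. */
int init_shared_sync(pthread_mutex_t *mutex, pthread_cond_t *cond)
{
    assert(mutex != NULL && cond != NULL);

    pthread_mutexattr_t mattr;
    pthread_condattr_t cattr;
    int rc;

    /* Mutex: attribute object, pshared flag, then init. */
    if ((rc = pthread_mutexattr_init(&mattr)) != 0) return rc;
    rc = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
    if (rc == 0) rc = pthread_mutex_init(mutex, &mattr);
    pthread_mutexattr_destroy(&mattr);
    if (rc != 0) return rc;

    /* Condition variable: same steps with the condattr functions. */
    if ((rc = pthread_condattr_init(&cattr)) != 0) {
        pthread_mutex_destroy(mutex);
        return rc;
    }
    rc = pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
    if (rc == 0) rc = pthread_cond_init(cond, &cattr);
    pthread_condattr_destroy(&cattr);
    if (rc != 0) pthread_mutex_destroy(mutex);  /* undo the mutex on failure */
    return rc;
}
```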
To be on the safe side, you might want to set PTHREAD_MUTEX_ROBUST if supported. This will prevent deadlock over the mutex (though not guarantee queue consistency) if a process exits while holding the lock.
EDIT: As an added caution, having a boolean "is initialized" flag is an insufficient plan on its own. You need more than that to really guarantee only one process can initialize the structure. At the very least you need to do:
// O_EXCL means this fails if not the first one here
fd = shm_open(name, otherFlags | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR); // shm_open always takes a mode argument
if( fd != -1 )
{
// initialize here
// Notify everybody the mutex has been initialized.
}
else
{
fd = shm_open(name, otherFlags, 0); // NO O_CREAT, so the mode is ignored
// magically somehow wait until queue is initialized.
}
Are you sure you really need to roll your own queue? Will POSIX message queues (see mq_open man page) do the job? If not, what about one of the many messaging middleware solutions out there?
Update 2016-Feb-10: Possible mkfifo based solution
One alternative to implementing your own queue in shared memory is to use an OS-provided named FIFO created with mkfifo. A useful property of a FIFO (a named pipe) is that you are allowed to have multiple simultaneous readers and writers.
A "catch" to this, is that the reader sees end-of-file when the last writer exits, so if you want readers to go indefinitely, you may need to open a dummy write handle.
FIFOs are super easy to use on the command line, like so:
reader.sh
mkfifo my_queue
cat my_queue
write.sh
echo "hello world" > my_queue
Or slightly more effort in C:
reader.c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
int main(int argc, char**argv)
{
FILE * fifo;
FILE * wfifo;
int res;
char buf[1024];
char * linePtr;
/* Try to create the queue. This may belong on reader or writer side
* depending on your setup. */
if( 0 != mkfifo("work_queue", S_IRUSR | S_IWUSR ) )
{
if( errno != EEXIST )
{
perror("mkfifo:");
return -1;
}
}
/* Get a read handle to the queue */
fifo = fopen("work_queue", "r");
/* Get a write handle to the queue */
wfifo = fopen("work_queue", "w");
if( !fifo )
{
perror("fopen: " );
return -1;
}
while(1)
{
/* pull a single message from the queue at a time */
linePtr = fgets(buf, sizeof(buf), fifo);
if( linePtr )
{
fprintf(stdout, "new command=%s\n", linePtr);
}
else
{
break;
}
}
return 0;
}
writer.c
#include <stdio.h>
#include <unistd.h>
int main(int argc, char**argv)
{
FILE * pipe = fopen("work_queue", "w");
unsigned int job = 0;
int my_pid = getpid();
while(1)
{
/* Write one 'entry' to the queue */
fprintf(pipe, "job %u from %d\n", ++job, my_pid);
}
}

Raw Clone system call

I am trying to use the raw clone system call, but I could not find any proper documentation.
I tried to write a small program to try it, but this ends up with a segmentation fault.
I cannot understand where I am wrong.
Here is the small application:
#define _GNU_SOURCE
#define STACK_SIZE 0x10000
#define BUFSIZE 200
void hello (){
fprintf(stderr,"Hello word\n");
_exit(0);
}
int main()
{
int res;
void *stack = mmap(0, STACK_SIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
pid_t ptid, tid;
printf("Stack %p\n", stack + STACK_SIZE);
memset(stack, 0, STACK_SIZE);
res= syscall(SYS_clone,CLONE_SIGHAND|CLONE_FS|CLONE_VM|CLONE_FILES,stack + STACK_SIZE, &tid,&ptid,NULL );
if (!res)
hello();
printf("Clone result %x\n", res);
waitpid(-1, NULL, __WALL);
return 0;
}
The child pops the return address from the empty stack, reading from an unmapped address.
NB: The answer assumes x86_64 ISA, where call places the return address on a stack. Contrast it with AArch64, where the Link Register may hold the return address instead.
Details
According to the man page for clone, "execution in the child continues from the point of the call". Yet, although the child and the parent initially execute the same instructions, they don't share the stack, and a ret instruction behaves differently in the parent and child.
Running the application under GDB reveals that SIGSEGV occurs in the syscall function at ret instruction.
Thread 2 received signal SIGSEGV, Segmentation fault.
[Switching to LWP 415]
syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:40
(gdb) x/i $pc
=> 0x7ffff7ee0745 <syscall+37>: retq
ret pops the return address from the stack. The child, with its empty stack, fails to execute it, while the parent successfully returns to the main function.
You can fix the unmapped access by supplying stack + STACK_SIZE - 8 as the third syscall argument. However, the underlying problem remains: the zero-initialized stack doesn't store the return address.
Solution
Macattack's answer already mentioned the wrapper for the clone. Implement the arch-specific wrapper in the vein of the uclibc one.
I can't say I recommend going with clone if you can use pthreads. I've had bad experience with functions such as malloc() in relation to clone.
Have you looked at the man page for documentation?
Here is an example that runs for me. I didn't really examine your code to see why it might be crashing.
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <linux/sched.h>
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>
// Allow us to round to page size
#define ROUND_UP_TO_MULTIPLE(a,b) \
( ( (a) % (b) == 0) ? (a) : ( (a) + ( (b) - ( (a) % (b) ) ) ) )
struct argsy {
int threadnum;
};
int fun(void * args) {
struct argsy * arguments = (struct argsy *) args;
fprintf(stderr, "hey!, i'm thread %d\n", arguments->threadnum);
return 0;
}
#define N_THREADS 10
#define PAGESIZE 4096
struct argsy arguments[N_THREADS];
int main() {
assert(PAGESIZE==getpagesize());
const int thread_stack_size = 256*PAGESIZE;
void * base = malloc((((N_THREADS*thread_stack_size+PAGESIZE)/PAGESIZE)*PAGESIZE));
assert(base);
void * stack = (void *)ROUND_UP_TO_MULTIPLE((size_t)(base), PAGESIZE);
int i = 0;
for (i = 0; i < N_THREADS; i++) {
void * args = &arguments[i];
arguments[i].threadnum = i;
clone(&fun, stack+((i+1)*thread_stack_size),
CLONE_FILES | CLONE_VM,
args);
}
sleep(1);
// Wait not implemented
return 0;
}

Linux timerfd> calling a function every x seconds without blocking the code execution

I need to call a function every X (let's say 5) seconds, and the code below does that.
But it blocks the execution of the rest of the code. I want it to work like setitimer(), where I can (for example) call a function every 5 seconds and do something else in the meantime.
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h> /* Definition of uint64_t */
#define handle_error(msg) \
do { perror(msg); exit(EXIT_FAILURE); } while (0)
int
main(int argc, char *argv[])
{
struct itimerspec new_value;
int max_exp, fd;
struct timespec now;
uint64_t exp, tot_exp;
ssize_t s;
if (clock_gettime(CLOCK_REALTIME, &now) == -1)
handle_error("clock_gettime");
/* Create a CLOCK_REALTIME absolute timer with initial
expiration and interval as specified in command line */
new_value.it_value.tv_sec = now.tv_sec + 1;
new_value.it_value.tv_nsec = now.tv_nsec;
new_value.it_interval.tv_sec = 5;
new_value.it_interval.tv_nsec = 0;
max_exp = 5; //say 5 times
fd = timerfd_create(CLOCK_REALTIME, 0);
if (fd == -1)
handle_error("timerfd_create");
if (timerfd_settime(fd, TFD_TIMER_ABSTIME, &new_value, NULL) == -1)
handle_error("timerfd_settime");
printf("timer started\n");
for (tot_exp = 0; tot_exp < max_exp;) {
s = read(fd, &exp, sizeof(uint64_t));
if (s != sizeof(uint64_t))
handle_error("read");
tot_exp += exp;
printf("read: %llu; total=%llu\n",
(unsigned long long) exp,
(unsigned long long) tot_exp);
}
//Do something else ?
//while(1);
exit(EXIT_SUCCESS);
}
EDIT
I have one more question.
On changing these lines in above code from
new_value.it_interval.tv_sec = 5;
new_value.it_interval.tv_nsec = 0;
to
new_value.it_interval.tv_sec = 0;
new_value.it_interval.tv_nsec = 5000000000;
I see that there is no 5-second delay. What's happening here?
You need to understand how to use multiplexing syscalls like poll(2) (or the older select(2) which tends to become obsolete) and use them to test the readability of the file descriptor obtained by timerfd_create(2) before read(2)-ing it.
However, be aware that the timerfd_create mechanism works only when that read call succeeds. So only when poll tells you that the fd is not readable can you do something else. That something else should be quick (lasting less than 5 seconds).
You might want to investigate event loop libraries, e.g. libevent (wrapping poll). If you are coding a graphical application (using Qt or Gtk), it already has its own event loop. If clever enough, you could do your 5-second period without any timerfd_create, just through your event loop (by carefully setting the timeout given to poll, etc.).
Addenda:
the tv_nsec field should always be non-negative and less than 1000000000 (the number of nanoseconds in a second).
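One way to respect that constraint is to split a nanosecond count into whole seconds and a remainder before filling the struct. The helper below is a small sketch with a hypothetical name, timespec_from_ns; with it, the 5000000000 ns from the question becomes tv_sec = 5 and tv_nsec = 0.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: converts a nanosecond count into a normalized
 * timespec, so that 0 <= tv_nsec < 1000000000 always holds. */
struct timespec timespec_from_ns(uint64_t total_ns)
{
    struct timespec ts;
    ts.tv_sec = (time_t)(total_ns / 1000000000ULL);   /* whole seconds */
    ts.tv_nsec = (long)(total_ns % 1000000000ULL);    /* leftover nanoseconds */
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    return ts;
}
```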
Any reason you have to use timerfd? Just schedule an alarm and make a handler for SIGALRM to call your function.
If you don't want to use signals, just create an extra thread to block on your timer fd and proceed as normal in the main thread.
If you don't like either of those and you want to do work while you're waiting, you have to poll. You can do it as Basile suggests, or you could just store the current time and check, whenever you would poll, whether the desired period has elapsed.

Resources