How to interrupt epoll_pwait with an appropriate signal? [duplicate] - c

Closed as a possible duplicate of: Interrupting epoll_wait with a non-IO event, no signals
I have a thread that is currently using epoll_wait to flag the arrival of data on some sockets. The timeout parameter is currently set to zero.
However, the thread also does other tasks. What I want to do is change this so that if there is no work to be done then make it an indefinite or long time out. This will dramatically reduce wasted CPU cycles spinning when there is no actual work to do.
The whole thing is driven mostly by the arrival of a message on a thread safe lock free queue.
So I think I should switch to epoll_pwait and wake the thread from its long timeout with a signal.
However, I'm unsure what signal to send it and how this is done. I'm not familiar with Linux signals.
The following is similar to what I currently have, dramatically shortened to show the concept. If you spot a bug, don't bother pointing it out; this is just an illustration that I've typed in here to help you understand what I want to achieve.
// Called from another thread...
void add_message_to_queue(struct message_t* msg)
{
add_msg(msg);
raise( ? ); // wake the state machine?
}
// different thread to the above.
main_thread()
{
struct message_t msg;
while (msg = get_message_from_queue())
process_message(msg);
timeout = work_available ? 0 : -1;
nfds = epoll_pwait(epfd, events, MAX_EPOLL_EVENTS, timeout);
for (i = 0; i < nfds; ++i)
{
if ((events[i].events & EPOLLIN) == EPOLLIN)
{
/// do stuff
}
}
run_state_machines();
}
So I guess my question is really: is this the right way of going about it? If so, what signal do I send, and do I need to define a signal handler, or can I use the signal disposition "ignore" and still be woken?

Instead of signals, consider using a pipe. Create a pipe and add the file descriptor for the read end to the epoll set. When you want to wake the epoll_wait call, just write one byte to the write end of the pipe; the read end becomes readable, so epoll_wait returns with an EPOLLIN event for it.
int read_pipe;
int write_pipe;
void InitPipe()
{
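int pipefds[2] = {0};
struct epoll_event ev = {0};
pipe(pipefds); // pipe() takes only the array; pipe2() is the variant that accepts flags
read_pipe = pipefds[0];
write_pipe = pipefds[1];
// make read-end non-blocking so the drain loop below stops once the pipe is empty
int flags = fcntl(read_pipe, F_GETFL, 0);
fcntl(read_pipe, F_SETFL, flags|O_NONBLOCK);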
// add the read end to the epoll
ev.events = EPOLLIN;
ev.data.fd = read_pipe;
epoll_ctl(epfd, EPOLL_CTL_ADD, read_pipe, &ev);
}
void add_message_to_queue(struct message_t* msg)
{
char ch = 'x';
add_msg(msg);
write(write_pipe, &ch, 1);
}
main_thread()
{
struct message_t msg;
while (msg = get_message_from_queue())
process_message(msg);
timeout = work_available ? 0 : -1;
nfds = epoll_wait(epfd, events, MAX_EPOLL_EVENTS, timeout);
for (i = 0; i < nfds; ++i)
{
if (events[i].data.fd == read_pipe)
{
// read all bytes from read end of pipe
char ch;
int result = 1;
while (result > 0)
{
result = read(read_pipe, &ch, 1);
}
}
else if ((events[i].events & EPOLLIN) == EPOLLIN) // socket data, not the wake-up pipe
{
/// do stuff
}
}
run_state_machines();
}
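As a rough, self-contained sketch of the pipe approach above: the helper names wake_init, wake_loop and wake_drain are made up for illustration and are not from the answer. It assumes Linux and makes both ends of the pipe non-blocking, so a producer never stalls on a full pipe and the drain loop ends as soon as the pipe is empty.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

static int wake_rd = -1; /* read end, registered with epoll */
static int wake_wr = -1; /* write end, used by producer threads */

/* Create the pipe, make both ends non-blocking, register the read end. */
static int wake_init(int epfd)
{
    int fds[2];
    struct epoll_event ev = {0};

    if (pipe(fds) < 0)
        return -1;
    for (int i = 0; i < 2; ++i) {
        int flags = fcntl(fds[i], F_GETFL, 0);
        if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    ev.events = EPOLLIN;
    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    wake_rd = fds[0];
    wake_wr = fds[1];
    return 0;
}

/* Producer side: call after queuing a message. Returns 0 on success. */
static int wake_loop(void)
{
    char ch = 'x';
    ssize_t n;

    assert(wake_wr >= 0); /* wake_init() must have succeeded first */
    do {
        n = write(wake_wr, &ch, 1);
    } while (n < 0 && errno == EINTR);
    /* EAGAIN means the pipe is full, so a wake-up is already pending. */
    return (n == 1 || errno == EAGAIN) ? 0 : -1;
}

/* Consumer side: empty the pipe; returns the number of bytes discarded. */
static int wake_drain(void)
{
    char buf[64];
    int total = 0;
    ssize_t n;

    while ((n = read(wake_rd, buf, sizeof buf)) > 0)
        total += (int)n;
    return total; /* the loop ends on EAGAIN (pipe empty), EOF or error */
}
```

In the loop from the question, add_message_to_queue() would call wake_loop() after add_msg(), and the epoll thread would call wake_drain() when the event's data.fd equals wake_rd, then go back to reading the queue.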

Related

Why does timerfd_settime fail with EBADF when called with valid fd from timerfd_create?

I have a function that creates a timerfd timer, but sometimes the timerfd_settime returns with EBADF (Bad file descriptor). I cannot fathom a scenario where timerfd_create returns a valid file descriptor, which then fails when immediately called with timerfd_settime.
I use this function with an epoll event loop, and sometimes it returns a valid fd, only to have epoll_ctl fail with EBADF when adding the timer fd. I assume that if I understand why timerfd_settime sometimes fails, it will explain the epoll failure as well.
static inline int create_timer(uint32_t interval_ms, uint32_t start_ms)
{
struct itimerspec its = {{0}};
int fd = timerfd_create(CLOCK_MONOTONIC, 0);
if (fd < 0) {
perror("timerfd_create");
return -1;
}
its.it_interval = timespec_ns((int64_t)(interval_ms) * NSEC_PER_MSEC);
if (start_ms)
its.it_value = timespec_ns((int64_t)(start_ms) * NSEC_PER_MSEC);
else
its.it_value.tv_nsec = 1;
if (timerfd_settime(fd, 0, &its, NULL) < 0) {
perror("timerfd_settime");
close(fd);
return -1;
}
return fd;
}
It is used in a multi-threaded "curl_multi_socket" application. There are multiple worker threads, each of which needs to download and parse many files, often. Each thread has its own epoll loop. Inter-thread communication is handled through Unix sockets.
The function is used to set the timeouts for curl's timerfunc callback:
static int fetch_timerfunc(CURLM *curlm, long timeout_ms, void *ctx)
{
struct fetch *cm = (struct fetch*) ctx;
struct epoll_event ev = {};
// Cancel previous timeout, if any
if (cm->timer_fd > 0) {
close(cm->timer_fd);
cm->timer_fd = 0;
}
if (timeout_ms < 0) {
return 0;
}
cm->timer_fd = create_timer(0, timeout_ms);
if (cm->timer_fd < 0) {
perror("fetch_timerfunc: create_timer");
return cm->timer_fd;
}
ev.events = EPOLLIN;
ev.data.fd = CURL_SOCKET_TIMEOUT;
if (epoll_ctl(cm->epoll_fd, EPOLL_CTL_ADD, cm->timer_fd, &ev) < 0) {
if (!exiting) {
perror("fetch_timerfunc: epoll_ctl");
}
}
return 0;
}
@KamilCuk is right. A bug elsewhere caused one thread to sometimes close fd 0 many times in quick succession. As a result, timerfd_create sometimes returned 0 as its assigned fd, and the buggy thread closed it again before the call to timerfd_settime.
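One way to make this class of bug harder to write is to use -1 rather than 0 as the "no descriptor" marker, since 0 is a valid descriptor. The following is only a sketch under that assumption; struct timer_slot, timer_cancel and timer_arm_ms are hypothetical names and not part of the code in the question.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

struct timer_slot {
    int fd; /* -1 means "no timer"; 0 is a perfectly valid descriptor */
};

#define TIMER_SLOT_INIT { -1 }

/* Close the timer if there is one. Safe to call repeatedly. */
static void timer_cancel(struct timer_slot *t)
{
    if (t->fd >= 0) {   /* note: >= 0, not > 0 */
        close(t->fd);
        t->fd = -1;     /* never leave 0 behind as a "closed" marker */
    }
}

/* Arm a one-shot timer; returns 0 on success, -1 on failure. */
static int timer_arm_ms(struct timer_slot *t, long timeout_ms)
{
    struct itimerspec its = {0};
    int fd;

    assert(timeout_ms > 0); /* an all-zero it_value would disarm the timer */
    timer_cancel(t);
    its.it_value.tv_sec = timeout_ms / 1000;
    its.it_value.tv_nsec = (timeout_ms % 1000) * 1000000L;

    fd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {
        close(fd);
        return -1;
    }
    t->fd = fd;
    return 0;
}
```

With this convention a second timer_cancel() is a no-op instead of a close(0), so an accidental double cancel cannot take another thread's descriptor away.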

Signal handler triggers on first time signal is raised, but not subsequent times

I'm fairly new to signals. I have an AF_INET socket and handle SIGIO for it. The first connection and packet are received and handled (displayed) properly, as far as I can tell.
Subsequent sends are not, however. A breakpoint on the signal handler in GDB (as well as the atomic state flag) shows that the handler is never invoked again when further packets are sent.
Relevant portions of my code (sections missing):
char buffer[1024];
volatile sig_atomic_t data_waiting = 0;
struct sigaction saio;
// [...] Socket setup and accept
saio.sa_handler = signal_handler_IO;
saio.sa_flags = 0;
saio.sa_restorer = NULL;
sigemptyset(&saio.sa_mask);
sigaddset(&saio.sa_mask, SIGINT);
sigaction(SIGIO, &saio, NULL);
fcntl(connected_sockfd, F_SETOWN, getpid());
fcntl(connected_sockfd, F_SETFL, FASYNC);
while(run_main)
{
usleep(500);
if(data_waiting == 1)
{
data_waiting = 0;
bytes = read(connected_sockfd, buffer, 1023);
if(bytes > 0)
{
buffer[bytes] = 0; // null termination
printf("Message: %s\n", buffer);
buffer[0] = 0; // "reset" string
}
}
}
and my handler:
void signal_handler_IO(int status)
{
data_waiting = 1;
}
I have nearly the same code handling SIGIO on a serial port file descriptor with zero issues, so I'm really stumped. I assumed the handler would be installed identically, but I might be wrong there.

Waiting for child processes when using select() for multiplexing

I am facing some trouble dealing with zombie processes. I wrote a simple server which creates tic tac toe matches between players. I am using select() to multiplex between multiple connected clients. Whenever there are two clients, the server will fork another process which execs a match arbiter program.
The problem is that select() blocks. If a match arbiter program is running as a child process and it exits, the parent never waits for the child as long as there are no incoming connections, because it is still blocked in select().
I have my code here, apologies since it is quite messy.
while(1) {
if (terminate)
terminate_program();
FD_ZERO(&rset);
FD_SET(tcp_listenfd, &rset);
FD_SET(udpfd, &rset);
maxfd = max(tcp_listenfd, udpfd);
/* add child connections to set */
for (i = 0; i < MAXCLIENTS; i++) {
sd = tcp_confd_lst[i];
if (sd > 0)
FD_SET(sd, &rset);
if (sd > maxfd)
maxfd = sd;
}
/* Here select blocks */
if ((nready = select(maxfd + 1, &rset, NULL, NULL, NULL)) < 0) {
if (errno == EINTR)
continue;
else
perror("select error");
}
/* Handles incoming TCP connections */
if (FD_ISSET(tcp_listenfd, &rset)) {
len = sizeof(cliaddr);
if ((new_confd = accept(tcp_listenfd, (struct sockaddr *) &cliaddr, &len)) < 0) {
perror("accept");
exit(1);
}
/* Send connection message asking for handle */
writen(new_confd, handle_msg, strlen(handle_msg));
/* adds new_confd to array of connected fd's */
for (i = 0; i < MAXCLIENTS; i++) {
if (tcp_confd_lst[i] == 0) {
tcp_confd_lst[i] = new_confd;
break;
}
}
}
/* Handles incoming UDP connections */
if (FD_ISSET(udpfd, &rset)) {
}
/* Handles receiving client handles */
/* If client disconnects without entering their handle, their values in the arrays will be set to 0 and can be reused. */
for (i = 0; i < MAXCLIENTS; i++) {
sd = tcp_confd_lst[i];
if (FD_ISSET(sd, &rset)) {
if ((valread = read(sd, confd_handle, MAXHANDLESZ)) == 0) {
printf("Someone disconnected: %s\n", usr_handles[i]);
close(sd);
tcp_confd_lst[i] = 0;
usr_in_game[i] = 0;
} else {
confd_handle[valread] = '\0';
printf("%s\n", confd_handle); /* For testing */
fflush(stdout);
strncpy(usr_handles[i], confd_handle, sizeof(usr_handles[i]));
for (j = i - 1; j >= 0; j--) {
if (tcp_confd_lst[j] != 0 && usr_in_game[j] == 0) {
usr_in_game[i] = 1; usr_in_game[j] = 1;
if ((child_pid = fork()) == 0) {
close(tcp_listenfd);
snprintf(fd_args[0], sizeof(fd_args[0]), "%d", tcp_confd_lst[i]);
snprintf(fd_args[1], sizeof(fd_args[1]), "%d", tcp_confd_lst[j]);
execl("nim_match_server", "nim_match_server", usr_handles[i], fd_args[0], usr_handles[j], fd_args[1], (char *) 0);
}
close(tcp_confd_lst[i]); close(tcp_confd_lst[j]);
tcp_confd_lst[i] = 0; tcp_confd_lst[j] = 0;
usr_in_game[i] = 0; usr_in_game[j] = 0;
}
}
}
}
}
}
Is there a method which allows wait to run even while select() is blocking? Preferably without signal handling, since signals are asynchronous.
EDIT: Actually, I found out that select takes a timeval structure that specifies a timeout. Would using that be a good idea?
I think your options are:
1. Save all your child descriptors in a global array and call wait() from a signal handler. If you don't need the exit status of your children in your main loop, I think this is the easiest.
2. Instead of select, use pselect. It will return upon receiving a specified (set of) signal(s), in your case SIGCHLD. Then call waitpid() with WNOHANG on all child PIDs. You will need to block/unblock SIGCHLD at the right moments before/after pselect(), see here: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pselect.html
3. Wait on/clean up child PIDs from a secondary thread. I think this is the most complicated solution (regarding synchronization between threads), but since you asked, it's technically possible.
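The pselect option could look roughly like the sketch below. The function names sigchld_setup and wait_and_reap are invented for this example. It assumes SIGCHLD stays blocked everywhere except inside pselect(), so a child that exits just before the call is not missed. A handler is installed because the default SIGCHLD disposition would not interrupt the call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    got_sigchld = 1; /* only set a flag; reaping happens in the main loop */
}

/* Install the handler and block SIGCHLD. On return, *unblocked holds the
   mask to hand to pselect(), i.e. the old mask with SIGCHLD unblocked. */
static int sigchld_setup(sigset_t *unblocked)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, unblocked) < 0)
        return -1;
    return sigdelset(unblocked, SIGCHLD);
}

/* One loop iteration: wait for descriptors or SIGCHLD, then reap.
   Returns the number of children reaped, or -1 on a real pselect error.
   If a signal interrupted the wait, *rset is cleared. */
static int wait_and_reap(int nfds, fd_set *rset, const sigset_t *unblocked)
{
    int reaped = 0;

    assert(rset != NULL && unblocked != NULL);
    if (pselect(nfds, rset, NULL, NULL, NULL, unblocked) < 0) {
        if (errno != EINTR)
            return -1;
        FD_ZERO(rset); /* contents are unspecified after EINTR */
    }
    if (got_sigchld) {
        got_sigchld = 0;
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ++reaped;
    }
    return reaped;
}
```

In the server loop from the question, wait_and_reap(maxfd + 1, &rset, &unblocked) would take the place of the select() call, with sigchld_setup() called once before the loop.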
If you just want to prevent zombie processes, you could set up a SIGCHLD signal handler. If you want to actually wait for the return status, you could write bytes into a pipe (non-blocking, just in case) from the signal handler and then read those bytes in the select loop.
For how to handle SIGCHLD, see http://www.microhowto.info/howto/reap_zombie_processes_using_a_sigchld_handler.html -- you want to do something like while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {}
Perhaps the best approach is sending a single byte from the SIGCHLD signal handler to the main select loop (non-blocking, just in case) and doing the waitpid loop in the select loop when bytes can be read from the pipe.
You could also use a signalfd file descriptor to read the SIGCHLD signal, although that works only on Linux.
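A minimal sketch of the pipe variant described above, assuming a single process-wide pipe. The names chld_pipe, chld_pipe_init and select_and_reap are illustrative only. The handler does nothing but write one byte, and all the waitpid work happens in the select loop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/wait.h>
#include <unistd.h>

static int chld_pipe[2] = { -1, -1 }; /* [0] read end, [1] write end */

static void on_sigchld(int signo)
{
    int saved = errno; /* write() may change errno under the main code */
    char ch = 'c';

    (void)signo;
    if (write(chld_pipe[1], &ch, 1) < 0) {
        /* pipe full: a notification is already pending, nothing to do */
    }
    errno = saved;
}

/* Create the non-blocking pipe and install the SIGCHLD handler. */
static int chld_pipe_init(void)
{
    struct sigaction sa;

    if (pipe(chld_pipe) < 0)
        return -1;
    for (int i = 0; i < 2; ++i) {
        int flags = fcntl(chld_pipe[i], F_GETFL, 0);
        if (flags < 0 || fcntl(chld_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP; /* only report children that terminated */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Block in select() until the pipe is readable, then reap every exited
   child. Returns the number of children reaped, or -1 on error. */
static int select_and_reap(void)
{
    fd_set rset;
    char buf[64];
    int reaped = 0;

    assert(chld_pipe[0] >= 0); /* chld_pipe_init() must have run */
    for (;;) {
        FD_ZERO(&rset);
        FD_SET(chld_pipe[0], &rset); /* a real server also adds its sockets */
        if (select(chld_pipe[0] + 1, &rset, NULL, NULL, NULL) >= 0)
            break;
        if (errno != EINTR)
            return -1;
        /* EINTR: the handler ran; retry and the pipe will be readable */
    }
    if (FD_ISSET(chld_pipe[0], &rset)) {
        while (read(chld_pipe[0], buf, sizeof buf) > 0)
            ; /* discard the notification bytes */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ++reaped;
    }
    return reaped;
}
```

Looping on waitpid matters because several children exiting close together may produce fewer notifications than exits.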

Socket performance

I was wondering how instant messengers and online games can accept and deliver messages so fast. (Network programming with sockets.)
I read that this is done with nonblocking sockets.
I tried blocking sockets with pthreads (each client gets its own thread) and nonblocking sockets with kqueue. Then I profiled both servers with a program that makes 99 connections (one connection per thread) and then writes some garbage to each (with a sleep of 1 second). Once all threads were set up, I measured in the main thread how long it took to get a connection from the server (wall clock time) while the "99 users" were writing to it.
threads (avg): 0.000350 // only small difference to kqueue
kqueue (avg): 0.000300 // and this is not even stable (client side)
The problem is that while testing with kqueue I got a SIGPIPE error multiple times (client side). (With a small delay, usleep(50), the error went away.) I think this is really bad, because a server should be capable of handling thousands of connections. (Or is it my fault on the client side?) The crazy thing is that the infamous pthread approach did just fine (with and without the delay).
So my question is: how can you build a stable socket server in C which can handle thousands of clients "asynchronously"? The threads approach is the only one that works well for me, but it is considered bad practice.
Greetings
EDIT:
My test code:
double get_wall_time(){
struct timeval time;
if (gettimeofday(&time,NULL)){
// Handle error
return 0;
}
return (double)time.tv_sec + (double)time.tv_usec * .000001;
}
#define NTHREADS 100
volatile unsigned n_threads = 0;
volatile unsigned n_writes = 0;
pthread_mutex_t main_ready;
pthread_mutex_t stop_mtx;
volatile bool running = true;
void stop(void)
{
pthread_mutex_lock(&stop_mtx);
running = false;
pthread_mutex_unlock(&stop_mtx);
}
bool shouldRun(void)
{
bool copy;
pthread_mutex_lock(&stop_mtx);
copy = running;
pthread_mutex_unlock(&stop_mtx);
return copy;
}
#define TARGET_HOST "localhost"
#define TARGET_PORT "1336"
void *thread(void *args)
{
char tmp = 0x01;
if (__sync_add_and_fetch(&n_threads, 1) == NTHREADS) {
pthread_mutex_unlock(&main_ready);
fprintf(stderr, "All %u Threads are ready...\n", (unsigned)n_threads);
}
int fd = socket(res->ai_family, SOCK_STREAM, res->ai_protocol);
if (connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
socket_close(fd);
fd = -1;
}
if (fd <= 0) {
fprintf(stderr, "socket_create failed\n");
}
if (write(fd, &tmp, 1) <= 0) {
fprintf(stderr, "pre-write failed\n");
}
do {
/* Write some garbage */
if (write(fd, &tmp, 1) <= 0) {
fprintf(stderr, "in-write failed\n");
break;
}
__sync_add_and_fetch(&n_writes, 1);
/* Wait some time */
usleep(500);
} while (shouldRun());
socket_close(fd);
return NULL;
}
int main(int argc, const char * argv[])
{
pthread_t threads[NTHREADS];
pthread_mutex_init(&main_ready, NULL);
pthread_mutex_lock(&main_ready);
pthread_mutex_init(&stop_mtx, NULL);
bzero((char *)&hint, sizeof(hint));
hint.ai_socktype = SOCK_STREAM;
hint.ai_family = AF_INET;
if (getaddrinfo(TARGET_HOST, TARGET_PORT, &hint, &res) != 0) {
return -1;
}
for (int i = 0; i < NTHREADS; ++i) {
pthread_create(&threads[i], NULL, thread, NULL);
}
/* wait for all threads to be set up */
pthread_mutex_lock(&main_ready);
fprintf(stderr, "Main thread is ready...\n");
{
double start, end;
int fd;
start = get_wall_time();
fd = socket(res->ai_family, SOCK_STREAM, res->ai_protocol);
if (connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
socket_close(fd);
fd = -1;
}
end = get_wall_time();
if (fd > 0) {
fprintf(stderr, "Took %f ms\n", (end - start) * 1000);
socket_close(fd);
}
}
/* Stop all running threads */
stop();
/* Waiting for termination */
for (int i = 0; i < NTHREADS; ++i) {
pthread_join(threads[i], NULL);
}
fprintf(stderr, "Performed %u successful writes\n", (unsigned)n_writes);
/* Lol.. */
freeaddrinfo(res);
return 0;
}
SIGPIPE comes when I try to connect to the kqueue server (after 10 connections are made, the server is "stuck"?). And when too many users are writing stuff, the server cannot open a new connection. (kqueue server code from http://eradman.com/posts/kqueue-tcp.html)
SIGPIPE means you're trying to write to a socket (or pipe) where the other end has already been closed (so no one will be able to read it). If you don't care about that, you can ignore SIGPIPE signals (call signal(SIGPIPE, SIG_IGN)) and the signals won't be a problem. Of course the write (or send) calls on the sockets will still fail (with EPIPE), so you need to make your code robust enough to deal with that.
The reason that SIGPIPE normally kills the process is that it's too easy to write programs that ignore errors on write/send calls and would otherwise run amok, using up 100% of CPU time. As long as you carefully check for errors every time and deal with them, you can safely ignore SIGPIPE.
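To illustrate, here is a small sketch of ignoring SIGPIPE and then handling the EPIPE error from write(). The helpers checked_write and write_to_closed_peer are invented for this example, which uses an AF_UNIX socketpair in place of a real client connection.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write all of buf to fd. Returns 0 on success or the errno value of the
   failing write(), e.g. EPIPE once the peer has closed its end. */
static int checked_write(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(fd >= 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything: retry */
            return errno; /* the caller should close fd and drop the client */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Demonstration: write to a socket whose peer is already closed.
   Returns the error reported by checked_write(), or -1 on setup failure. */
static int write_to_closed_peer(void)
{
    int sv[2];
    int err;

    signal(SIGPIPE, SIG_IGN); /* without this the write would kill us */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    close(sv[1]); /* the "client" goes away */
    err = checked_write(sv[0], "x", 1);
    close(sv[0]);
    return err;
}
```

With SIGPIPE ignored, write_to_closed_peer() returns EPIPE instead of terminating the process, and that is the point where a server would clean up the connection.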
"Or is it my fault?"
It was your fault. TCP works. Most probably you didn't read all the data that was sent.
"And when too many users are writing stuff, the server cannot open a new connection"
Servers don't open connections. Clients open connections. Servers accept connections. If your server stops doing that, there is something wrong with your accept loop. It should only do two things: accept a connection, and start a thread.

Delayed execution using epoll main loop

How would I create delayed execution of code or timeout events using epoll? Both libevent and libev have this functionality, but I can't figure out how to do it using epoll directly.
Currently the main loop looks like this:
epoll_ctl(epfd, EPOLL_CTL_ADD, client_sock_fd, &epev);
while(1) {
int nfds = epoll_wait(epfd, &epev, 1, 10);
if (nfds < 0) exit(EXIT_FAILURE);
if (nfds > 0) {
// If an event has been received
}
// Do this every 10ms
}
I am well aware that this functionality could be achieved by simply adding how much time has passed but using epoll seems like a cleaner solution.
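You can create a timerfd with timerfd_create and add its file descriptor to the epoll set with epoll_ctl. epoll_wait then reports it as readable each time the timer expires.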
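A hedged sketch of that idea follows. The helper names add_periodic_timer and run_until_ticks are made up here, and the 10 ms period from the question would be passed in as interval_ms. Reading the timerfd yields the number of expirations since the last read, which also shows when the loop has fallen behind.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Create a periodic timer and register it with epfd.
   Returns the timer descriptor, or -1 on failure. */
static int add_periodic_timer(int epfd, long interval_ms)
{
    struct itimerspec its;
    struct epoll_event ev = {0};
    int tfd;

    assert(interval_ms > 0);
    tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
    if (tfd < 0)
        return -1;

    its.it_interval.tv_sec = interval_ms / 1000;
    its.it_interval.tv_nsec = (interval_ms % 1000) * 1000000L;
    its.it_value = its.it_interval; /* first expiry after one interval */

    ev.events = EPOLLIN;
    ev.data.fd = tfd;
    if (timerfd_settime(tfd, 0, &its, NULL) < 0 ||
        epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev) < 0) {
        close(tfd);
        return -1;
    }
    return tfd;
}

/* Run the loop until the timer has fired at least `wanted` times.
   Returns the number of expirations seen. */
static uint64_t run_until_ticks(int epfd, int tfd, uint64_t wanted)
{
    uint64_t ticks = 0;

    while (ticks < wanted) {
        struct epoll_event ev;
        int nfds = epoll_wait(epfd, &ev, 1, -1); /* no polling timeout */

        if (nfds < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (nfds == 1 && ev.data.fd == tfd) {
            uint64_t expirations = 0;
            if (read(tfd, &expirations, sizeof expirations)
                    == (ssize_t)sizeof expirations)
                ticks += expirations; /* more than 1 if the loop was late */
            /* the "do this every 10ms" work would go here */
        }
        /* other descriptors, such as the client socket, are handled here */
    }
    return ticks;
}
```

Because the timer is just another descriptor, epoll_wait can block indefinitely and the periodic work no longer depends on the timeout argument.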
Stupid question: why not just keep track of the time explicitly? I do this in a multi-TCP client (for sending heartbeats) and the loop essentially does:
uint64_t last = get_time_in_usec();
uint64_t event_interval = 10 * 1000;
while(1) {
int nfds = epoll_wait(epfd, &epev, 1, 0); /* note that i set timeout = 0 */
if (nfds < 0) { /* do some cleanup logic, handle EINTR */ }
if (nfds > 0) { /* If an event has been received */ }
if(get_time_in_usec() >= last + event_interval) { ... }
}
get_time_in_usec can be implemented using gettimeofday or rdtsc in linux. YMMV
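For what it's worth, here is one possible get_time_in_usec. Note that it swaps in clock_gettime(CLOCK_MONOTONIC) for the gettimeofday/rdtsc options named above, since a monotonic clock does not jump when the wall clock is adjusted. The second helper, timeout_ms_until, is a hypothetical addition. It turns the next deadline into an epoll_wait timeout so the loop can sleep instead of spinning with a timeout of 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in microseconds. */
static uint64_t get_time_in_usec(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Milliseconds from `now` until `deadline` (both in microseconds), rounded
   up so the loop never wakes early; 0 if the deadline has passed. */
static int timeout_ms_until(uint64_t now, uint64_t deadline)
{
    uint64_t ms;

    if (now >= deadline)
        return 0;
    ms = (deadline - now + 999) / 1000;
    return ms > INT_MAX ? INT_MAX : (int)ms;
}
```

In the loop above this would be used as epoll_wait(epfd, &epev, 1, timeout_ms_until(get_time_in_usec(), last + event_interval)), with last advanced by event_interval each time the periodic work runs.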
