How to update epoll events after epoll_wait? - c

I have the following code excerpt (heavily redacted to remove unimportant details) which fails under a rare and particular set of circumstances.
struct epoll_event *events = calloc(MAXEVENTS+1, sizeof(struct epoll_event));
struct sockaddr_in in_addr;
socklen_t in_len = sizeof in_addr;
int n = epoll_wait(efd,events, MAXEVENTS, -1);
for(int i=0; i<n; i++)
struct epoll_event *evI = &events[i];
uint64 u64 = evI->data.u64;
int type = u64>>32, fd=u64, fdx = fd;
if(type == -1)
while((fd = accept4(fdx, &in_addr, &in_len, SOCK_NONBLOCK|SOCK_CLOEXEC))>-1)
setNew_Connection(efd, fd);
storeAddrPort(fd, &in_addr, &in_len);
if(evI->events&EPOLLOUT) //process out data stuff
else if(evI->events&EPOLLIN) //process in data stuff and possibly close a different connection.
Listening sockets are differentiated by -1 in the upper part of evI->data.u64
setNew_Connection does the usual accepting stuff like adding the new socket to epoll etc
EPOLLET is used.
Now it all works brilliantly, except in the following circumstances. The events array is only filled in by epoll_wait, so closing a connection has no effect on the n events already returned until the code gets back to the top of the while(1) loop.
1. epoll_wait unblocks with 3 events queued in the events struct table.
2. First event (n=0) is incoming data, after which the code decides to close a connection (e.g. file descriptor 8) as it is no longer needed.
3. 2nd event (n=1) is an incoming new connection. accept4 assigns fd:8 as it has recently become available. setNew_Connection adds it to the epoll list.
4. 3rd event is incoming data for the connection closed in step 2, i.e. fd:8, but it is no longer valid as the original fd:8 connection was closed and the current fd:8 is for a different connection.
I hope I have explained the problem adequately. The issue is that queued events in the events table are not updated when a connection is closed until the code returns to epoll_wait. How can I code around this problem?

Orel gave me the answer but I thought I would post the complete code solution. Instead of
I use
FDS[FDSL++] = fd;
The shutdown prevents any more data from being read or written but doesn't actually close the socket. FDS[FDSL++] = fd; stores the fd so that later, after the n events are done, it can be closed with while(FDSL) close(FDS[--FDSL]);
struct epoll_event *events = calloc(MAXEVENTS+1, sizeof(struct epoll_event));
struct sockaddr_in in_addr;
socklen_t in_len = sizeof in_addr;
int n = epoll_wait(efd,events, MAXEVENTS, -1);
for(int i=0; i<n; i++)
struct epoll_event *evI = &events[i];
uint64 u64 = evI->data.u64;
int type = u64>>32, fd=u64, fdx = fd;
if(type == -1)
while((fd = accept4(fdx, &in_addr, &in_len, SOCK_NONBLOCK|SOCK_CLOEXEC))>-1)
setNew_Connection(efd, fd);
storeAddrPort(fd, &in_addr, &in_len);
if(evI->events&EPOLLOUT) //process out data stuff
else if(evI->events&EPOLLIN) //process in data stuff and possibly close a different connection.
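The deferred-close idea described in this answer could be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the names defer_close and flush_deferred, the MAX_DEFERRED capacity, and the SHUT_RDWR argument to shutdown are all assumptions, since the original post does not show them.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_DEFERRED 1024   /* hypothetical capacity, not from the original post */

int FDS[MAX_DEFERRED];      /* descriptors waiting to be closed */
int FDSL = 0;               /* number of entries currently in FDS */

/* Stop traffic on fd now, but keep the descriptor number reserved so that
 * accept4() cannot hand it out again while stale events are still queued.
 * Returns 0 on success, -1 if the list is full. */
int defer_close(int fd)
{
    if (FDSL >= MAX_DEFERRED)
        return -1;
    shutdown(fd, SHUT_RDWR);   /* assumption: both directions are shut down */
    FDS[FDSL++] = fd;
    return 0;
}

/* Call once after the for-loop over the n events has finished.
 * Returns the number of descriptors that were closed. */
int flush_deferred(void)
{
    int closed = 0;
    while (FDSL > 0) {
        close(FDS[--FDSL]);
        closed++;
    }
    return closed;
}
```

The point of the design is that the descriptor number stays allocated until the whole batch has been processed, so a later accept4 in the same batch cannot be given fd 8 again.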


AF_XDP-Socket vs Linux Sockets: Why does my AF-XDP Socket lose packets whereas a generic linux socket doesn't?

I am comparing AF-XDP sockets vs Linux Sockets in terms of how many packets they can process without packet-loss (packet-loss is defined as the RTP-sequence number of the current packet is not equal to the RTP-sequence number of the previous packet + 1).
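As an illustration of the loss criterion above, a check along these lines could be used. It is only a sketch: rtp_seq and rtp_missing are hypothetical helper names (the post's own verify_rtp is not shown), and the only fact relied on is that the RTP sequence number is a 16-bit big-endian field at byte offset 2 of the RTP header.

```c
#include <assert.h>
#include <stdint.h>

/* Read the 16-bit sequence number from the start of an RTP header.
 * It is stored in network byte order at byte offset 2. */
uint16_t rtp_seq(const uint8_t *rtp)
{
    return (uint16_t)((rtp[2] << 8) | rtp[3]);
}

/* Number of packets missing between two consecutively received packets.
 * The arithmetic is modulo 2^16, so the wrap from 65535 to 0 counts as
 * no loss. A reordered (older) packet shows up as a very large value,
 * which a real implementation would have to treat separately. */
uint16_t rtp_missing(uint16_t prev_seq, uint16_t cur_seq)
{
    return (uint16_t)(cur_seq - prev_seq - 1);
}
```

Doing the subtraction in 16 bits avoids reporting a false loss every time the sequence number wraps around, which at a few hundred thousand packets per second happens several times per second.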
I noticed that my AF-XDP socket program (I can't determine whether the problem is in the kernel program or the user-space program) loses roughly 25 packets per second at a rate of about 390.000 packets per second, whereas an equivalent program with generic linux sockets doesn't lose any packets.
I implemented a so-called distributor-program which loads the XDP-kernel program once, sets up a generic linux socket and adds setsockopt(IP_ADD_MEMBERSHIP) to this generic socket for every multicast-address I pass to the program via command line.
After this, the distributor loads the filedescriptor of a BPF_MAP_TYPE_HASH placed in the XDP-kernel program and inserts routes for the traffic in case a single AF-XDP socket needs to share its umem later on.
The XDP-kernel program then checks for each IPv4/UDP packet if there is an entry in that hash-map. This basically looks like this:
const struct pckt_idntfy_raw raw = {
.src_ip = 0, /* not used at the moment */
.dst_ip = iph->daddr,
.dst_port = udh->dest,
.pad = 0
const int *idx = bpf_map_lookup_elem(&xdp_packet_mapping, &raw);
if(idx != NULL) {
if (bpf_map_lookup_elem(&xsks_map, idx)) {
bpf_printk("Found socket # index: %d!\n", *idx);
return bpf_redirect_map(&xsks_map, *idx, 0);
} else {
bpf_printk("Didn't find connected socket for index %d!\n", *idx);
In case idx exists this means that there is a socket sitting behind that index in the BPF_MAP_TYPE_XSKMAP.
After doing all that the distributor spawns a new process via fork() passing all multicast-addresses (including destination port) which should be processed by that process (one process handles one RX-Queue). In case there are not enough RX-Queues, some processes may receive multiple multicast-addresses. This then means that they are going to use SHARED UMEM.
I basically oriented my AF-XDP user-space program on this example code:
I am using the same xsk_configure_umem, xsk_populate_fill_ring and xsk_configure_socket functions.
Because I figured I don't need minimal latency for this application, I put the process to sleep for a specified time (around 1 - 2ms), after which it loops through every AF-XDP socket (most of the time it is only one socket) and processes every received packet for that socket, verifying that no packets have been missed:
while(!global_exit) {
nanosleep(&spec, &remaining);
for(int i = 0; i < cfg.ip_addrs_len; i++) {
struct xsk_socket_info *socket = xsk_sockets[i];
if(atomic_exchange(&socket->stats_sync.lock, 1) == 0) {
atomic_fetch_xor(&socket->stats_sync.lock, 1); /* release socket-lock */
In my opinion there is nothing too fancy about this, but somehow I lose ~25 packets per second at around 390.000 packets per second even though my UMEM is close to 1GB of RAM.
In comparison, my generic linux socket program looks like this (in short):
int fd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
/* setting some socket options */
struct sockaddr_in sin;
memset(&sin, 0, sizeof(struct sockaddr_in));
sin.sin_family = AF_INET;
sin.sin_port = cfg->ip_addrs[0]->pckt.dst_port;
inet_aton(cfg->ip_addrs[0]->pckt.dst_ip, &sin.sin_addr);
if(bind(fd, (struct sockaddr*)&sin, sizeof(struct sockaddr)) < 0) {
fprintf(stderr, "Error on binding socket: %s\n", strerror(errno));
return - 1;
ioctl(fd, SIOCGIFADDR, &intf);
The distributor-program creates a new process for every given multicast-ip in case generic linux sockets are used (because there are no sophisticated methods such as SHARED-UMEM in generic sockets I don't bother with multiple multicast-streams per process).
Later on I of course join the multicast membership:
struct ip_mreqn mreq;
memset(&mreq, 0, sizeof(struct ip_mreqn));
const char *multicast_ip = cfg->ip_addrs[0]->pckt.dst_ip;
if(inet_pton(AF_INET, multicast_ip, &mreq.imr_multiaddr.s_addr)) {
/* Local interface address */
memcpy(&mreq.imr_address, &cfg->ifaddr, sizeof(struct in_addr));
mreq.imr_ifindex = cfg->ifindex;
if(setsockopt(igmp_socket_fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(struct ip_mreqn)) < 0) {
fprintf(stderr, "Failed to set `IP_ADD_MEMBERSHIP`: %s\n", strerror(errno));
} else {
printf("Successfully added Membership for IP: %s\n", multicast_ip);
and start processing packets (not sleeping but in a busy-loop like fashion):
void read_packets_recvmsg_with_latency(struct config *cfg, struct statistic *st, void *buff, const int igmp_socket_fd) {
char ctrl[CMSG_SPACE(sizeof(struct timeval))];
struct msghdr msg;
struct iovec iov;
msg.msg_control = (char*)ctrl;
msg.msg_controllen = sizeof(ctrl);
msg.msg_name = &cfg->ifaddr;
msg.msg_namelen = sizeof(cfg->ifaddr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
iov.iov_base = buff;
iov.iov_len = BUFFER_SIZE;
struct timeval time_user, time_kernel;
struct cmsghdr *cmsg = (struct cmsghdr*)&ctrl;
const int64_t read_bytes = recvmsg(igmp_socket_fd, &msg, 0);
if(read_bytes == -1) {
gettimeofday(&time_user, NULL);
if(cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_TIMESTAMP) {
memcpy(&time_kernel, CMSG_DATA(cmsg), sizeof(struct timeval));
if(verify_rtp(cfg, st, read_bytes, buff)) {
const double timediff = (time_user.tv_sec - time_kernel.tv_sec) * 1000000 + (time_user.tv_usec - time_kernel.tv_usec);
if(timediff > st->stats.latency_us) {
st->stats.latency_us = timediff;
int main(...) {
while(!is_global_exit) {
read_packets_recvmsg_with_latency(&cfg, &st, buffer, igmp_socket_fd);
That's pretty much it.
Please note that in the described use case, where I start to lose packets, I don't use SHARED UMEM; it's just a single RX-Queue receiving a multicast stream. When I process a smaller multicast stream of around 150.000 pps, the AF-XDP solution doesn't lose any packets. It also goes the other way: at around 520.000 pps on the same RX-Queue (using SHARED UMEM) I lose about 12.000 pps.
Any ideas what I am missing?

ePoll not accepting some clients

I have implemented a server using epoll, in what I believe to be the standard way, in fact, when I implemented it using the example from the epoll man page I got the same behavior.
This leads me to believe that there must be a problem with my client, and that I'm making assumptions somehow where I shouldn't. The main method of my client forks n clients, which then connect to the server. What I'm seeing is that usually a subset of these clients don't trigger the epoll and never hit the accept() call. The three-way handshake completes because there is a listening socket, so the client behaves as if it were accepted, but it never gets served, as the server doesn't know about it. I can't figure out why this is happening, and haven't been able to find similar questions online. Thoughts?
Here's the relevant server code:
// wrapper which binds to port and exits on error
listenFD = tcpSocket(host, port);
// wrapper which listens and exits on error
epollFD = epoll_create(EPOLL_QUEUE_LEN);
if (epollFD == -1)
// Add the server socket to the epoll event loop
event.events = EPOLLIN | EPOLLERR | EPOLLHUP | EPOLLET;
event.data.fd = listenFD;
if (epoll_ctl (epollFD, EPOLL_CTL_ADD, listenFD, &event) == -1)
//struct epoll_event events[MAX_EVENTS];
numFDs = epoll_wait(epollFD, events, EPOLL_QUEUE_LEN, -1);
for (i = 0; i < numFDs; i++){
// Case 1: Error condition
if (events[i].events & (EPOLLHUP | EPOLLERR)){
errMessage("epoll: EPOLLERR");
printf("Closed connection to %d\n", events[i].data.fd);
// Case 2: Server is receiving a connection request
if (events[i].data.fd == listenFD){
// socketlen_t
clientLen = sizeof(client);
newFD = Accept (listenFD, (SA *)&client, &clientLen);
// Set receive low water mark to message size
SetSockOpt(newFD, SOL_SOCKET, SO_RCVLOWAT, &lowWater, sizeof(lowWater));
// Add the new socket descriptor to the epoll loop
event.data.fd = newFD;
event.events = EPOLLIN | EPOLLET;
if (epoll_ctl (epollFD, EPOLL_CTL_ADD, newFD, &event) == -1)
errSystem ("epoll_ctl");
printf("Connected to client on socket %d\n", newFD);
// tell the client we're connected and ready
// (this is an attempt to fix my issue. I'd rather not do this...)
Writen(newFD, buffer, CLIENT_MESSAGE_SIZE);
if (events[i].events & EPOLLIN){
//serve the client
And this is the client code. One instance of these works, but if I fork more than 5 or so (sometimes just 2) a number of them won't get accepted.
int Client(const char *host, const int port, const int timeLen, const int clientNum){
long double delta;
PTSTRUCTS ptstructs = (PTSTRUCTS) malloc(sizeof(TSTRUCTS));
size_t result;
stop = FALSE;
cNum = clientNum;
Signal(SIGINT, closeFD);
Signal(SIGALRM, sendMessage);
nsockets = 0;
// wrapper which calls connect() and exits with message on error
connectFD = tcpConnect(host, port);
printf("%d connected to server:\n", clientNum);
// initialize client message
strcpy(sbuf, CLIENT_MESSAGE);
// get the start time
while((delta = getTimeDelta(ptstructs)) < timeLen){
// One or more clients blocks here for ever
if ((result = receiveMessage()) < CLIENT_MESSAGE_SIZE){
stop = TRUE;
Close (connectFD);
delta = getTimeDelta(ptstructs);
printf("Client %d served %ld bytes in %.6Lf seconds.\n", clientNum, byteCount, delta);
// free heap memory
return (1);
I should probably note that I'm seeing the same behavior if I don't set EPOLLET. I originally thought this might be the result of edge-triggering behavior, but nope.
Is the backlog argument of listen() greater than the number of clients?
@some-programmer-dude, sorry for not commenting: the closed fd will be removed from the epoll event set automatically.

C - Using poll to multiplex between socket(s) and stdin - Server

I'm writing a client server application and I'm using poll to multiplex between several client sockets and stdin, where I can insert commands (example: stop the server). I believe the structure (the "logic") of my code is correct, however it's not behaving the way I expect it to:
struct pollfd pfd[NSERVER]; //defined as 10
pfd[0].fd = fileno(stdin);
pfd[0].events = POLLIN;
pfd[1].fd = socktfd; //server bind, listen socket
pfd[1].events = POLLIN;
struct sockaddr_storage remoteaddr; // client address
socklen_t addrlen;
char remoteIP[INET6_ADDRSTRLEN];
addrlen = sizeof remoteaddr;
char buf[1024]; // buffer
int pos=2;
while(poll(pfd,1,0) >= 0)
if(pfd[0].revents & POLLIN) { //stdin
//process input and perform command
if(pfd[1].revents & POLLIN) {
/* new connection */
int connsockfd = accept(socktfd, (struct sockaddr *)&remoteaddr,&addrlen);
int i=2;
//Loop through the fd in pfd for events
while (i<=NSERVER)
if (pfd[i].revents & POLLIN) {
int c=recv(pfd[i].fd, buf, sizeof buf, 0);
if(c<=0) {
if (c==0)
/* Client closed socket */
{//Client sent some data
if (c<=0)
I've removed some code inside the recv and send to make the code easier to read.
It fails to behave (it just hangs; it doesn't accept connections or react to input from stdin).
Note: I would prefer to use poll over select, so please don't point to select :-).
Thanks in advance for any assistance.
you should set every pfd[i].fd = -1, so they get ignored initially by poll().
poll(pfd, 1, 0) is wrong and should at least be poll(pfd, 2, 0) or even poll(pfd, NSERVER, 0).
while(i<=NSERVER) should be while(i<NSERVER)
Your program probably hangs because you loop through the pfd array, which is not initialized and contains random values for .fd and .revents, so it tries to send() or recv() on some random fd, which might block. Add if(pfd[i].fd < 0) {i++; continue;} in the i<NSERVER loop.
You also don't set pfd[pos].events = POLLIN on newly accepted sockets. Don't set POLLOUT unless you have something to send, because it will trigger almost every time.
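Taken together, the suggestions above might look something like the following sketch. The helper names pfd_init and pfd_add are made up for illustration and do not appear in the question; the rest follows the advice directly: unused slots get fd = -1 so poll() skips them, new sockets get POLLIN, and poll() is called with the full array length.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define NSERVER 10

/* Mark every slot unused: poll() ignores entries whose fd is negative. */
void pfd_init(struct pollfd *pfd, int n)
{
    for (int i = 0; i < n; i++) {
        pfd[i].fd = -1;
        pfd[i].events = 0;
        pfd[i].revents = 0;
    }
}

/* Put fd into the first free slot at index >= first and ask for POLLIN.
 * Returns the slot index, or -1 if the array is full. */
int pfd_add(struct pollfd *pfd, int n, int first, int fd)
{
    for (int i = first; i < n; i++) {
        if (pfd[i].fd < 0) {
            pfd[i].fd = fd;
            pfd[i].events = POLLIN;
            pfd[i].revents = 0;
            return i;
        }
    }
    return -1;
}
```

In the server loop this would be used as pfd_init(pfd, NSERVER) once at startup, pfd_add(pfd, NSERVER, 2, connsockfd) after each accept(), and poll(pfd, NSERVER, -1) as the loop condition. A timeout of -1 blocks until something is ready, whereas the 0 used in the question makes poll() return immediately and spin.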

Boss Worker Pthreads Web Server in C - Server crashes if more requests sent than number of threads

I'm writing a web server in C (which I suck with) using Pthreads (which I suck with even more) and I'm stuck at this point. The model for the server is boss-worker so the boss thread instantiates all worker threads at the beginning of the program. There is a global queue that stores the socket of the incoming connection(s). The boss thread is the one that adds all items (sockets) to the queue as the connections are accepted. All of the worker threads then wait for an item to be added to a global queue in order for them to take up the processing.
The server works fine as long as I connect to it fewer times than the number of worker threads that the server has. Because of that, I think that either something is wrong with my mutexes (maybe the signals are getting lost?) or the threads are being disabled after they run once (which would explain why, if there are 8 threads, it can only parse the first 8 http requests).
Here is my global queue variable.
int queue[QUEUE_SIZE];
This is the main thread. It creates a queue struct (defined elsewhere) with methods enqueue, dequeue, empty, etc. When the server accepts a connection, it enqueues the socket that the incoming connection is on. The worker threads which were dispatched at the beginning are constantly checking this queue to see if any jobs have been added, and if there are jobs, then they dequeue the socket, connect to that port, and read/parse/write the incoming http request.
int main(int argc, char* argv[])
int hSocket, hServerSocket; /* handle to socket */
struct hostent* pHostInfo; /* holds info about a machine */
struct sockaddr_in Address; /* Internet socket address stuct */
int nAddressSize = sizeof(struct sockaddr_in);
int nHostPort;
int numThreads;
int i;
if(argc < 3) {
printf("\nserver-usage port-num num-thread\n");
return 0;
else {
printf("\nStarting server");
printf("\nMaking socket");
/* make a socket */
if(hServerSocket == SOCKET_ERROR)
printf("\nCould not make a socket\n");
return 0;
/* fill address struct */
Address.sin_addr.s_addr = INADDR_ANY;
Address.sin_port = htons(nHostPort);
Address.sin_family = AF_INET;
printf("\nBinding to port %d\n",nHostPort);
/* bind to a port */
if(bind(hServerSocket,(struct sockaddr*)&Address,sizeof(Address)) == SOCKET_ERROR) {
printf("\nCould not connect to host\n");
return 0;
/* get port number */
getsockname(hServerSocket, (struct sockaddr *) &Address,(socklen_t *)&nAddressSize);
printf("Opened socket as fd (%d) on port (%d) for stream i/o\n",hServerSocket, ntohs(Address.sin_port));
sin_family = %d\n\
sin_addr.s_addr = %d\n\
sin_port = %d\n"
, Address.sin_family
, Address.sin_addr.s_addr
, ntohs(Address.sin_port)
//Up to this point is boring server set up stuff. I need help below this.
//instantiate all threads
pthread_t tid[numThreads];
for(i = 0; i < numThreads; i++) {
printf("\nMaking a listen queue of %d elements",QUEUE_SIZE);
/* establish listen queue */
if(listen(hServerSocket,QUEUE_SIZE) == SOCKET_ERROR) {
printf("\nCould not listen\n");
return 0;
while(1) {
printf("\nWaiting for a connection");
while(!empty(head,tail)) {
pthread_cond_wait (&cond2, &mtx);
/* get the connected socket */
hSocket = accept(hServerSocket,(struct sockaddr*)&Address,(socklen_t *)&nAddressSize);
printf("\nGot a connection");
pthread_cond_signal(&cond); // wake worker thread
Here is the worker thread. This should be always running checking for new requests (by seeing if the queue is not empty). At the end of this method, it should be deferring back to the boss thread to wait for the next time it is needed.
void *worker(void *threadarg) {
while(empty(head,tail)) {
pthread_cond_wait(&cond, &mtx);
int hSocket = dequeue(queue,&head);
unsigned nSendAmount, nRecvAmount;
char line[BUFFER_SIZE];
nRecvAmount = read(hSocket,line,sizeof line);
printf("\nReceived %s from client\n",line);
//DO ALL HTTP PARSING (Removed for the sake of space; I can add it back if needed)
nSendAmount = write(hSocket,allText,sizeof(allText));
if(nSendAmount != -1) {
totalBytesSent = totalBytesSent + nSendAmount;
printf("\nSending result: \"%s\" back to client\n",allText);
printf("\nClosing the socket");
/* close socket */
if(close(hSocket) == SOCKET_ERROR) {
printf("\nCould not close socket\n");
return 0;
Any help would be greatly appreciated. I can post more of the code if anyone needs it, just let me know. I'm not the best with OS stuff, especially in C, but I know the basics of mutexes, cond. variables, semaphores, etc. Like I said, I'll take all the help I can get. (Also, I'm not sure if I posted the code exactly right since this is my first question. Let me know if I should change the formatting at all to make it more readable.)
Time for a workers' revolution.
The work threads seem to be missing a while(true) loop. After the HTTP exchange and closing the socket, they should be looping back to wait on the queue for more sockets/requests.
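A worker that loops back to the queue after each request could be sketched as below. This is an assumption-laden illustration rather than a fix of the posted code: enqueue_job, dequeue_job, the count variable, the two condition variables not_empty and not_full, and the negative "stop" sentinel are all hypothetical, and the HTTP handling is reduced to a counter.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_SIZE 16

int queue[QUEUE_SIZE];
int head = 0, tail = 0, count = 0;
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
int jobs_done = 0;

/* Boss side: block while the queue is full, then add one socket. */
void enqueue_job(int sock)
{
    pthread_mutex_lock(&mtx);
    while (count == QUEUE_SIZE)
        pthread_cond_wait(&not_full, &mtx);
    queue[tail] = sock;
    tail = (tail + 1) % QUEUE_SIZE;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&mtx);
}

/* Worker side: block while the queue is empty, then take one socket. */
int dequeue_job(void)
{
    pthread_mutex_lock(&mtx);
    while (count == 0)
        pthread_cond_wait(&not_empty, &mtx);
    int sock = queue[head];
    head = (head + 1) % QUEUE_SIZE;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&mtx);
    return sock;
}

/* The outer loop is what the posted worker lacks: after one request the
 * thread goes back to wait for the next socket instead of returning. */
void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        int sock = dequeue_job();
        if (sock < 0)          /* hypothetical shutdown sentinel */
            break;
        /* read the request, write the response and close(sock) here */
        pthread_mutex_lock(&mtx);
        jobs_done++;
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}
```

Note that the mutex is held around every access to the queue and around each pthread_cond_wait call, and the wait sits in a while loop so a spurious or stale wakeup simply re-checks the condition.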

How to interrupt epoll_pwait with an appropriate signal? [duplicate]

Possible Duplicate:
Interrupting epoll_wait with a non-IO event, no signals
I have a thread that is currently using epoll_wait to flag the arrival of data on some sockets. The timeout parameter is currently set to zero.
However, the thread also does other tasks. What I want to do is change this so that if there is no work to be done then make it an indefinite or long time out. This will dramatically reduce wasted CPU cycles spinning when there is no actual work to do.
The whole thing is driven mostly by the arrival of a message on a thread safe lock free queue.
So, what I think should happen is that I wake the thread up from its long timeout using epoll_pwait.
However, I'm unsure what signal to send it and how this is done. I'm not familiar with Linux signals.
The following is similar to what I currently have, dramatically shortened to show the concept. If you spot a bug, don't bother pointing it out; this is just an illustration that I've typed in here to help you understand what I want to achieve.
// Called from another thread...
void add_message_to_queue(struct message_t* msg)
raise( ? ); // wake the state machine?
// different thread to the above.
struct message_t msg;
while (msg = get_message_from_queue())
timeout = work_available ? 0 : -1;
nfds = epoll_pwait(epfd, events, MAX_EPOLL_EVENTS, timeout);
for (i = 0; i < nfds; ++i)
if ((events[i].events & EPOLLIN) == EPOLLIN)
/// do stuff
So I guess my question is really, is this the right way of going about it? and if so, what signal do I send and do I need to define a signal handler or can I use the signal disposition "ignore" and still be woken?
Instead of signals, consider using a pipe. Create a pipe and add the file descriptor for the read end of the pipe to the epoll. When you want to wake the epoll_wait call, just write 1 character to the write end of the pipe.
int read_pipe;
int write_pipe;
void InitPipe()
int pipefds[2] = {};
epoll_event ev = {};
pipe(pipefds);
read_pipe = pipefds[0];
write_pipe = pipefds[1];
// make read-end non-blocking
int flags = fcntl(read_pipe, F_GETFL, 0);
fcntl(read_pipe, F_SETFL, flags|O_NONBLOCK);
// add the read end to the epoll
ev.events = EPOLLIN;
ev.data.fd = read_pipe;
epoll_ctl(epfd, EPOLL_CTL_ADD, read_pipe, &ev);
void add_message_to_queue(struct message_t* msg)
char ch = 'x';
write(write_pipe, &ch, 1);
struct message_t msg;
while (msg = get_message_from_queue())
timeout = work_available ? 0 : -1;
nfds = epoll_wait(epfd, events, MAX_EPOLL_EVENTS, timeout);
for (i = 0; i < nfds; ++i)
if (events[i].data.fd == read_pipe)
// read all bytes from read end of pipe
char ch;
int result = 1;
while (result > 0)
result = read(read_pipe, &ch, 1);
if ((events[i].events & EPOLLIN) == EPOLLIN)
/// do stuff
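For reference, the pipe-based wake-up from this answer can be condensed into a small compilable sketch. The function names wake_init, wake_send and wake_drain are invented here for illustration; the mechanism (a non-blocking read end registered with EPOLLIN, one byte written to wake the waiter, all pending bytes drained afterwards) is the one the answer describes.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

int wake_rd = -1;   /* read end, registered with epoll */
int wake_wr = -1;   /* write end, used by the producer thread */

/* Create the pipe, make the read end non-blocking and add it to epfd.
 * Returns 0 on success, -1 on error. */
int wake_init(int epfd)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;
    wake_rd = p[0];
    wake_wr = p[1];

    int flags = fcntl(wake_rd, F_GETFL, 0);
    if (flags == -1 || fcntl(wake_rd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = wake_rd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, wake_rd, &ev);
}

/* Called by the producer after it has queued a message. */
int wake_send(void)
{
    char ch = 'x';
    return write(wake_wr, &ch, 1) == 1 ? 0 : -1;
}

/* Called by the epoll thread when wake_rd is readable.
 * Reads until the pipe is empty and returns the number of bytes drained. */
int wake_drain(void)
{
    char buf[64];
    int total = 0;
    ssize_t r;
    while ((r = read(wake_rd, buf, sizeof buf)) > 0)
        total += (int)r;
    return total;
}
```

Because the read end is non-blocking, the drain loop ends as soon as the pipe is empty instead of blocking the epoll thread, and several wake-ups that arrive close together are collapsed into a single pass over the message queue.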
