How do I use the blocking status of a socket as a condition?

I am maintaining/developing an existing code base. We have a Raspberry Pi controlling a bunch of hardware, some of which is modular. The code, written in C (it might be C++), communicates over an IPv4 socket (using socket.h) with a GUI on Windows. I'm pretty sure we don't have multithreading implemented in the Raspberry Pi code, or this would be much easier.
Some of the modular hardware doesn't interact with the code. Without the extra hardware, the Raspberry Pi sits there waiting until the GUI sends something. That's how we want it to behave.
The problem is that when this extra hardware is connected, the Raspberry Pi also needs to run some code in response to one of its IO pins, which is tied to a button on the hardware.
I tried adding (various) conditions to a loop in main(), which is where I thought the code was idling. I always got either the hardware OR the GUI control working, but never both.
I eventually figured out that read(), which is called near the end of that loop, is blocking.
Now I'm trying to figure out how to execute one chunk of code (for the hardware control) while read() is blocking, and another when it returns.
Something like:
(Pseudocode)
read socket
while (blocking)
{
    check for hardware signal; if found, runTest();
}
{use result of read()}
The loop in main:
while(true)
{
    initPort();
    while(listening)
    {
        openPort();
        if (netOpen)
        {
            //puts("// Tester is connected /////////////////////////////////");
            loadCalFactors();
            loadCalResistors();
            loadExtConf(true);
            inOn(false);
            outOn(false);
            // Loop until client terminates connection.
            initPins();
            while(netOpen)
            {
                /*
                 * Moving the block to work().
                 */
//                oldI = i;
//                i = digitalRead(iTSW);
//
//                if((i ^ oldI) == 0 && oldI == 1)    // TSW has been held on
//                    testing = true;
//
//                else if((i ^ oldI) == 1 && oldI == 0)
//                    testing = false;
//
//                if(testLoaded && usingTermCtrl && i && !testing)
//                {
//                    RunTp(false);
//                }
//                else
                work();
            }
        }
    }
}
Here's the code where it's blocking:
void work()
{
    char bs[1];
    int n;

    // Socket Sockfd is blocking here, so we only briefly return to the
    // loop in main right after an action.
    n = read(Sockfd, bs, 1);
    if(n > 0)
        doAct(bs[0]);
    else
        checkForSOT();
}
I already tried putting all my hardware checks in checkForSOT() (which was previously empty), but that didn't do any better than the loop in main.
And here is where the socket is set up:
void initPort()
{
    struct sockaddr_in serv_addr;

    printf("Initializing Port %d\n", HOST_PORT);
    listening = false;
    sockListen = socket(AF_INET, SOCK_STREAM, 0);
    printf("sockListen=%d\n", sockListen);
    if(sockListen < 0)
    {
        puts("Error opening socket");
        delay(1000);
    }
    else
    {
        bzero((char *)&serv_addr, sizeof(serv_addr));
        serv_addr.sin_family = AF_INET;
        serv_addr.sin_addr.s_addr = INADDR_ANY;
        serv_addr.sin_port = htons(HOST_PORT);
        if(bind(sockListen, (struct sockaddr *) &serv_addr,
                sizeof(serv_addr)) < 0)
        {
            puts("Error on binding");
            delay(1000);
        }
        else
        {
            listen(sockListen, 1);
            listening = true;
        }
    }
}
void openPort()
{
    int clilen;
    int newsockfd;

    printf("\nTester waiting for connection on socket %d\n", sockListen);
    clilen = sizeof(cli_addr);
    // infinite wait on a connection
    if((newsockfd = accept(sockListen,
            (struct sockaddr *) &cli_addr, (socklen_t*) &clilen) ) < 0 )
        puts("Error on accept");
    else
    {
        if (DEBUG) debugMode = true;
        netOpen = true;
        Sockfd = newsockfd;
        sendVersion();
        netWrite("Connected\n");
        sendExtConfMessage();
        sendExtConf();
    }
}
Most of the posts I found about blocking sockets mention poll() and select() (along with their variants like pselect(), ppoll() and epoll), but don't go into clear enough detail about how they work for me to figure out whether they're what I need.
I'm also not sure what it would take to change the socket to non-blocking while maintaining the same behavior.
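From what I've read, making the socket non-blocking would be something like the following untested sketch (fcntl() with O_NONBLOCK is the standard mechanism, but I'd then have to handle EAGAIN/EWOULDBLOCK everywhere read() is called):

#include <fcntl.h>

// After accept(), switch Sockfd to non-blocking mode:
int flags = fcntl(Sockfd, F_GETFL, 0);
if (flags < 0 || fcntl(Sockfd, F_SETFL, flags | O_NONBLOCK) < 0)
    puts("Error setting socket to non-blocking");

// From here on, read(Sockfd, ...) returns -1 with errno set to
// EAGAIN or EWOULDBLOCK instead of blocking when no data is waiting.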
Note: I am still trying to read through and wrap my head around Beej's guide to network programming, so if there's a specific section of it that will help me, please point it out.
Also, if anyone knows of (or could write) a good guide to setting up remote debugging of the Raspberry Pi through NetBeans 12 (on Windows 10), that would be a HUGE help!

poll() isn't very hard. Make a loop that runs at least every 5 msec.
while (running) {
    // Initialize, then wait for something to happen
    struct pollfd fds = {fd, POLLIN, 0};
    int rc = poll(&fds, 1, 5);          // wait for fds or 5 msec
    if (rc < 0) {
        perror("poll");
        exit(1);
    } else if (rc > 0) {
        if (fds.revents & POLLIN) {
            char buf[64];
            recv(fd, buf, sizeof buf, 0);   // fds is ready
        }
    }
    // check button here
}
Obviously there should be more checking and cleanup on error conditions.
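For instance, applied to the work() function from the question, a minimal sketch (assuming the Sockfd, netOpen, doAct() and checkForSOT() names from the question, and treating a zero-byte read on a readable socket as the client disconnecting) might look like:

#include <poll.h>
#include <unistd.h>

void work()
{
    struct pollfd pfd = { .fd = Sockfd, .events = POLLIN };

    int rc = poll(&pfd, 1, 5);            // give the GUI up to 5 msec to send a command
    if (rc > 0 && (pfd.revents & POLLIN)) {
        char bs[1];
        int n = read(Sockfd, bs, 1);      // won't block: poll() says data is waiting
        if (n > 0)
            doAct(bs[0]);
        else
            netOpen = false;              // zero bytes on a readable socket: client closed
    } else if (rc == 0) {
        checkForSOT();                    // timeout: no GUI traffic, check the hardware button
    }
}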

Related

AF_XDP-Socket vs Linux Sockets: Why does my AF-XDP Socket lose packets whereas a generic linux socket doesn't?

I am comparing AF-XDP sockets vs Linux sockets in terms of how many packets they can process without packet loss (packet loss is defined as: the RTP sequence number of the current packet is not equal to the RTP sequence number of the previous packet + 1).
I noticed that my AF-XDP socket program (I can't determine whether this problem is related to the kernel program or the user-space program) loses around ~25 packets per second at around 390,000 packets per second, whereas an equivalent program with generic Linux sockets doesn't lose any packets.
I implemented a so-called distributor program which loads the XDP kernel program once, sets up a generic Linux socket and adds setsockopt(IP_ADD_MEMBERSHIP) to this generic socket for every multicast address I pass to the program via the command line.
After this, the distributor loads the file descriptor of a BPF_MAP_TYPE_HASH placed in the XDP kernel program and inserts routes for the traffic, in case a single AF-XDP socket needs to share its UMEM later on.
The XDP kernel program then checks, for each IPv4/UDP packet, whether there is an entry in that hash map. It basically looks like this:
const struct pckt_idntfy_raw raw = {
    .src_ip = 0, /* not used at the moment */
    .dst_ip = iph->daddr,
    .dst_port = udh->dest,
    .pad = 0
};

const int *idx = bpf_map_lookup_elem(&xdp_packet_mapping, &raw);
if(idx != NULL) {
    if (bpf_map_lookup_elem(&xsks_map, idx)) {
        bpf_printk("Found socket # index: %d!\n", *idx);
        return bpf_redirect_map(&xsks_map, *idx, 0);
    } else {
        bpf_printk("Didn't find connected socket for index %d!\n", *idx);
    }
}
If idx exists, it means that there is a socket sitting behind that index in the BPF_MAP_TYPE_XSKMAP.
After doing all that, the distributor spawns a new process via fork(), passing all multicast addresses (including destination ports) which should be processed by that process (one process handles one RX queue). In case there are not enough RX queues, some processes may receive multiple multicast addresses, which means they are going to use SHARED UMEM.
I basically modeled my AF-XDP user-space program on this example code: https://github.com/torvalds/linux/blob/master/samples/bpf/xdpsock_user.c
I am using the same xsk_configure_umem, xsk_populate_fill_ring and xsk_configure_socket functions.
Because I figured I don't need the lowest possible latency for this application, I send the process to sleep for a specified time (around 1-2 ms), after which it loops through every AF-XDP socket (most of the time it is only one socket) and processes every received packet for that socket, verifying that no packets have been missed:
while(!global_exit) {
    nanosleep(&spec, &remaining);

    for(int i = 0; i < cfg.ip_addrs_len; i++) {
        struct xsk_socket_info *socket = xsk_sockets[i];
        if(atomic_exchange(&socket->stats_sync.lock, 1) == 0) {
            handle_receive_packets(socket);
            atomic_fetch_xor(&socket->stats_sync.lock, 1); /* release socket-lock */
        }
    }
}
In my opinion there is nothing too fancy about this, but somehow I lose ~25 packets per second at around 390,000 pps even though my UMEM is close to 1 GB of RAM.
In comparison, my generic linux socket program looks like this (in short):
int fd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
/* setting some socket options */

struct sockaddr_in sin;
memset(&sin, 0, sizeof(struct sockaddr_in));
sin.sin_family = AF_INET;
sin.sin_port = cfg->ip_addrs[0]->pckt.dst_port;
inet_aton(cfg->ip_addrs[0]->pckt.dst_ip, &sin.sin_addr);

if(bind(fd, (struct sockaddr*)&sin, sizeof(struct sockaddr)) < 0) {
    fprintf(stderr, "Error on binding socket: %s\n", strerror(errno));
    return -1;
}

ioctl(fd, SIOCGIFADDR, &intf);
The distributor program creates a new process for every given multicast IP when generic Linux sockets are used (since generic sockets have no sophisticated mechanism such as SHARED UMEM, I don't bother with multiple multicast streams per process).
Later on I of course join the multicast membership:
struct ip_mreqn mreq;
memset(&mreq, 0, sizeof(struct ip_mreqn));

const char *multicast_ip = cfg->ip_addrs[0]->pckt.dst_ip;
if(inet_pton(AF_INET, multicast_ip, &mreq.imr_multiaddr.s_addr)) {
    /* Local interface address */
    memcpy(&mreq.imr_address, &cfg->ifaddr, sizeof(struct in_addr));
    mreq.imr_ifindex = cfg->ifindex;

    if(setsockopt(igmp_socket_fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(struct ip_mreqn)) < 0) {
        fprintf(stderr, "Failed to set `IP_ADD_MEMBERSHIP`: %s\n", strerror(errno));
        return;
    } else {
        printf("Successfully added Membership for IP: %s\n", multicast_ip);
    }
}
and start processing packets (not sleeping, but in a busy-loop-like fashion):
void read_packets_recvmsg_with_latency(struct config *cfg, struct statistic *st, void *buff, const int igmp_socket_fd) {
    char ctrl[CMSG_SPACE(sizeof(struct timeval))];

    struct msghdr msg;
    struct iovec iov;
    msg.msg_control = (char*)ctrl;
    msg.msg_controllen = sizeof(ctrl);
    msg.msg_name = &cfg->ifaddr;
    msg.msg_namelen = sizeof(cfg->ifaddr);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    iov.iov_base = buff;
    iov.iov_len = BUFFER_SIZE;

    struct timeval time_user, time_kernel;
    struct cmsghdr *cmsg = (struct cmsghdr*)&ctrl;

    const int64_t read_bytes = recvmsg(igmp_socket_fd, &msg, 0);
    if(read_bytes == -1) {
        return;
    }

    gettimeofday(&time_user, NULL);
    if(cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_TIMESTAMP) {
        memcpy(&time_kernel, CMSG_DATA(cmsg), sizeof(struct timeval));
    }

    if(verify_rtp(cfg, st, read_bytes, buff)) {
        const double timediff = (time_user.tv_sec - time_kernel.tv_sec) * 1000000 + (time_user.tv_usec - time_kernel.tv_usec);
        if(timediff > st->stats.latency_us) {
            st->stats.latency_us = timediff;
        }
    }
}

int main(...) {
    ....
    while(!is_global_exit) {
        read_packets_recvmsg_with_latency(&cfg, &st, buffer, igmp_socket_fd);
    }
}
That's pretty much it.
Please note that in the described use case, where I start to lose packets, I don't use SHARED UMEM; it's just a single RX queue receiving a multicast stream. If I process a smaller multicast stream of around 150,000 pps, the AF-XDP solution doesn't lose any packets. But it also goes the other way around: at around 520,000 pps on the same RX queue (using SHARED UMEM) I get a loss of 12,000 pps.
Any ideas what I am missing?

C Program daemon uses 100% cpu usage

I'm initializing a daemon in C on a Debian system:
/**
 * Initializes the daemon so that mcu.serial would listen in the background
 */
void init_daemon()
{
    pid_t process_id = 0;
    pid_t sid = 0;

    // Create child process
    process_id = fork();

    // Indication of fork() failure
    if (process_id < 0) {
        printf("Fork failed!\n");
        logger("Fork failed", LOG_LEVEL_ERROR);
        exit(1);
    }

    // PARENT PROCESS. Need to kill it.
    if (process_id > 0) {
        printf("process_id of child process %i\n", process_id);
        exit(0);
    }

    //unmask the file mode
    umask(0);

    //set new session
    sid = setsid();
    if(sid < 0) {
        printf("could not set new session");
        logger("could not set new session", LOG_LEVEL_ERROR);
        exit(1);
    }

    // Close stdin, stdout and stderr
    close(STDIN_FILENO);
    close(STDOUT_FILENO);
    close(STDERR_FILENO);
}
The main daemon runs in the background and monitors a serial port to communicate with a microcontroller - it reads peripherals (such as button presses) and passes information to it. The main functional loop is
int main(int argc, char *argv[])
{
    // We need the port to listen to commands writing
    if (argc < 2) {
        fprintf(stderr,"ERROR, no port provided\n");
        logger("ERROR, no port provided", LOG_LEVEL_ERROR);
        exit(1);
    }
    int portno = atoi(argv[1]);

    // Initialize serial port
    init_serial();

    // Initialize server for listening to socket
    init_server(portno);

    // Initialize daemon and run the process in the background
    init_daemon();

    // Timeout for reading socket
    fd_set setSerial, setSocket;
    struct timeval timeout;
    timeout.tv_sec = 0;
    timeout.tv_usec = 10000;

    char bufferWrite[BUFFER_WRITE_SIZE];
    char bufferRead[BUFFER_READ_SIZE];
    int n;
    int sleep;
    int newsockfd;

    while (1)
    {
        // Reset parameters
        bzero(bufferWrite, BUFFER_WRITE_SIZE);
        bzero(bufferRead, BUFFER_WRITE_SIZE);
        FD_ZERO(&setSerial);
        FD_SET(fserial, &setSerial);
        FD_ZERO(&setSocket);
        FD_SET(sockfd, &setSocket);

        // Start listening to socket for commands
        listen(sockfd,5);
        clilen = sizeof(cli_addr);

        // Wait for command but timeout
        n = select(sockfd + 1, &setSocket, NULL, NULL, &timeout);
        if (n == -1) {
            // Error. Handled below
        }
        // This is for READING button
        else if (n == 0) {
            // This timeout is okay
            // This allows us to read the button press as well

            // Now read the response, but timeout if nothing returned
            n = select(fserial + 1, &setSerial, NULL, NULL, &timeout);
            if (n == -1) {
                // Error. Handled below
            } else if (n == 0) {
                // timeout
                // This is an okay timeout; i.e. nothing has happened
            } else {
                n = read(fserial, bufferRead, sizeof bufferRead);
                if (n > 0) {
                    logger(bufferRead, LOG_LEVEL_INFO);
                    if (strcmp(stripNewLine(bufferRead), "ev b2") == 0) {
                        //logger("Shutting down now", LOG_LEVEL_INFO);
                        system("shutdown -h now");
                    }
                } else {
                    logger("Could not read button press", LOG_LEVEL_WARN);
                }
            }
        }
        // This is for WRITING COMMANDS
        else {
            // Now read the command
            newsockfd = accept(sockfd, (struct sockaddr *) &cli_addr, &clilen);
            if (newsockfd < 0 || n < 0) logger("Could not accept socket port", LOG_LEVEL_ERROR);

            // Now read the command
            n = read(newsockfd, bufferWrite, BUFFER_WRITE_SIZE);
            if (n < 0) {
                logger("Could not read command from socket port", LOG_LEVEL_ERROR);
            } else {
                //logger(bufferWrite, LOG_LEVEL_INFO);
            }

            // Write the command to the serial
            write(fserial, bufferWrite, strlen(bufferWrite));
            sleep = 200 * strlen(bufferWrite) - timeout.tv_usec; // Sleep 200uS/byte
            if (sleep > 0) usleep(sleep);

            // Now read the response, but timeout if nothing returned
            n = select(fserial + 1, &setSerial, NULL, NULL, &timeout);
            if (n == -1) {
                // Error. Handled below
            } else if (n == 0) {
                // timeout
                sprintf(bufferRead, "err\r\n");
                logger("Did not receive response from MCU", LOG_LEVEL_WARN);
            } else {
                n = read(fserial, bufferRead, sizeof bufferRead);
            }

            // Error reading from the socket
            if (n < 0) {
                logger("Could not read response from serial port", LOG_LEVEL_ERROR);
            } else {
                //logger(bufferRead, LOG_LEVEL_INFO);
            }

            // Send MCU response to client
            n = write(newsockfd, bufferRead, strlen(bufferRead));
            if (n < 0) logger("Could not write confirmation to socket port", LOG_LEVEL_ERROR);
        }
        close(newsockfd);
    }
    close(sockfd);
    return 0;
}
But the CPU usage is always at 100%. Why is that? What can I do?
EDIT
I commented out the entire while loop and made the main function as simple as:
int main(int argc, char *argv[])
{
    init_daemon();
    while(1) {
        // All commented out
    }
    return 0;
}
And I'm still getting 100% CPU usage.
You need to reset timeout to the wanted value on every iteration: the struct gets modified on Linux, so your loop is not pausing except for the first time, i.e. select() only blocks the very first time.
Try printing tv_sec and tv_usec after select() and you'll see: they're modified to reflect how much time was left before select() returned.
Move this part
timeout.tv_sec = 0;
timeout.tv_usec = 10000;
inside the loop, before the select() call, and it should work as you expect. You can move many declarations inside the loop too; that would make your code easier to maintain. You could, for example, move the loop content to a function in the future, and that might help.
This is from the Linux manual page select(2):
On Linux, select() modifies timeout to reflect the amount of time not slept; most other implementations do not do this. (POSIX.1-2001 permits either behavior.) This causes problems both when Linux code which reads timeout is ported to other operating systems, and when code is ported to Linux that reuses a struct timeval for multiple select()s in a loop without reinitializing it. Consider timeout to be undefined after select() returns.
I think the bold part in the quote is the important one.
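As a minimal sketch of the fix, keeping the names from the question:

while (1)
{
    // Re-arm the fd_sets AND the timeout on every iteration:
    // on Linux, the previous select() may have decremented timeout to zero.
    FD_ZERO(&setSocket);
    FD_SET(sockfd, &setSocket);
    timeout.tv_sec = 0;
    timeout.tv_usec = 10000;

    n = select(sockfd + 1, &setSocket, NULL, NULL, &timeout);
    // ... handle n as before ...
}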

Socket performance

I just wondered how instant messengers and online games can accept and deliver messages so fast. (Network programming with sockets.)
I read that this is done with nonblocking sockets.
I tried blocking sockets with pthreads (each client gets its own thread) and nonblocking sockets with kqueue. Then I profiled both servers with a program which made 99 connections (each connection in its own thread) and then wrote some garbage to them (with a sleep of 1 second). When all threads were set up, I measured in the main thread how long it took to get a connection from the server (wall clock time) while the "99 users" were writing to it.
threads (avg): 0.000350 // only small difference to kqueue
kqueue (avg): 0.000300 // and this is not even stable (client side)
The problem is that while testing with kqueue I got SIGPIPE errors multiple times (client-side). (With a little timeout of usleep(50) this error was fixed.) I think this is really bad, because a server should be capable of handling thousands of connections. (Or is it my fault on the client side?) The crazy thing about this is that the infamous pthread approach did just fine (with and without the timeout).
So my question is: how can you build a stable socket server in C which can handle thousands of clients "asynchronously"? I only see the threads approach as a good option, but this is considered bad practice.
Greetings
EDIT:
My test code:
double get_wall_time(){
    struct timeval time;
    if (gettimeofday(&time,NULL)){
        // Handle error
        return 0;
    }
    return (double)time.tv_sec + (double)time.tv_usec * .000001;
}
#define NTHREADS 100

volatile unsigned n_threads = 0;
volatile unsigned n_writes = 0;

pthread_mutex_t main_ready;
pthread_mutex_t stop_mtx;
volatile bool running = true;

void stop(void)
{
    pthread_mutex_lock(&stop_mtx);
    running = false;
    pthread_mutex_unlock(&stop_mtx);
}

bool shouldRun(void)
{
    bool copy;
    pthread_mutex_lock(&stop_mtx);
    copy = running;
    pthread_mutex_unlock(&stop_mtx);
    return copy;
}
#define TARGET_HOST "localhost"
#define TARGET_PORT "1336"
void *thread(void *args)
{
    char tmp = 0x01;

    if (__sync_add_and_fetch(&n_threads, 1) == NTHREADS) {
        pthread_mutex_unlock(&main_ready);
        fprintf(stderr, "All %u Threads are ready...\n", (unsigned)n_threads);
    }

    int fd = socket(res->ai_family, SOCK_STREAM, res->ai_protocol);
    if (connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
        socket_close(fd);
        fd = -1;
    }
    if (fd <= 0) {
        fprintf(stderr, "socket_create failed\n");
    }

    if (write(fd, &tmp, 1) <= 0) {
        fprintf(stderr, "pre-write failed\n");
    }

    do {
        /* Write some garbage */
        if (write(fd, &tmp, 1) <= 0) {
            fprintf(stderr, "in-write failed\n");
            break;
        }
        __sync_add_and_fetch(&n_writes, 1);
        /* Wait some time */
        usleep(500);
    } while (shouldRun());

    socket_close(fd);
    return NULL;
}
int main(int argc, const char * argv[])
{
    pthread_t threads[NTHREADS];

    pthread_mutex_init(&main_ready, NULL);
    pthread_mutex_lock(&main_ready);
    pthread_mutex_init(&stop_mtx, NULL);

    bzero((char *)&hint, sizeof(hint));
    hint.ai_socktype = SOCK_STREAM;
    hint.ai_family = AF_INET;
    if (getaddrinfo(TARGET_HOST, TARGET_PORT, &hint, &res) != 0) {
        return -1;
    }

    for (int i = 0; i < NTHREADS; ++i) {
        pthread_create(&threads[i], NULL, thread, NULL);
    }

    /* wait for all threads to be set up */
    pthread_mutex_lock(&main_ready);
    fprintf(stderr, "Main thread is ready...\n");

    {
        double start, end;
        int fd;

        start = get_wall_time();
        fd = socket(res->ai_family, SOCK_STREAM, res->ai_protocol);
        if (connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
            socket_close(fd);
            fd = -1;
        }
        end = get_wall_time();

        if (fd > 0) {
            fprintf(stderr, "Took %f ms\n", (end - start) * 1000);
            socket_close(fd);
        }
    }

    /* Stop all running threads */
    stop();

    /* Waiting for termination */
    for (int i = 0; i < NTHREADS; ++i) {
        pthread_join(threads[i], NULL);
    }

    fprintf(stderr, "Performed %u successfull writes\n", (unsigned)n_writes);

    /* Lol.. */
    freeaddrinfo(res);
    return 0;
}
SIGPIPE comes when I try to connect to the kqueue server (after 10 connections are made, the server is "stuck"?). And when too many users are writing stuff, the server cannot open a new connection. (kqueue server code from http://eradman.com/posts/kqueue-tcp.html)
SIGPIPE means you're trying to write to a socket (or pipe) whose other end has already been closed (so no one will be able to read it). If you don't care about that, you can ignore SIGPIPE signals (call signal(SIGPIPE, SIG_IGN)) and the signals won't be a problem. Of course the write (or send) calls on the socket will still fail (with EPIPE), so you need to make your code robust enough to deal with that.
The reason SIGPIPE normally kills the process is that it's too easy to write programs that ignore errors on write/send calls and run amok, using up 100% of CPU time. As long as you carefully check for errors and deal with them, you can safely ignore SIGPIPEs.
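A minimal sketch of that pattern (helper names are hypothetical; fd is assumed to be a connected socket):

#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

void setup(void)
{
    signal(SIGPIPE, SIG_IGN);   // once at startup: a dead peer no longer kills the process
}

void send_or_drop(int fd, const char *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, 0);
    if (n < 0 && errno == EPIPE)
        close(fd);              // peer already closed: clean up this client
}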
Or is it my fault?
It was your fault. TCP works. Most probably you didn't read all the data that was sent.
And when too many users are writing stuff, the server cannot open a new connection
Servers don't open connections. Clients open connections. Servers accept connections. If your server stops doing that, there's something wrong with your accept loop. It should only do two things: accept a connection, and start a thread.
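As a sketch, under the thread-per-client model the question already uses (handle_client is a hypothetical per-connection routine):

#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

void *handle_client(void *arg);     // reads/writes on the client socket

void accept_loop(int listen_fd)
{
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0)
            continue;               // log the error and keep accepting

        pthread_t tid;
        if (pthread_create(&tid, NULL, handle_client,
                           (void *)(intptr_t)client_fd) == 0)
            pthread_detach(tid);    // don't accumulate un-joined threads
        else
            close(client_fd);
    }
}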

C - Using poll to multiplex between socket(s) and stdin - Server

I'm writing a client-server application and I'm using poll to multiplex between several client sockets and stdin, where I can insert commands (for example: stop the server). I believe the structure (the "logic") of my code is correct; however, it's not behaving the way I expect it to:
struct pollfd pfd[NSERVER]; // defined as 10
pfd[0].fd = fileno(stdin);
pfd[0].events = POLLIN;
pfd[1].fd = socktfd; // server bind, listen socket
pfd[1].events = POLLIN;

struct sockaddr_storage remoteaddr; // client address
socklen_t addrlen;
char remoteIP[INET6_ADDRSTRLEN];
addrlen = sizeof remoteaddr;
char buf[1024]; // buffer
int pos = 2;

while(poll(pfd, 1, 0) >= 0)
{
    if(pfd[0].revents & POLLIN) { // stdin
        // process input and perform command
    }
    if(pfd[1].revents & POLLIN) {
        /* new connection */
        int connsockfd = accept(socktfd, (struct sockaddr *)&remoteaddr, &addrlen);
        pfd[pos].fd = connsockfd;
    }

    int i = 2;
    // Loop through the fds in pfd for events
    while (i <= NSERVER)
    {
        if (pfd[i].revents & POLLIN) {
            int c = recv(pfd[i].fd, buf, sizeof buf, 0);
            if(c <= 0) {
                if (c == 0)
                {
                    /* Client closed socket */
                    close(pfd[i].fd);
                }
            } else
            { // Client sent some data
                c = send(pfd[i].fd, sbuff, z, 0);
                if (c <= 0)
                {
                    Error;
                }
                free(sbuff);
            }
        }
        i++;
    }
}
I've removed some code inside the recv and send to make the code easier to read.
It fails to behave (it just hangs; it doesn't accept connections or react to input from stdin).
Note: I would prefer to use poll over select, so please don't point to select :-).
Thanks in advance for any assistance.
You should set every pfd[i].fd = -1, so they get ignored initially by poll().
poll(pfd, 1, 0) is wrong and should at least be poll(pfd, 2, 0) or even poll(pfd, NSERVER, 0).
while(i<=NSERVER) should be while(i<NSERVER).
Your program probably hangs because you loop through the pfd array, which is not initialized and contains random values for .fd and .revents, so it wants to send() or recv() on some random FD, which might block. Do if(pfd[i].fd < 0) {i++; continue;} in the i<NSERVER loop.
You also don't set pfd[pos].events = POLLIN on newly accepted sockets. Don't set POLLOUT unless you have something to send, because it will trigger almost every time.
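Putting those fixes together, a minimal sketch of the corrected setup (keeping the names from the question):

// Mark every slot unused so poll() ignores it until a client fills it.
for (int j = 0; j < NSERVER; j++) {
    pfd[j].fd = -1;
    pfd[j].events = POLLIN;
}
pfd[0].fd = fileno(stdin);
pfd[1].fd = socktfd;

while (poll(pfd, NSERVER, -1) >= 0)    // watch all slots; block until an event
{
    if (pfd[1].revents & POLLIN) {
        int connsockfd = accept(socktfd, (struct sockaddr *)&remoteaddr, &addrlen);
        for (int j = 2; j < NSERVER; j++) {
            if (pfd[j].fd < 0) {           // first free slot
                pfd[j].fd = connsockfd;
                pfd[j].events = POLLIN;    // arm it for reading
                break;
            }
        }
    }
    for (int j = 2; j < NSERVER; j++) {
        if (pfd[j].fd < 0) continue;       // skip unused slots
        if (pfd[j].revents & POLLIN) {
            // recv()/send() as before; when recv() returns 0,
            // close(pfd[j].fd) and set pfd[j].fd = -1 to free the slot.
        }
    }
}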

Why is this socket/ file descriptor assignment invalid?

I'm trying to write a simple server in C that plays a two-player game. It checks for incoming connections, and if there is no player1, it saves player1's file descriptor (to be used later for sending and receiving), and if there is no player2, it does the same. I have this loop set up that I modified from Here. My problem is that I want to receive from one and send to the other, but it seems that my assignments are invalid. When I try to send to player2, it fails or sends garbage. Sometimes, sending to player1 sends back to the server(?). Am I using select correctly and looping through the file descriptor set correctly? Any feedback would be appreciated.
// add the listener to the master set
FD_SET(listener, &master);
// keep track of the biggest file descriptor
fdmax = listener; // so far, it's this one

// main loop
while (1) {
    read_fds = master; // copy it
    if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
        error("select");
    }

    // run through the existing connections looking for data to read
    for(i = 0; i <= fdmax; i++) {
        // This indicates that someone is trying to do something
        if (FD_ISSET(i, &read_fds)) {
            if (i == listener) {
                addrlen = sizeof remoteaddr;
                newfd = accept(listener, (struct sockaddr *)&remoteaddr, &addrlen);
                if (newfd == -1) {
                    error("accept");
                } else {
                    FD_SET(newfd, &master);
                    if (newfd > fdmax) {
                        fdmax = newfd;
                    }
                    /* If we have the maximum number of players, we tell it that it's busy */
                    if (players >= 2) {
                        toobusy(fdmax); close(fdmax); FD_CLR(fdmax, &master);
                    } else {
                        // Problem here?
                        if (player1_fd == -1) {
                            player1_fd = newfd;
                        }
                        if ((player1_fd != -1) && (player2_fd == -1)) {
                            player2_fd = newfd;
                        }
                        players++;
                        if (players == 2) {
                            sendhandles(); // says two players exist
                        }
                    }
                }
            } else {
                // Possible problems here
                if (i == player1_fd || i == player2_fd) {
                    receive(i); // Processes the messages
                }
            }
        }
    }
}
The toobusy part should use newfd, not fdmax. Otherwise there's no easily spotted error in this code.
Your comment "Sometimes, sending to player1 sends back to the server(?)" makes me think that player1_fd and player2_fd might be uninitialized, or perhaps initialized to 0 instead of -1. You should double-check that you set them to -1 before the loop.
A few additional notes:
Are you sure master is zero-initialized? Have you called FD_ZERO on it?
You should use FD_COPY to copy master to read_fds.
Finally, I'd recommend using a library for event handling, such as libevent or libev.
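A sketch of the initialization the first two notes are about, to go before the main loop (the declarations are hypothetical, matching the names in the question):

#include <sys/select.h>

fd_set master, read_fds;
int player1_fd = -1, player2_fd = -1;   // sentinel: no player connected yet
int players = 0;

FD_ZERO(&master);                       // start from a known-empty set
FD_ZERO(&read_fds);
FD_SET(listener, &master);
int fdmax = listener;                   // highest fd so far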
