I have two nodes communicating with a socket. Each node has a read thread and a write thread to communicate with the other. Given below is the code for the read thread. The communication works fine between the two nodes with that code. But I am trying to add a select function in this thread and that is giving me problems (the code for select is in the comments. I just uncomment it to add the functionality). The problem is one node does not receive messages and only does the timeout. The other node gets the messages from the other node but never timesout. That problem is not there (both nodes send and receive messages) without the select (keeping the comments /* */).
Can anyone point out what the problem might be? Thanks.
void *Read_Thread(void *arg_passed)
{
int numbytes;
unsigned char *buf;
buf = (unsigned char *)malloc(MAXDATASIZE);
/*
fd_set master;
int fdmax;
FD_ZERO(&master);
*/
struct RWThread_args_template *my_args = (struct RWThread_args_template *)arg_passed;
/*
FD_SET(my_args->new_fd, &master);
struct timeval tv;
tv.tv_sec = 2;
tv.tv_usec = 0;
int s_rv = 0;
fdmax = my_args->new_fd;
*/
while(1)
{
/*
s_rv = -1;
if((s_rv = select(fdmax+1, &master, NULL, NULL, &tv)) == -1)
{
perror("select");
exit(1);
}
if(s_rv == 0)
{
printf("Read: Timed out\n");
continue;
}
else
{
printf("Read: Received msg\n");
}
*/
if( (numbytes = recv(my_args->new_fd, buf, MAXDATASIZE-1, 0)) == -1 )
{
perror("recv");
exit(1);
}
buf[numbytes] = '\0';
printf("Read: received '%s'\n", buf);
}
pthread_exit(NULL);
}
You must set up master and tv before each call to select(), within the loop. They are both modified by the select() call.
In particular, if select() returned 0, then master will now be empty.
Related
I have a small problem, in practice I have to let two clients communicate (which perform different functions), with my concurrent server,
I discovered that I can solve this using the select, but if I try to implement it in the code it gives me a segmentation error, could someone help me kindly?
I state that before with a single client was a fable, now unfortunately implementing the select, I spoiled a bit 'all,
I should fix this thing, you can make a concurrent server with select ()?
can you tell me where I'm wrong with this code?
int main (int argc , char *argv[])
{
int list_fd,conn_fd;
int i,j;
struct sockaddr_in serv_add,client;
char buffer [1024];
socklen_t len;
time_t timeval;
char fd_open[FD_SETSIZE];
pid_t pid;
int logging = 1;
char swi;
fd_set fset;
int max_fd = 0;
int waiting = 0;
int compat = 0;
sqlite3 *db;
sqlite3_open("Prova.db", &db);
start2();
start3();
printf("ServerREP Avviato \n");
if ( ( list_fd = socket(AF_INET, SOCK_STREAM, 0) ) < 0 ) {
perror("socket");
exit(1);
}
if (setsockopt(list_fd, SOL_SOCKET, SO_REUSEADDR, &(int){ 1 }, sizeof(int)) < 0)
perror("setsockopt(SO_REUSEADDR) failed");
memset((void *)&serv_add, 0, sizeof(serv_add)); /* clear server address */
serv_add.sin_family = AF_INET;
serv_add.sin_port = htons(SERVERS_PORT2);
serv_add.sin_addr.s_addr = inet_addr(SERVERS_IP2);
if ( bind(list_fd, (struct sockaddr *) &serv_add, sizeof(serv_add)) < 0 ) {
perror("bind");
exit(1);
}
if ( listen(list_fd, 1024) < 0 ) {
perror("listen");
exit(1);
}
/* initialize all needed variables */
memset(fd_open, 0, FD_SETSIZE); /* clear array of open files */
max_fd = list_fd; /* maximum now is listening socket */
fd_open[max_fd] = 1;
//max_fd = max(conn_fd, sockMED);
while (1) {
FD_ZERO(&fset);
FD_SET(conn_fd, &fset);
FD_SET(sockMED, &fset);
len = sizeof(client);
if(select(max_fd + 1, &fset, NULL, NULL, NULL) < 0){exit(1);}
if(FD_ISSET(conn_fd, &fset))
{
if ( (conn_fd = accept(list_fd, (struct sockaddr *)&client, &len)) <0 )
perror("accept error");
exit(-1);
}
/* fork to handle connection */
if ( (pid = fork()) < 0 ){
perror("fork error");
exit(-1);
}
if (pid == 0) { /* child */
close(list_fd);
close(sockMED);
Menu_2(db,conn_fd);
close(conn_fd);
exit(0);
} else { /* parent */
close(conn_fd);
}
if(FD_ISSET(sockMED, &fset))
MenuMED(db,sockMED);
FD_CLR(conn_fd, &fset);
FD_CLR(sockMED, &fset);
}
sqlite3_close(db);
exit(0);
}
I cannot understand how you are trying to use select here, and why you want to use both fork to let a child handle the accepted connection socket, and select.
Common designs are:
multi processing server:
The parent process setups the listening socket and loops on waiting actual connections with accept. Then it forks a child to process the newly accepted connection and simple waits for next one.
multi threaded server:
A variant of previous one. The master thread starts a new thread to process the newly accepted connection instead of forking a new process.
asynchronous server:
The server setups a fd_set to know which sockets require processing. Initially, only the listening socket is set. Then the main loop is (in pseudo code:
loop on select
if the listening socket is present in read ready sockets, accept the pending connection and add is to the `fd_set`, then return to loop
if another socket is present in read ready socket
read from it
if a zero read (closed by peer), close the socket and remove it from the `fd_set`
else process the request and return to loop
The hard part here is that is processing takes a long time, the whole process is blocked, and it processing involves sending a lot of data, you will have to use select for the sending part too...
I have a little doubt with an excercise that i have found to train my ability with sockets.
The exercise says:
Write pseudocode of a tcp server based on fork()
Constraints:
- Max 20000 simultaneusly active connections; after this limit new connection are dropped
- At most 1000 request per client(ip) per hour
I've sketched a solution and i want to know if's it's a good way to go:
struct client{
int ip;
timestamp to;
int n_req;
client* next;
}
void serve(int c_fd, int ip, client* list){
client* c = find_in_list(list, ip);
timestamp now = gettimeofday();
if(now.tv_sec - c->to.tv_sec > (60*60)){
// ig one hour is passed is possible to reset counter
c->n_req = 0;
c->to = now;
}
if(c->n_req > 1000){
/*do_nothing */
} else {
n_req++;
/*
do stuff
*/
}
exit();
}
int main(){
client* list = NULL;
a_fd = socket(AF_INET);
bind(a_fd);
listen(a_fd);
while(1){
/*inizilize poll*/
n_ready = poll();
if(n_ready > 0){
for(/*each ready file descriptor*/){
c_fd = accept(a_fd, this_sockaddr);
if(/* if the ip in this_sockaddr is new*/){
client* new = /*create neew client */;
add_list(list, new);
}
if(served <= 20000){
served++;
pid = fork();
if(pid == 0 ){ //CHILD
serve(c_fd, ip, list);
close(c_fd)
} else { //FATHER
close(c_fd);
do{
pid = blocking_wait();
served--;
} while(pid != 0)
}
} else {
close(c_fd);
}
}
}
}
}
Thanks for any advice.
If you want to see a simple, yet fully effective example of a server using fork, I would recommend you review this simple little project:
ftp://ftp.cs.umass.edu/pub/net/pub/kurose/ftpserver.c
if (listen(sockid,5) < 0)
{ printf("server: listen error :%d\n",errno);exit(0);}
while(1==1) {
/* ACCEPT A CONNECTION AND THEN CREATE A CHILD TO DO THE WORK */
/* LOOP BACK AND WAIT FOR ANOTHER CONNECTION */
printf("server: starting accept\n");
if ((newsd = accept(sockid ,(struct sockaddr *) &client_addr,
&clilen)) < 0)
{printf("server: accept error :%d\n", errno); exit(0); }
printf("server: return from accept, socket for this ftp: %d\n",
newsd);
if ( (pid=fork()) == 0) {
/* CHILD PROC STARTS HERE. IT WILL DO ACTUAL FILE TRANSFER */
close(sockid); /* child shouldn't do an accept */
doftp(newsd);
close (newsd);
exit(0); /* child all done with work */
}
/* PARENT CONTINUES BELOW HERE */
close(newsd); /* parent all done with client, only child */
} /* will communicate with that client from now on */
Child fork looks like this:
/* CHILD PROCEDURE, WHICH ACTUALLY DOES THE FILE TRANSFER */
doftp(int newsd)
{
int i,fsize,fd,msg_ok,fail,fail1,req,c,ack;
int no_read ,num_blks , num_blks1,num_last_blk,num_last_blk1,tmp;
char fname[MAXLINE];
char out_buf[MAXSIZE];
FILE *fp;
no_read = 0;
num_blks = 0;
num_last_blk = 0;
/* START SERVICING THE CLIENT */
/* get command code from client.*/
/* only one supported command: 100 - get a file */
req = 0;
if((readn(newsd,(char *)&req,sizeof(req))) < 0)
{printf("server: read error %d\n",errno);exit(0);}
req = ntohs(req);
...
Here is the code i am using. Whenever i write something to the Stdin, it works, but it is not working for socket. It's not able to enter the loop for Socket. I am new to socket programming.
void HandleConnection(int socket)
{
fd_set rfd;
struct timeval tv;
int retval;
printf("%d",socket);
MakeNonBlocking(socket);
/* Watch stdin (fd 0) to see when it has input. */
FD_ZERO(&rfd);
while(1)
{
FD_SET(STDIN, &rfd);
FD_SET(socket, &rfd);
/* Wait up to five seconds. */
tv.tv_sec = 50;
tv.tv_usec = 0;
retval = select(2, &rfd,NULL, NULL, &tv);
if(retval == 0)
{
printf("No data within fifty seconds.\n");
exit(1);
}
if(FD_ISSET(socket,&rfd))
{
printf("socket wala\n");
recieve_message(&socket);
send_message(&socket);
}
if(FD_ISSET(STDIN,&rfd))
{
printf("stdin wala\n");
recieve_message(&socket);
send_message(&socket);
}
}
}
FDZERO must go before FDSET inside the loop
select(2, ...) should be select(highest filedescriptor +1, ...).
when select returns you should check for negative value in case of errors
you should consider using pselect instead of select.
Clear tv before you reinitialize it.
It appears that you don't understand how the nfds argument to select() is used. The man page addresses this explicitly:
The first nfds
descriptors are checked in each set; i.e., the descriptors from 0 through nfds-1 in the descriptor sets are examined. (Example:
If you have set two file descriptors "4" and "17", nfds should not be "2", but rather "17 + 1" or "18".)
So here is how you should rewrite your code.
int maxfd = (socket > STDIN ? socket : STDIN) + 1; /* select() requires the number of FDs to scan, which is max(fds)+1 */
while(1){
FD_ZERO(&rfd); /* This needs to be done each time through the loop */
/* Watch stdin (fd 0) to see when it has input. */
FD_SET(STDIN, &rfd);
FD_SET(socket, &rfd);
/* Wait up to five seconds. */
tv.tv_sec = 50;
tv.tv_usec = 0;
retval = select(maxfd, &rfd,NULL, NULL, &tv);
if(retval == 0)
{
printf("No data within fifty seconds.\n");
exit(1);
}
if(retval == -1) /* Check for error */
{
perror("Error from select");
exit(2);
}
if(FD_ISSET(socket,&rfd))
{
printf("socket wala\n");
recieve_message(&socket);
send_message(&socket);
}
if(FD_ISSET(STDIN,&rfd))
{
printf("stdin wala\n");
recieve_message(&socket);
send_message(&socket);
}
}
Background
I am currently working on a Windows select-like function that not only supports SOCKET handles, but also other kinds of waitable handles. My goal is to wait on standard console handles in order to provide select-functionality to the curl testsuite.
The related program can be found in the curl git repository: sockfilt.c
Question
Is it possible to wait for data to be available on a non-GUI-based console input? The issue is that WaitFor* methods do not support PIPE handles and therefore STDIN is not supported if the process input is fed from another process, e.g. using the pipe | functionality of cmd.
The following example program illustrates the problem: select_ws.c
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <windows.h>
#include <winsock2.h>
#include <malloc.h>
#include <conio.h>
#include <fcntl.h>
#define SET_SOCKERRNO(x) (WSASetLastError((int)(x)))
typedef SOCKET curl_socket_t;
/*
* select function with support for WINSOCK2 sockets and all
* other handle types supported by WaitForMultipleObjectsEx.
* http://msdn.microsoft.com/en-us/library/windows/desktop/ms687028.aspx
* http://msdn.microsoft.com/en-us/library/windows/desktop/ms741572.aspx
*/
static int select_ws(int nfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, struct timeval *timeout)
{
long networkevents;
DWORD milliseconds, wait, idx, avail, events, inputs;
WSAEVENT wsaevent, *wsaevents;
WSANETWORKEVENTS wsanetevents;
INPUT_RECORD *inputrecords;
HANDLE handle, *handles;
curl_socket_t sock, *fdarr, *wsasocks;
int error, fds;
DWORD nfd = 0, wsa = 0;
int ret = 0;
/* check if the input value is valid */
if(nfds < 0) {
SET_SOCKERRNO(EINVAL);
return -1;
}
/* check if we got descriptors, sleep in case we got none */
if(!nfds) {
Sleep((timeout->tv_sec * 1000) + (timeout->tv_usec / 1000));
return 0;
}
/* allocate internal array for the original input handles */
fdarr = malloc(nfds * sizeof(curl_socket_t));
if(fdarr == NULL) {
SET_SOCKERRNO(ENOMEM);
return -1;
}
/* allocate internal array for the internal event handles */
handles = malloc(nfds * sizeof(HANDLE));
if(handles == NULL) {
SET_SOCKERRNO(ENOMEM);
return -1;
}
/* allocate internal array for the internal socket handles */
wsasocks = malloc(nfds * sizeof(curl_socket_t));
if(wsasocks == NULL) {
SET_SOCKERRNO(ENOMEM);
return -1;
}
/* allocate internal array for the internal WINSOCK2 events */
wsaevents = malloc(nfds * sizeof(WSAEVENT));
if(wsaevents == NULL) {
SET_SOCKERRNO(ENOMEM);
return -1;
}
/* loop over the handles in the input descriptor sets */
for(fds = 0; fds < nfds; fds++) {
networkevents = 0;
handles[nfd] = 0;
if(FD_ISSET(fds, readfds))
networkevents |= FD_READ|FD_ACCEPT|FD_CLOSE;
if(FD_ISSET(fds, writefds))
networkevents |= FD_WRITE|FD_CONNECT;
if(FD_ISSET(fds, exceptfds))
networkevents |= FD_OOB;
/* only wait for events for which we actually care */
if(networkevents) {
fdarr[nfd] = (curl_socket_t)fds;
if(fds == fileno(stdin)) {
handles[nfd] = GetStdHandle(STD_INPUT_HANDLE);
}
else if(fds == fileno(stdout)) {
handles[nfd] = GetStdHandle(STD_OUTPUT_HANDLE);
}
else if(fds == fileno(stderr)) {
handles[nfd] = GetStdHandle(STD_ERROR_HANDLE);
}
else {
wsaevent = WSACreateEvent();
if(wsaevent != WSA_INVALID_EVENT) {
error = WSAEventSelect(fds, wsaevent, networkevents);
if(error != SOCKET_ERROR) {
handles[nfd] = wsaevent;
wsasocks[wsa] = (curl_socket_t)fds;
wsaevents[wsa] = wsaevent;
wsa++;
}
else {
handles[nfd] = (HANDLE)fds;
WSACloseEvent(wsaevent);
}
}
}
nfd++;
}
}
/* convert struct timeval to milliseconds */
if(timeout) {
milliseconds = ((timeout->tv_sec * 1000) + (timeout->tv_usec / 1000));
}
else {
milliseconds = INFINITE;
}
/* wait for one of the internal handles to trigger */
wait = WaitForMultipleObjectsEx(nfd, handles, FALSE, milliseconds, FALSE);
/* loop over the internal handles returned in the descriptors */
for(idx = 0; idx < nfd; idx++) {
fds = fdarr[idx];
handle = handles[idx];
sock = (curl_socket_t)fds;
/* check if the current internal handle was triggered */
if(wait != WAIT_FAILED && (wait - WAIT_OBJECT_0) >= idx &&
WaitForSingleObjectEx(handle, 0, FALSE) == WAIT_OBJECT_0) {
/* try to handle the event with STD* handle functions */
if(fds == fileno(stdin)) {
/* check if there is no data in the input buffer */
if(!stdin->_cnt) {
/* check if we are getting data from a PIPE */
if(!GetConsoleMode(handle, &avail)) {
/* check if there is no data from PIPE input */
if(!PeekNamedPipe(handle, NULL, 0, NULL, &avail, NULL))
avail = 0;
if(!avail)
FD_CLR(sock, readfds);
} /* check if there is no data from keyboard input */
else if (!_kbhit()) {
/* check if there are INPUT_RECORDs in the input buffer */
if(GetNumberOfConsoleInputEvents(handle, &events)) {
if(events > 0) {
/* remove INPUT_RECORDs from the input buffer */
inputrecords = (INPUT_RECORD*)malloc(events *
sizeof(INPUT_RECORD));
if(inputrecords) {
if(!ReadConsoleInput(handle, inputrecords,
events, &inputs))
inputs = 0;
free(inputrecords);
}
/* check if we got all inputs, otherwise clear buffer */
if(events != inputs)
FlushConsoleInputBuffer(handle);
}
}
/* remove from descriptor set since there is no real data */
FD_CLR(sock, readfds);
}
}
/* stdin is never ready for write or exceptional */
FD_CLR(sock, writefds);
FD_CLR(sock, exceptfds);
}
else if(fds == fileno(stdout) || fds == fileno(stderr)) {
/* stdout and stderr are never ready for read or exceptional */
FD_CLR(sock, readfds);
FD_CLR(sock, exceptfds);
}
else {
/* try to handle the event with the WINSOCK2 functions */
error = WSAEnumNetworkEvents(fds, NULL, &wsanetevents);
if(error != SOCKET_ERROR) {
/* remove from descriptor set if not ready for read/accept/close */
if(!(wsanetevents.lNetworkEvents & (FD_READ|FD_ACCEPT|FD_CLOSE)))
FD_CLR(sock, readfds);
/* remove from descriptor set if not ready for write/connect */
if(!(wsanetevents.lNetworkEvents & (FD_WRITE|FD_CONNECT)))
FD_CLR(sock, writefds);
/* remove from descriptor set if not exceptional */
if(!(wsanetevents.lNetworkEvents & FD_OOB))
FD_CLR(sock, exceptfds);
}
}
/* check if the event has not been filtered using specific tests */
if(FD_ISSET(sock, readfds) || FD_ISSET(sock, writefds) ||
FD_ISSET(sock, exceptfds)) {
ret++;
}
}
else {
/* remove from all descriptor sets since this handle did not trigger */
FD_CLR(sock, readfds);
FD_CLR(sock, writefds);
FD_CLR(sock, exceptfds);
}
}
for(idx = 0; idx < wsa; idx++) {
WSAEventSelect(wsasocks[idx], NULL, 0);
WSACloseEvent(wsaevents[idx]);
}
free(wsaevents);
free(wsasocks);
free(handles);
free(fdarr);
return ret;
}
int main(void)
{
WORD wVersionRequested;
WSADATA wsaData;
SOCKET sock[4];
struct sockaddr_in sockaddr[4];
fd_set readfds;
fd_set writefds;
fd_set exceptfds;
SOCKET maxfd = 0;
int selfd = 0;
void *buffer = malloc(1024);
ssize_t nread;
setmode(fileno(stdin), O_BINARY);
wVersionRequested = MAKEWORD(2, 2);
WSAStartup(wVersionRequested, &wsaData);
sock[0] = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
sockaddr[0].sin_family = AF_INET;
sockaddr[0].sin_addr.s_addr = inet_addr("74.125.134.26");
sockaddr[0].sin_port = htons(25);
connect(sock[0], (struct sockaddr *) &sockaddr[0], sizeof(sockaddr[0]));
sock[1] = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
sockaddr[1].sin_family = AF_INET;
sockaddr[1].sin_addr.s_addr = inet_addr("74.125.134.27");
sockaddr[1].sin_port = htons(25);
connect(sock[1], (struct sockaddr *) &sockaddr[1], sizeof(sockaddr[1]));
sock[2] = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
sockaddr[2].sin_family = AF_INET;
sockaddr[2].sin_addr.s_addr = inet_addr("127.0.0.1");
sockaddr[2].sin_port = htons(1337);
printf("bind = %d\n", bind(sock[2], (struct sockaddr *) &sockaddr[2], sizeof(sockaddr[2])));
printf("listen = %d\n", listen(sock[2], 5));
sock[3] = INVALID_SOCKET;
while(1) {
FD_ZERO(&readfds);
FD_ZERO(&writefds);
FD_ZERO(&exceptfds);
FD_SET(sock[0], &readfds);
FD_SET(sock[0], &exceptfds);
maxfd = maxfd > sock[0] ? maxfd : sock[0];
FD_SET(sock[1], &readfds);
FD_SET(sock[1], &exceptfds);
maxfd = maxfd > sock[1] ? maxfd : sock[1];
FD_SET(sock[2], &readfds);
FD_SET(sock[2], &exceptfds);
maxfd = maxfd > sock[2] ? maxfd : sock[2];
FD_SET((SOCKET)fileno(stdin), &readfds);
maxfd = maxfd > (SOCKET)fileno(stdin) ? maxfd : (SOCKET)fileno(stdin);
printf("maxfd = %d\n", maxfd);
selfd = select_ws(maxfd + 1, &readfds, &writefds, &exceptfds, NULL);
printf("selfd = %d\n", selfd);
if(FD_ISSET(sock[0], &readfds)) {
printf("read sock[0]\n");
nread = recv(sock[0], buffer, 1024, 0);
printf("read sock[0] = %d\n", nread);
}
if(FD_ISSET(sock[0], &exceptfds)) {
printf("exception sock[0]\n");
}
if(FD_ISSET(sock[1], &readfds)) {
printf("read sock[1]\n");
nread = recv(sock[1], buffer, 1024, 0);
printf("read sock[1] = %d\n", nread);
}
if(FD_ISSET(sock[1], &exceptfds)) {
printf("exception sock[1]\n");
}
if(FD_ISSET(sock[2], &readfds)) {
if(sock[3] != INVALID_SOCKET)
closesocket(sock[3]);
printf("accept sock[2] = %d\n", sock[2]);
nread = sizeof(sockaddr[3]);
printf("WSAGetLastError = %d\n", WSAGetLastError());
sock[3] = accept(sock[2], (struct sockaddr *) &sockaddr[3], &nread);
printf("WSAGetLastError = %d\n", WSAGetLastError());
printf("accept sock[2] = %d\n", sock[3]);
}
if(FD_ISSET(sock[2], &exceptfds)) {
printf("exception sock[2]\n");
}
if(FD_ISSET(fileno(stdin), &readfds)) {
printf("read fileno(stdin)\n");
nread = read(fileno(stdin), buffer, 1024);
printf("read fileno(stdin) = %d\n", nread);
}
}
WSACleanup();
free(buffer);
}
Compile using MinGW with the following command:
mingw32-gcc select_ws.c -Wl,-lws2_32 -g -o select_ws.exe
Running the program directly from the console using the following command works:
select_ws.exe
But doing the same with a pipe will constantly signal WaitForMultipleObjectsEx:
ping -t 8.8.8.8 | select_ws.exe
The pipe is ready to read until the parent process is finished, e.g.:
ping 8.8.8.8 | select_ws.exe
Is there a compatible way to simulate a blocking wait on the PIPE-based console input handle together with the other handles? The use of threads should be avoided.
You are welcome to contribute changes to the example program in this gist.
Thanks in advance!
I actually found a way to make it work using a separate waiting-thread. Please see the following commit in the curl repository on github.com.
Thanks for your comments!
Use GetStdHandle(STD_INPUT_HANDLE) to get the STDIN pipe handle, then use ReadFile/Ex() with an OVERLAPPED structure whose hEvent member is set to a manual-reset event from CreateEvent(). You can then use any of the WaitFor*() functions to wait on the event. If it times out, call CancelIo() to abort the read operation.
I have two daemons, and A is speaking to B. B is listening on a port, and A opens a tcp connection to that port. A is able to open a socket to B, but when it attempts to actually write said socket, I get a SIGPIPE, so I'm trying to figure out where B could be closing the open socket.
However, if I attach to both daemons in gdb, the SIGPIPE happens before any of the code for handling data is called. This kind of makes sense, because the initial write is never successful, and the listeners are triggered from receiving data. My question is - what could cause daemon B to close the socket before any data is sent? The socket is closed less than a microsecond after opening it, so I'm thinking it can't be a timeout or anything of the sort. I would love a laundry list of possibilities to track down, as I've been chewing on this one for a few days and I'm pretty much out of ideas.
As requested, here is the code that accepts and handles communication:
{
extern char *PAddrToString(pbs_net_t *);
int i;
int n;
time_t now;
fd_set *SelectSet = NULL;
int SelectSetSize = 0;
int MaxNumDescriptors = 0;
char id[] = "wait_request";
char tmpLine[1024];
struct timeval timeout;
long OrigState = 0;
if (SState != NULL)
OrigState = *SState;
timeout.tv_usec = 0;
timeout.tv_sec = waittime;
SelectSetSize = sizeof(char) * get_fdset_size();
SelectSet = (fd_set *)calloc(1,SelectSetSize);
pthread_mutex_lock(global_sock_read_mutex);
memcpy(SelectSet,GlobalSocketReadSet,SelectSetSize);
/* selset = readset;*/ /* readset is global */
MaxNumDescriptors = get_max_num_descriptors();
pthread_mutex_unlock(global_sock_read_mutex);
n = select(MaxNumDescriptors, SelectSet, (fd_set *)0, (fd_set *)0, &timeout);
if (n == -1)
{
if (errno == EINTR)
{
n = 0; /* interrupted, cycle around */
}
else
{
int i;
struct stat fbuf;
/* check all file descriptors to verify they are valid */
/* NOTE: selset may be modified by failed select() */
for (i = 0; i < MaxNumDescriptors; i++)
{
if (FD_ISSET(i, GlobalSocketReadSet) == 0)
continue;
if (fstat(i, &fbuf) == 0)
continue;
/* clean up SdList and bad sd... */
pthread_mutex_lock(global_sock_read_mutex);
FD_CLR(i, GlobalSocketReadSet);
pthread_mutex_unlock(global_sock_read_mutex);
} /* END for each socket in global read set */
free(SelectSet);
log_err(errno, id, "Unable to select sockets to read requests");
return(-1);
} /* END else (errno == EINTR) */
} /* END if (n == -1) */
for (i = 0; (i < max_connection) && (n != 0); i++)
{
pthread_mutex_lock(svr_conn[i].cn_mutex);
if (FD_ISSET(i, SelectSet))
{
/* this socket has data */
n--;
svr_conn[i].cn_lasttime = time(NULL);
if (svr_conn[i].cn_active != Idle)
{
void *(*func)(void *) = svr_conn[i].cn_func;
netcounter_incr();
pthread_mutex_unlock(svr_conn[i].cn_mutex);
func((void *)&i);
/* NOTE: breakout if state changed (probably received shutdown request) */
if ((SState != NULL) &&
(OrigState != *SState))
break;
}
else
{
pthread_mutex_lock(global_sock_read_mutex);
FD_CLR(i, GlobalSocketReadSet);
pthread_mutex_unlock(global_sock_read_mutex);
close_conn(i, TRUE);
pthread_mutex_unlock(svr_conn[i].cn_mutex);
pthread_mutex_lock(num_connections_mutex);
sprintf(tmpLine, "closed connections to fd %d - num_connections=%d (select bad socket)",
i,
num_connections);
pthread_mutex_unlock(num_connections_mutex);
log_err(-1, id, tmpLine);
}
}
else
pthread_mutex_unlock(svr_conn[i].cn_mutex);
} /* END for i */
/* NOTE: break out if shutdown request received */
if ((SState != NULL) && (OrigState != *SState))
return(0);
/* have any connections timed out ?? */
now = time((time_t *)0);
for (i = 0;i < max_connection;i++)
{
struct connection *cp;
pthread_mutex_lock(svr_conn[i].cn_mutex);
cp = &svr_conn[i];
if (cp->cn_active != FromClientDIS)
{
pthread_mutex_unlock(svr_conn[i].cn_mutex);
continue;
}
if ((now - cp->cn_lasttime) <= PBS_NET_MAXCONNECTIDLE)
{
pthread_mutex_unlock(svr_conn[i].cn_mutex);
continue;
}
if (cp->cn_authen & PBS_NET_CONN_NOTIMEOUT)
{
pthread_mutex_unlock(svr_conn[i].cn_mutex);
continue; /* do not time-out this connection */
}
/* NOTE: add info about node associated with connection - NYI */
snprintf(tmpLine, sizeof(tmpLine), "connection %d to host %s has timed out after %d seconds - closing stale connection\n",
i,
PAddrToString(&cp->cn_addr),
PBS_NET_MAXCONNECTIDLE);
log_err(-1, "wait_request", tmpLine);
/* locate node associated with interface, mark node as down until node responds */
/* NYI */
close_conn(i, TRUE);
pthread_mutex_unlock(svr_conn[i].cn_mutex);
} /* END for (i) */
return(0);
}
NOTE: I didn't write this code.
Is it possible you messed up and somewhere else in the program you try to close the same handle twice?
That could do this to you very easily.
HINT: systrace can determine if this is happening.