I'm having a problem with child processes hanging onto a socket after exec(). This process 1) reads udp packets, and 2) kills/starts other processes. This process monitors other processes via the udp packets that they send.
This runs on Windows, Linux, and AIX. I have not experienced any issues on AIX, only on Linux. (The Windows code is significantly different, so I won't go into details about that.)
I am setting the FD_CLOEXEC flag on the returned descriptor via fcntl() immediately after creating it. This must run on Red Hat EL 4-6, so using O_CLOEXEC on creation is not an option (the kernels in RHEL4/5 do not have the option).
For maintenance, the monitoring process may need to be restarted, and when I attempt to restart it, I find that occasionally one of the child processes is still bound to the socket, preventing the monitoring process from doing so. [Normally this wouldn't be an issue (because the user would see the restart failed and take appropriate action), however the monitor itself is monitored via a different mechanism (to avoid a SPOF), and an automated restart of the monitoring process may fail if one of its child processes is holding onto the socket. This can lead to more Bad Things happening downstream. ]
I have gone so far as to add code between the fork() and the exec() calls to explicitly close the socket (with an associated shutdown) in the child process, and I have synchronized the fork() and the read() via a pthread_mutex so that I am not reading from the socket when a fork occurs.
The socket is created with
s = socket( AF_INET, SOCK_DGRAM, IPPROTO_UDP )
and no other options. Immediately after the creation, I make the call to fcntl to set FD_CLOEXEC. The process is still single-threaded at this point, so there is no race condition (in theory) before the flag is set.
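For reference, the create-then-flag sequence described above could look roughly like this. It is only a sketch: set_cloexec and make_monitor_socket are made-up helper names, not the asker's actual code, and error handling is kept minimal.

```c
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: set FD_CLOEXEC on fd while preserving any other
   descriptor flags. Returns 0 on success, -1 on error (errno set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Hypothetical helper: create the UDP socket as in the question and mark it
   close-on-exec before any thread is started or any fork() happens. */
int make_monitor_socket(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (s == -1)
        return -1;
    if (set_cloexec(s) == -1) {
        close(s);
        return -1;
    }
    /* Sanity check: the flag should now be visible on the descriptor. */
    assert(fcntl(s, F_GETFD) & FD_CLOEXEC);
    return s;
}
```

Reading the flags back with F_GETFD, as the assert does, is a cheap way to confirm that the flag really is set on the descriptor that later gets bound.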
The bind is done next, while still single-threaded. It binds to the first IPV4 address matching "localhost" as returned by getaddrinfo (probably unnecessary, but it's using an underlying utility function to simplify the call to bind.)
The close logic in the child process after the fork (none of which should be necessary because of the FD_CLOEXEC) is:
char retryClose = 1;
int eno = 0;
int retries = 20;
if ( shutdown( s, SHUT_RDWR ) ) {
/* Failed to shutdown. Wait and try again */
my_sleep( 3000 ); /* sleep using select(0,NULL,NULL,NULL, timeval) */
shutdown( s, SHUT_RDWR );
/* not much else can be done... */
}
while ( retryClose && ( close( s ) == -1 ) )
{
/* save error number */
eno = errno;
/* check specific error */
switch ( eno ) {
case ( EIO ) :
/* terminate loop if retries have expired; otherwise sleep for a while and try again */
if ( --retries <= 0 ) {
retryClose = 0;
}
else {
my_sleep( 50 );
}
case ( EINTR ) :
break;
case ( EBADF ) :
default:
retryClose = FALSE;
break;
} /* switch ( eno ) */
}
So, I'm setting the FD_CLOEXEC flag, and explicitly closing the fd prior to the exec() call.
Am I missing anything? Is there anything I can do to ensure that the child process really doesn't hang onto the socket?
Turns out, it wasn't the fork/exec that was causing the problem.
The server process could be restarted several times after starting all child processes, without any problems, but occasionally, when the server would die, one of the child processes would actually grab the server socket.
Switching from using connect()/send() in the client to just sendto() seems to have resolved the problem.
Background: My code structure: I have a master socket on the main thread. Each time a new client arrives, the thread pool is notified and one pre-allocated thread takes the task.
Inside this thread, I pass it a slave socket and let it use the accept call to listen to the client.
Scenario: Thread A in my thread pool is currently listening to a client. I now want to stop all the pre-allocated threads and close all client connections. The main thread tries to close the connection to the client using close, and tries to terminate thread A using pthread_join.
main() {
// create threadpool
// logic to create mastersocket
startServer(masterSock)
IwantToCloseServer() // this function is not directly called in main, but simulated by a terminal signal , like kill -quit pid.
}
int startServer(int msock) {
int ssock; // slaveSocket
struct sockaddr_in client_addr; // the address of the client...
unsigned int client_addr_len = sizeof(client_addr); // ... and its length
while (!stopCondition) {
// Accept connection:
ssock = ::accept((int)msock, (struct sockaddr*)&client_addr, &client_addr_len); // the return value is a socket
// I was trying to replace this line of code to poll(), but it's not does the same thing as before
if (ssock < 0) {
if (errno == EINTR) continue;
perror("accept");
running =0;
return 0;
// exit(0);
} else {
// push task to thread pool to deal with logic
}
// main thread continues with the loop...
}
return 1;
}
IwantToCloseServer(slaveSocket) {
// when i want to close() or shutdown() function to close connections, these 2 function always return -1, because the thread is blocked on accept call
// logic try to terminate all the preallocated threads, the pthread_join function is stuck because the thread is blocked on accept
}
Problem: Thread A stays blocked in the ::accept call. The close and shutdown functions return -1 and do not close the connection, and pthread_join never returns because thread A is blocked on accept.
Things I tried:
1. I tried changing the while loop around the accept call, for example by setting a flag stopCondition:
while(!stopCondition) {
    ssock = ::accept((int)msock, (struct sockaddr*)&client_addr, &client_addr_len);
}
However, when the main thread changes stopCondition, thread A is still blocked inside the accept function.
It never gets back to the loop condition, so the flag has no effect on the blocked accept call and this does not work.
2. I also tried to send a signal to the blocked thread A, using
pthread_cancel or pthread_kill(Thread A, 9)
However, if I do this, the whole process gets killed.
3. I tried to use poll() with a timeout to replace the line where the accept call is.
However, the program doesn't behave like before: it can't listen to clients anymore.
How do I terminate thread A (which is currently blocked in the accept call), so that I can clean up this pre-allocated thread and restart my server?
By the way, I cannot use a library like Boost in my current program, and this is on Linux, not Winsock.
To check stopCondition periodically in your while(!stopCondition) { loop, first call select/poll with a timeout to find out whether there is anything new on msock, then, depending on the result, call accept; otherwise do nothing.
I was trying to replace this line of code to poll()
try to use poll() to replace the line, where the accept functions at, with a timeout
You cannot replace accept with poll. You have to call select/poll first, check the result, and then possibly call accept.
Apart from that,
while(!stopCondition) {
if(!stopCondition) {
is redundant and can be replaced by
while(!stopCondition) {
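The "poll first, then accept" loop suggested in this answer could be sketched as below. The names wait_readable, accept_until_stopped and handle_client are invented for illustration, and the 500 ms interval is an arbitrary choice. If the flag is set from another thread rather than from a signal handler, an atomic type would be a better fit than volatile sig_atomic_t.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable or timeout_ms elapses.
   Returns 1 if readable, 0 on timeout or interruption, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    assert(fd >= 0);
    rc = poll(&pfd, 1, timeout_ms);
    if (rc == -1)
        return errno == EINTR ? 0 : -1; /* interrupted: let the caller recheck its flag */
    if (rc == 0)
        return 0;                       /* timeout */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Accept loop that rechecks *stop at least every 500 ms instead of blocking
   in accept() forever. handle_client stands in for "push to the thread pool". */
int accept_until_stopped(int msock, volatile sig_atomic_t *stop,
                         void (*handle_client)(int ssock))
{
    while (!*stop) {
        int ready = wait_readable(msock, 500);
        if (ready < 0)
            return -1;
        if (ready == 0)
            continue;                   /* nothing pending: recheck the stop flag */

        int ssock = accept(msock, NULL, NULL);
        if (ssock < 0) {
            if (errno == EINTR || errno == EAGAIN || errno == ECONNABORTED)
                continue;               /* transient: try again */
            return -1;
        }
        handle_client(ssock);
    }
    return 0;
}
```

With this shape the thread is never parked inside accept for longer than the poll timeout, so setting the flag and then calling pthread_join returns after at most one interval.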
I'm working on an embedded processor running Yocto. I have a modified uio_pdrv_genirq.c UIO driver.
I am writing a library to control the DMA. There is one function which writes to the device file and initiates the DMA. A second function is intended to wait for the DMA to complete by calling select(). Whilst DMA is in progress the device file blocks. On completion the DMA controller issues an interrupt which releases the block on the device file.
I have the system working as expected using read() but I want to switch to select() so that I can include a time out. However, when I use select(), it doesn't seem to be recognising the block and always returns immediately (before the DMA has completed). I have included a simple version of the code:
int gannet_dma_interrupt_wait(dma_device_t *dma_device,
dma_direction dma_transfer_direction) {
fd_set rfds;
struct timeval timeout;
int select_res;
/* Initialize the file descriptor set and add the device file */
FD_ZERO(&rfds);
FD_SET(dma_device->fd, &rfds);
/* Set the timeout period. */
timeout.tv_sec = 5;
timeout.tv_usec = 0;
/* The device file will block until the DMA transfer has completed. */
select_res = select(FD_SETSIZE, &rfds, NULL, NULL, &timeout);
/* Reset the channel */
gannet_dma_reset(dma_device, dma_transfer_direction);
if (select_res == -1) {
/* Select has encountered an error */
perror("ERROR <Interrupt Select Failed>\n");
exit(0);
}
else if (select_res == 1) {
/* The device file descriptor block released */
return 0;
}
else {
/* The device file descriptor block exceeded timeout */
return EINTR;
}
}
Is there anything obviously wrong with my code? Or can anyone suggest an alternative to select?
It turns out that the UIO driver contains two counters. One records the
number of events (event_count), the other records how many events the
calling function is aware of (listener->event_count).
When you do a read() on a UIO driver it returns the number of events and
makes listener->event_count equal to event_count. ie. the listener is
now up to date with all the events that have occurred.
When you use poll() or select() on a UIO driver, it checks if these two
numbers are different and returns if they are (if they are the same it
waits until they differ and then returns). It does NOT update the
listener->event_count.
Clearly if you do not do a read() between calls to select() then
the listener->event_count will not match the event_count and the second
select() will return immediately. Therefore it is necessary to call
read() in between calls to select().
With hindsight it seems clear that select() should work in this way but it wasn't obvious to me at the time.
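Putting that together, a wait function that does the select() and then the read() might look like the sketch below. The name wait_uio_event and the millisecond timeout parameter are my own choices, and it assumes the device returns the event count as a 32-bit integer on read(), as UIO devices normally do.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait for one UIO event with a timeout.
   Returns 0 and stores the event count on success, 1 on timeout, -1 on error. */
int wait_uio_event(int fd, int timeout_ms, int32_t *event_count)
{
    fd_set rfds;
    struct timeval timeout;
    int rc;

    assert(fd >= 0 && fd < FD_SETSIZE);
    do {
        /* select() may modify both arguments, so rebuild them on every attempt */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        timeout.tv_sec = timeout_ms / 1000;
        timeout.tv_usec = (timeout_ms % 1000) * 1000;
        rc = select(fd + 1, &rfds, NULL, NULL, &timeout);
    } while (rc == -1 && errno == EINTR);

    if (rc == -1)
        return -1;
    if (rc == 0)
        return 1;   /* timed out before an event arrived */

    /* Consume the event. This read() is what brings the listener's counter
       up to date, so that the next select() blocks again. */
    if (read(fd, event_count, sizeof *event_count) != (ssize_t)sizeof *event_count)
        return -1;
    return 0;
}
```

The important part is only the read() after a successful select(); without it every later select() returns straight away.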
This answer assumes that it is possible to use select() as intended for the specified device file (I use select() for socket descriptors only). As an alternative to select(), you may want to check out the poll() family of functions. What follows will hopefully at least offer hints as to what can be done to resolve your problem with calling select().
The first parameter to the select() function has to be the highest descriptor number plus 1. Since you have only one descriptor, you can pass that descriptor plus 1 as the first parameter. Also consider that the file descriptor in dma_device could be invalid. Returning EINTR on a timeout may actually be what you intend to do, but in case it is not, and to test for an invalid descriptor, here is a different version for you to consider. The select() call could be interrupted by a signal, in which case the return value is -1 and errno is set to EINTR. This could be handled internally by your function as in:
FD_ZERO(&rfds);
FD_SET(dma_device->fd, &rfds);
timeout.tv_sec = 5;
timeout.tv_usec = 0;
// restart select() if it's interrupted by a signal;
do {
select_res = select(dma_device->fd + 1, &rfds, NULL, NULL, &timeout);
}
while( select_res < 0 && errno == EINTR);
if (select_res > 0) {
// a file descriptor is readable
}
else {
if (select_res == 0) {
// select() timed-out
}
else {
// an error other than a signal occurred
if (errno == EBADF) {
// your file descriptor is invalid
}
}
}
I am doing a simple server/client program in C which listens on a network interface and accepts clients. Each client is handled in a forked process.
The goal I have is to let the parent process know, once a client has disconnected from the child process.
Currently my main loop looks like this:
for (;;) {
/* 1. [network] Wait for new connection... (BLOCKING CALL) */
fd_listen[client] = accept(fd_listen[server], (struct sockaddr *)&cli_addr, &clilen);
if (fd_listen[client] < 0) {
perror("ERROR on accept");
exit(1);
}
/* 2. [process] Call socketpair */
if ( socketpair(AF_LOCAL, SOCK_STREAM, 0, fd_comm) != 0 ) {
perror("ERROR on socketpair");
exit(1);
}
/* 3. [process] Call fork */
pid = fork();
if (pid < 0) {
perror("ERROR on fork");
exit(1);
}
/* 3.1 [process] Inside the Child */
if (pid == 0) {
printf("[child] num of clients: %d\n", num_client+1);
printf("[child] pid: %ld\n", (long) getpid());
close(fd_comm[parent]); // Close the parent socket file descriptor
close(fd_listen[server]); // Close the server socket file descriptor
// Tasks that the child process should be doing for the connected client
child_processing(fd_listen[client]);
exit(0);
}
/* 3.2 [process] Inside the Parent */
else {
num_client++;
close(fd_comm[child]); // Close the child socket file descriptor
close(fd_listen[client]); // Close the client socket file descriptor
printf("[parent] num of clients: %d\n", num_client);
while ( (w = waitpid(-1, &status, WNOHANG)) > 0) {
printf("[EXIT] child %d terminated\n", w);
num_client--;
}
}
}/* end of while */
It all works well, the only problem I have is (probably) due to the blocking accept call.
When I connect to the above server, a new child process is created and child_processing is called.
However when I disconnect with that client, the main parent process does not know about it and does NOT output printf("[EXIT] child %d terminated\n", w);
But, when I connect with a second client after the first client has disconnected, the main loop is able to finally process the while ( (w = waitpid(-1, &status, WNOHANG)) > 0) part and tell me that the first client has disconnected.
If there will be only ever one client connecting and disconnecting afterwards, my main parent process will never be able to tell if it has disconnected or not.
Is there any way to tell the parent process that my client already left?
UPDATE
As I am a real beginner with c, it would be nice if you provide some short snippets to your answer so I can actually understand it :-)
Your waitpid usage is not correct. You have a non-blocking call, so if the child has not finished then the call returns 0:
waitpid(): on success, returns the process ID of the child whose state
has changed; if WNOHANG was specified and one or more child(ren)
specified by pid exist, but have not yet changed state, then 0 is
returned. On error, -1 is returned.
So you leave the while loop immediately. Of course, the termination can be caught later, when the first child has terminated and a second connection lets you reach the waitpid again.
As you need a non-blocking wait, I suggest you do not handle termination directly but through the SIGCHLD signal, which lets you catch the termination of any child and then call waitpid appropriately in the handler:
void handler(int signal) {
    (void)signal;
    // one possible choice: reap every child that has already exited;
    // adjust the condition and parameters to your needs
    while (waitpid(-1, NULL, WNOHANG) > 0) {
    }
}
...
struct sigaction act;
act.sa_flags = 0;
sigemptyset(&(act.sa_mask));
act.sa_handler = handler;
sigaction(SIGCHLD,&act,NULL);
... // now ready to receive SIGCHLD when at least one child changes its state
If I understand correctly, you want to be able to service multiple clients at once, and therefore your waitpid call is correct in that it does not block if no child has terminated.
However, the problem you then have is that you need to be able to process asynchronous child termination while waiting for new clients via accept.
Assuming that you're dealing with a POSIXy system, merely having a SIGCHLD handler established and having the signal unmasked (via sigprocmask, though IIRC it is unmasked by default), should be enough to cause accept to fail with EINTR if a child terminates while you are waiting for a new client to connect - and you can then handle EINTR appropriately.
The reason for this is that a SIGCHLD signal will be automatically sent to the parent process when a child process terminates. In general, system calls such as accept will return an error of EINTR ("interrupted") if a signal is received while they are waiting.
However, there would still be a race condition, where a child terminates just before you call accept (i.e. between the existing waitpid call and accept). There are two main possibilities to overcome this:
Do all the child termination processing in your SIGCHLD handler, instead of the main loop. This may not be feasible, however, since there are significant limits to what you are allowed to do within a signal handler. You may not call printf for example (though you may use write).
I do not suggest you go down this path: although it may seem simpler at first, it is the least flexible option and may prove unworkable later.
Write to one end of a non-blocking pipe in your SIGCHLD signal handler. Within the main loop, instead of calling accept directly, use poll (or select) to look for readiness on both the socket and the read end of the pipe, and handle each appropriately.
On Linux (and OpenBSD, I'm not sure about others) you can use ppoll to avoid the need to create a pipe (in this case you should leave the signal masked, and have it unmasked only during the poll operation; if ppoll fails with EINTR, you know that a signal was received, and you should call waitpid). You still need to set a signal handler for SIGCHLD, but it doesn't need to do anything.
Another option on Linux is to use signalfd to avoid both the need to create a pipe and set up a signal handler (I think). You should mask the SIGCHLD signal (using sigprocmask) if you use this. When poll (or equivalent) indicates that the signalfd is active, read the signal data from it (which clears the signal) and then call waitpid to reap the child.
On various BSD systems you can use kqueue instead of poll and watch for signals without needing to establish a signal handler.
On other POSIX systems you may be able to use pselect in a similar way to ppoll as described above.
There is also the option of using a library such as libevent to abstract away the OS-specifics.
The Glibc manual has an example of using select. Consult the manual pages for poll, ppoll, pselect for more information about those functions. There is an online book on using Libevent.
Rough example for using select, borrowed from Glibc documentation (and modified):
/* Set up a pipe and set signal handler for SIGCHLD */
int pipefd[2]; /* must be a global variable */
pipe(pipefd); /* TODO check for error return */
fcntl(pipefd[1], F_SETFL, O_NONBLOCK); /* set write end non-blocking */
/* signal handler */
void sigchld_handler(int signum)
{
char a = 0; /* write anything, doesn't matter what */
write(pipefd[1], &a, 1);
}
/* set up signal handler */
signal(SIGCHLD, sigchld_handler);
Where you currently have accept, you need to check status of the server socket and the read end of the pipe:
fd_set set, outset;
/* Initialize the file descriptor set. */
FD_ZERO (&set);
FD_SET (fd_listen[server], &set);
FD_SET (pipefd[0], &set);
for (;;) {
    outset = set; /* select() overwrites its argument, so pass a fresh copy each time */
    if (select (FD_SETSIZE, &outset, NULL, NULL, NULL /* no timeout */) < 0) {
        if (errno == EINTR) continue; /* interrupted by SIGCHLD: just retry */
        break; /* TODO handle other errors */
    }
    if (FD_ISSET(fd_listen[server], &outset)) {
        /* now do accept() etc */
    }
    if (FD_ISSET(pipefd[0], &outset)) {
        /* now do waitpid(), and read a byte from the pipe */
    }
}
Using other mechanisms is generally simpler, so I leave those as an exercise :)
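For what it's worth, here is a rough sketch of the signalfd variant mentioned above (Linux only). The helper names open_sigchld_fd and reap_children are invented for this example, and error handling is minimal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Block SIGCHLD and return a descriptor that becomes readable whenever the
   signal is pending. Returns -1 on error. */
int open_sigchld_fd(void)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    /* The signal must be blocked, otherwise it is delivered the normal way. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* Call when poll() reports sfd readable: drain the signal, then reap every
   child that has exited. Returns the number of children reaped, -1 on error. */
int reap_children(int sfd)
{
    struct signalfd_siginfo si;
    int reaped = 0;

    assert(sfd >= 0);
    if (read(sfd, &si, sizeof si) != (ssize_t)sizeof si)
        return -1;
    /* One pending SIGCHLD can stand for several exits, so loop. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

In the main loop the descriptor from open_sigchld_fd would go into the poll/select set next to the listening socket, in place of the read end of the pipe.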
For child processes, the wait() and waitpid() functions can be used to suspend execution of the current process until a child has exited. But these functions cannot be used for non-child processes.
Is there another function which can wait for the exit of any process?
Nothing equivalent to wait(). The usual practice is to poll using kill(pid, 0) and looking for return value -1 and errno of ESRCH to indicate that the process is gone.
Update: Since linux kernel 5.3 there is a pidfd_open syscall, which creates an fd for a given pid, which can be polled to get notification when pid has exited.
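The kill(pid, 0) polling from this answer could be written along these lines. This is a sketch only: process_exists and wait_for_exit_polling are made-up names and the polling interval is up to the caller. Note that an unreaped zombie still counts as existing, and that the check cannot detect PID reuse.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if a process with this pid exists, 0 if it does not,
   -1 on any other error. */
int process_exists(pid_t pid)
{
    assert(pid > 0);
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)
        return 1;   /* exists, but we are not allowed to signal it */
    if (errno == ESRCH)
        return 0;   /* no such process */
    return -1;
}

/* Check every interval_ms until the process is gone.
   Returns 0 once it has gone, -1 on error. */
int wait_for_exit_polling(pid_t pid, long interval_ms)
{
    struct timespec ts = { interval_ms / 1000, (interval_ms % 1000) * 1000000L };
    int rc;

    while ((rc = process_exists(pid)) == 1)
        nanosleep(&ts, NULL);
    return rc;
}
```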
On BSDs and OS X, you can use kqueue with EVFILT_PROC+NOTE_EXIT to do exactly that. No polling required. Unfortunately there's no Linux equivalent.
So far I've found three ways to do this on Linux:
Polling: you check for the existence of the process every so often, either by using kill or by testing for the existence of /proc/$pid, as in most of the other answers
Use the ptrace system call to attach to the process like a debugger so you get notified when it exits, as in a3nm's answer
Use the netlink interface to listen for PROC_EVENT_EXIT messages - this way the kernel tells your program every time a process exits and you just wait for the right process ID. I've only seen this described in one place on the internet.
Shameless plug: I'm working on a program (open source of course; GPLv2) that does any of the three.
You could also create a socket or a FIFO and read on them. The FIFO is especially simple: Connect the standard output of your child with the FIFO and read. The read will block until the child exits (for any reason) or until it emits some data. So you'll need a little loop to discard the unwanted text data.
If you have access to the source of the child, open the FIFO for writing when it starts and then simply forget about it. The OS will clean the open file descriptor when the child terminates and your waiting "parent" process will wake up.
Now this might be a process which you didn't start or own. In that case, you can replace the binary executable with a script that starts the real binary but also adds monitoring as explained above.
Here is a way to wait for any process (not necessarily a child) in linux to exit (or get killed) without polling:
Using inotify to wait for /proc/'pid' to be deleted would be the perfect solution, but unfortunately inotify does not work with pseudo file systems like /proc.
However we can use it with the executable file of the process.
While the process still exists, this file is being held open.
So we can use inotify with IN_CLOSE_NOWRITE to block until the file is closed.
Of course it can be closed for other reasons (e.g. if another process with the same executable exits) so we have to filter those events by other means.
We can use kill(pid, 0), but that can't guarantee if it is still the same process. If we are really paranoid about this, we can do something else.
Here is a way that should be 100% safe against pid-reuse trouble: we open the pseudo directory /proc/'pid', and keep it open until we are done. If a new process is created in the meantime with the same pid, the directory file descriptor that we hold will still refer to the original one (or become invalid, if the old process ceases to exist), but will NEVER refer to the new process with the reused pid. Then we can check if the original process still exists by checking, for example, if the file "cmdline" exists in the directory with openat(). When a process exits or is killed, those pseudo files cease to exist too, so openat() will fail.
Here is some example code:
// return -1 on error, or 0 if everything went well
int wait_for_pid(int pid)
{
char path[32];
int in_fd = inotify_init();
sprintf(path, "/proc/%i/exe", pid);
if (inotify_add_watch(in_fd, path, IN_CLOSE_NOWRITE) < 0) {
close(in_fd);
return -1;
}
sprintf(path, "/proc/%i", pid);
int dir_fd = open(path, 0);
if (dir_fd < 0) {
close(in_fd);
return -1;
}
int res = 0;
while (1) {
struct inotify_event event;
if (read(in_fd, &event, sizeof(event)) < 0) {
res = -1;
break;
}
int f = openat(dir_fd, "fd", 0);
if (f < 0) break;
close(f);
}
close(dir_fd);
close(in_fd);
return res;
}
You could attach to the process with ptrace(2). From the shell, strace -p PID >/dev/null 2>&1 seems to work. This avoids the busy-waiting, though it will slow down the traced process, and will not work on all processes (only yours, which is a bit better than only child processes).
None I am aware of. Apart from the solution from chaos, you can use semaphores if you can change the program you want to wait for.
The library functions are sem_open(3), sem_init(3), sem_wait(3), ...
sem_wait(3) performs a wait, so you don't have to do busy waiting as in chaos' solution. Of course, using semaphores makes your programs more complex and it may not be worth the trouble.
Maybe it could be possible to wait for /proc/[pid] or /proc/[pid]/[something] to disappear?
There are poll() and other file event waiting functions, maybe that could help?
Since linux kernel 5.3 there is a pidfd_open syscall, which creates an fd for a given pid, which can be polled to get notification when pid has exited.
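A minimal sketch of the pidfd approach, assuming a kernel of 5.3 or later. It calls the raw syscall because older C libraries ship no wrapper, and the fallback number 434 is an assumption that holds on most architectures but should be checked for yours. The function name wait_for_exit_pidfd is invented for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434  /* assumption: fallback for older headers */
#endif

/* Wait until process pid exits. Returns 1 if it exited, 0 on timeout,
   -1 on error (errno is ENOSYS on kernels without pidfd_open,
   ESRCH if there is no such process). */
int wait_for_exit_pidfd(pid_t pid, int timeout_ms)
{
    struct pollfd pfd;
    int rc, saved_errno;

    assert(pid > 0);
    pfd.fd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pfd.fd == -1)
        return -1;
    pfd.events = POLLIN;    /* a pidfd becomes readable once the process has terminated */
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);

    saved_errno = errno;    /* close() must not clobber poll's errno */
    close(pfd.fd);
    errno = saved_errno;
    return rc;
}
```

Because the descriptor refers to the process itself and not to the PID number, this does not suffer from PID reuse once the pidfd has been opened.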
Simply poll values number 2 and 22 of /proc/[PID]/stat.
Value 2 contains the name of the executable and value 22 contains the start time.
If they change, some other process has taken the same (freed) PID. Thus the method is very reliable.
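As an illustration, reading field 22 could be done roughly as follows. The sketch checks only the start time, not the name in field 2, and read_start_time and is_same_process are names invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read field 22 (start time, in clock ticks since boot) of /proc/<pid>/stat.
   Returns 0 on success, -1 if the process is gone or the line cannot be parsed. */
int read_start_time(pid_t pid, unsigned long long *start)
{
    char path[64], buf[1024];
    FILE *f;
    char *p;
    int field;

    assert(start != NULL);
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    p = fgets(buf, sizeof buf, f);
    fclose(f);
    if (p == NULL)
        return -1;

    /* Field 2 is "(name)" and may itself contain spaces or parentheses,
       so resume parsing after the last ')'. */
    p = strrchr(buf, ')');
    if (p == NULL)
        return -1;
    p++;    /* now at the space in front of field 3 */
    /* Each step moves to the space in front of the next field. */
    for (field = 3; field < 22; field++) {
        p = strchr(p + 1, ' ');
        if (p == NULL)
            return -1;
    }
    *start = strtoull(p, NULL, 10);
    return 0;
}

/* Returns 1 while pid still refers to the process whose start time was recorded. */
int is_same_process(pid_t pid, unsigned long long recorded_start)
{
    unsigned long long now;

    return read_start_time(pid, &now) == 0 && now == recorded_start;
}
```

Record the start time once when you begin watching, then call is_same_process periodically; it returns 0 as soon as the stat file disappears or reports a different start time.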
You can use eBPF to achieve this.
The bcc toolkit implements many excellent monitoring capabilities based on eBPF. Among them, exitsnoop traces process termination, showing the command name and reason for termination,
either an exit or a fatal signal.
It catches processes of all users, processes in containers, as well as processes that
become zombie.
This works by tracing the kernel sched_process_exit() function using dynamic tracing, and
will need updating to match any changes to this function.
Since this uses BPF, only the root user can use this tool.
You can refer to this tool for related implementation.
You can get more information about this tool from the link below:
Github repo: tools/exitsnoop: Trace process termination (exit and fatal signals). Examples.
Linux Extended BPF (eBPF) Tracing Tools
ubuntu manpages: exitsnoop-bpfcc
You can first install this tool and use it to see if it meets your needs, and then refer to its implementation for coding, or use some of the libraries it provides to implement your own functions.
exitsnoop examples:
Trace all process termination
# exitsnoop
Trace all process termination, and include timestamps:
# exitsnoop -t
Exclude successful exits, only include non-zero exit codes and fatal signals:
# exitsnoop -x
Trace PID 181 only:
# exitsnoop -p 181
Label each output line with 'EXIT':
# exitsnoop --label EXIT
Another option
Wait for a (non-child) process' exit using Linux's PROC_EVENTS
Reference project:
https://github.com/stormc/waitforpid
mentioned in the project:
Wait for a (non-child) process' exit using Linux's PROC_EVENTS. Thanks
to the CAP_NET_ADMIN POSIX capability permitted to the waitforpid
binary, it does not need to be set suid root. You need a Linux kernel
having CONFIG_PROC_EVENTS enabled.
I appreciate @Hongli's answer for macOS with kqueue. I implemented it in Swift:
/// Wait any pids, including non-child pid. Block until all pids exit.
/// - Parameters:
/// - timeout: wait until interval, nil means no timeout
/// - Throws: WaitOtherPidError
/// - Returns: isTimeout
func waitOtherPids(_ pids: [Int32], timeout: TimeInterval? = nil) throws -> Bool {
// create a kqueue
let kq = kqueue()
if kq == -1 {
throw WaitOtherPidError.createKqueueFailed(String(cString: strerror(errno)!))
}
// input
// multiple changes is OR relation, kevent will return if any is match
var changes: [Darwin.kevent] = pids.map({ pid in
Darwin.kevent.init(ident: UInt(pid), filter: Int16(EVFILT_PROC), flags: UInt16(EV_ADD | EV_ENABLE), fflags: NOTE_EXIT, data: 0, udata: nil)
})
let timeoutDeadline = timeout.map({ Date(timeIntervalSinceNow: $0)})
let remainTimeout: () ->timespec? = {
if let deadline = timeoutDeadline {
let d = max(deadline.timeIntervalSinceNow, 0)
let fractionalPart = d - TimeInterval(Int(d))
return timespec(tv_sec: Int(d), tv_nsec: Int(fractionalPart * 1000 * 1000 * 1000))
} else {
return nil
}
}
// output
var events = changes.map{ _ in Darwin.kevent.init() }
while !changes.isEmpty {
// watch changes
// sync method
let numOfEvent: Int32
if var timeout = remainTimeout() {
numOfEvent = kevent(kq, changes, Int32(changes.count), &events, Int32(events.count), &timeout);
} else {
numOfEvent = kevent(kq, changes, Int32(changes.count), &events, Int32(events.count), nil);
}
if numOfEvent < 0 {
throw WaitOtherPidError.keventFailed(String(cString: strerror(errno)!))
}
if numOfEvent == 0 {
// timeout. Return directly.
return true
}
// handle the result
let realEvents = events[0..<Int(numOfEvent)]
let handledPids = Set(realEvents.map({ $0.ident }))
changes = changes.filter({ c in
!handledPids.contains(c.ident)
})
for event in realEvents {
if Int32(event.flags) & EV_ERROR > 0 { // #see 'man kevent'
let errorCode = event.data
if errorCode == ESRCH {
// "The specified process to attach to does not exist"
// ignored
} else {
print("[Error] kevent result failed with code \(errorCode), pid \(event.ident)")
}
} else {
// succeeded event, pid exit
}
}
}
return false
}
enum WaitOtherPidError: Error {
case createKqueueFailed(String)
case keventFailed(String)
}
PR_SET_PDEATHSIG can be used to wait for parent process termination
I have a single-threaded program. It sends a message to four destinations every five seconds. I don't want connect() to block, so I am writing my program like this:
int j, rc, non_blocking=1, sockets[4], max_fd=0;
struct sockaddr server=get_server_addr();
fd_set fdset;
const struct timeval conn_timeout = { 2, 0 }; /* 2 seconds */
for (j=0; j<4; ++j)
{
sockets[j]=socket( AF_INET, SOCK_STREAM, 0 );
ioctl(sockets[j], FIONBIO, (char *)&non_blocking);
connect(sockets[j], &server, sizeof (server));
}
/* prepare fd_set */
FD_ZERO ( &fdset );
for (j=0;j<4;++j)
{
if (sockets[j] != -1 )
{
FD_SET ( sockets[j], &fdset );
if ( sockets[j] > max_fd )
{
max_fd = sockets[j];
}
}
}
rc=select(max_fd + 1, NULL, &fdset, NULL, &conn_timeout );
if(rc > 0)
{
for (j=0;j<4;++j)
{
if(sockets[j]!=-1 && FD_ISSET(sockets[j],&fdset))
{
/* send() */
}
}
}
/* close all valid sockets */
However, it seems select() returns immediately after ONE file descriptor is ready instead of blocking for conn_timeout (2 seconds). So in this case how can I achieve my targets?
The program continues if all sockets are ready.
The program can block there for 2 seconds if any of the sockets is not ready.
Yeah, select was designed on the assumption that you would want to service each socket as soon as it became ready.
If I understand what you're trying to do, then the simplest way to accomplish it will be to remove each socket from the fdset as it becomes ready. If there are any sockets left in the set, use gettimeofday to adjust the timeout downward, and call select again. When the set is empty, all four sockets are usable and you can proceed.
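A sketch of that loop is below. The names wait_all_writable and ms_left are made up for this example, every descriptor in the array is assumed to be valid, and gettimeofday is used only because it is what the answer suggests (a monotonic clock would be more robust against clock changes).

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Milliseconds left until deadline, never negative. */
long ms_left(const struct timeval *deadline)
{
    struct timeval now;
    long ms;

    gettimeofday(&now, NULL);
    ms = (long)(deadline->tv_sec - now.tv_sec) * 1000L
       + (long)(deadline->tv_usec - now.tv_usec) / 1000L;
    return ms > 0 ? ms : 0;
}

/* Wait until every descriptor in fds[0..n-1] is writable or timeout_ms has
   elapsed. ready[i] is set to 1 for each descriptor that became writable.
   Returns the number of ready descriptors, or -1 on error. */
int wait_all_writable(const int *fds, int n, int timeout_ms, int *ready)
{
    struct timeval deadline, tv;
    int i, nready = 0;

    assert(n > 0);
    gettimeofday(&deadline, NULL);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_usec += (timeout_ms % 1000) * 1000;
    if (deadline.tv_usec >= 1000000) {
        deadline.tv_sec++;
        deadline.tv_usec -= 1000000;
    }
    for (i = 0; i < n; i++)
        ready[i] = 0;

    while (nready < n) {
        fd_set wfds;
        int max_fd = -1, rc;
        long left = ms_left(&deadline);

        /* Rebuild the set from the descriptors that are not ready yet. */
        FD_ZERO(&wfds);
        for (i = 0; i < n; i++) {
            if (!ready[i]) {
                FD_SET(fds[i], &wfds);
                if (fds[i] > max_fd)
                    max_fd = fds[i];
            }
        }
        tv.tv_sec = left / 1000;
        tv.tv_usec = (left % 1000) * 1000;
        rc = select(max_fd + 1, NULL, &wfds, NULL, &tv);
        if (rc == -1) {
            if (errno == EINTR)
                continue;       /* retry with the remaining time */
            return -1;
        }
        if (rc == 0)
            break;              /* deadline reached */
        for (i = 0; i < n; i++) {
            if (!ready[i] && FD_ISSET(fds[i], &wfds)) {
                ready[i] = 1;
                nready++;
            }
        }
    }
    return nready;
}
```

For sockets in the middle of a non-blocking connect(), becoming writable only means the attempt has finished, so check SO_ERROR with getsockopt() before calling send().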
There are three basic approaches:
If you want to stay strictly portable you need to iterate:
calculate end time from current time and timeout of your choice
Cycle:
-- Create fdset with those fds not yet ready
-- calculate max time to wait
-- select()
-- remember those fds that are now ready
-- break if end time reached or all fds ready
End cycle
Now you have knowledge of the ready fds and the elapsed time
If you want to stay portable, but can use threads:
start n threads
select on one fd per thread
join all threads
If you do not need to be portable: Most OSes have a facility for such a situation, e.g. Windows/.NET has WaitAll (together with async send and an event)
I don't see the connection between your stated targets and your stated problem. You are correct in saying that select() blocks until at least one socket is ready, but according to target #2 above that is exactly what you want. There's nothing in your stated targets about blocking until all four sockets are ready at the same time.
You should also note that sockets are almost always ready for writing, unless the send buffer is full, which means the receiver's receive buffer is full, which means the receiver is slower than the sender. So using select() alone as the underlying write timer isn't a good idea.