main process -> pthread -> fork + execvp - c

I am seeing a strange issue.
Sometimes when I run my program long enough, I see two copies of my program running. The second is a child process of the first, since its parent PID is the PID of the first.
I realized that I have a fork in my code, and that is the only way two copies could be running; otherwise I can never have two copies of my program running.
This happens very rarely but it does happen.
The architecture is as follows:
The main program gets an event and spawns a pthread. In that thread I do some processing and, based on the result, I do a fork immediately followed by an execvp.
I realize that it's not best to call fork from a pthread, but in my design the main process gets many events and the only way to work on all those events in parallel was to use pthreads. Each pthread does some processing and in certain cases it needs to call a different program (for which I use execvp). Since I had to call a different program, I had to use fork.
I am wondering whether, because I am calling fork from a thread context, multiple threads could call fork + execvp in parallel and this "somehow" results in two copies being created.
If this is indeed happening, would it help to protect the code that does fork + execvp with a mutex, so that only one thread at a time calls fork + execvp?
However, if I take a mutex before fork + execvp, then I don't know when to release it.
Any help here would be appreciated.
thread code that does fork + execvp -- in case you guys can spot an issue there:
In main.c
status = pthread_create(&worker_thread, tattr,
do_some_useful_work, some_pointer);
[clipped]
void *do_some_useful_work (void * arg)
{
/* Do some processing and fill pArguments array */
child_pid = fork();
if (child_pid == 0)
{
char *temp_log_file;
temp_log_file = (void *) malloc (strlen(FORK_LOG_FILE_LOCATION) +
strlen("/logfile.") + 8);
sprintf (temp_log_file, "%s/logfile.%d%c", FORK_LOG_FILE_LOCATION, getpid(),'\0');
/* Open log file */
int log = creat(temp_log_file, 0777);
/* Redirect stdout to log file */
close(1);
dup(log);
/* Redirect stderr to log file */
close(2);
dup(log);
syslog(LOG_ERR, "Opening up log file %s\n", temp_log_file);
free (temp_log_file);
close (server_sockets_that_parent_is_listening_on);
execvp ("jazzy_program", pArguments);
}
pthread_exit (NULL);
return NULL;
}
I looked through this code and I see no reason why I would do a fork and not do an execvp, so the only scenario that comes to mind is that multiple threads run and they all call fork + execvp. This sometimes causes two copies of my main program to run.

In the case where execvp fails for any reason (perhaps too many processes, out of memory, etc.), you fail to handle the error; instead, the forked copy of the thread keeps running. Calling pthread_exit (or any other non-async-signal-safe function) in this process has undefined behavior, so it might not exit properly but hang or do something unexpected. You should always check for exec failure and immediately _exit(1) or similar when this happens. Also, while this probably isn't your problem, it's unsafe to call malloc after forking in a multithreaded process since it's not async-signal-safe.
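A minimal sketch of the fix described above, assuming the thread only needs to start the program and get its PID back. The names spawn_program and wait_for_exit_code are hypothetical, 127 is simply the exit code shells conventionally use for "command not found", and the log redirection from the question is left out.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec argv[0]. Returns the child's PID, or -1 if fork fails.
 * Between fork() and execvp() the child makes no other calls at all. */
pid_t spawn_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        /* Reached only if execvp failed. Leave immediately with _exit(),
         * which is async-signal-safe; do not fall through into
         * pthread_exit() or any other thread-related code. */
        _exit(127);
    }
    return pid;
}

/* Reap the child. Returns its exit code, or -1 if it did not exit normally. */
int wait_for_exit_code(pid_t pid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this shape, a failed exec shows up in the parent as exit code 127 instead of as a second copy of the main program.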

Related

is it safe to write to a file in another thread?

I do not know, if this is ok, but it compiles:
typedef struct
{
int fd;
char *str;
int c;
} ARG;
void *ww(void *arg){
ARG *a = (ARG *)arg;
write(a->fd,a->str,a->c);
return NULL;
}
int main (void) {
int fd = open("./smf", O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);
int ch = fork();
if (ch==0){
ARG *arg; pthread_t p1;
arg->fd = fd;
arg->str = malloc(6);
strcpy(arg->str, "child");
arg->c = 6;
pthread_create( &p1, NULL, ww, arg);
} else {
write(fd, "parent\0", 7);
wait(NULL);
}
return 0;
}
I am wait()ing in the parent, but I do not know if I should also pthread_join to merge threads or whether that is done implicitly by wait(). However, is it even safe to write to the same file in two threads? I ran it a few times and sometimes the output was 1) parentchild but sometimes only 2) parent, no other cases - I do not know why the child did not write as well when the parent wait()s for it. Can someone please explain why these outputs?
You need to call pthread_join() in the child process to avoid potential race conditions during the child process’s exit sequence (for example, the child process can otherwise exit before its thread gets a chance to write to the file). Calling pthread_join() in the parent process won’t help.
As for the file, having both processes write to it is safe in the sense that it won’t cause a crash, but the order in which the data is written to the file will be indeterminate since the two processes are executing concurrently.
I do not know, if this is ok, but it compiles:
Without even any warnings? Really? I suppose the code you are compiling must include all the needed headers (else you should have loads of warnings), but if your compiler cannot be persuaded to spot
buggy.c:30:15: warning: ‘arg’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
arg->fd = fd;
^
then it's not worth its salt. Indeed, variable arg is used uninitialized, and your program therefore exhibits undefined behavior.
But even if you fix that, after which the program can be made to compile without warnings, it still is not ok.
I am wait()ing in the parent, but I do not know if I should also
pthread_join to merge threads or it is implicitly by wait().
The parent process is calling wait(). This waits for a child process to terminate, if there are any. Period. It has no implications for the behavior of the child prior to its termination.
Moreover, in a pthreads program, the main thread is special: when it returns from main() (or when any thread calls exit()), the whole program terminates, including all other threads. Your child process therefore suffers from a race condition: the main thread terminates immediately after creating a second thread, without ensuring that the other thread terminates first, so it is undefined what, if any, of the behavior of the second thread is actually performed. To avoid this issue, yes, in the child process, the main thread should join the other one before itself terminating.
However
is it even safe to write to the same file in two threads?
It depends -- both on the circumstances and on what you mean by "safe". POSIX requires the write() function to be thread-safe, but that does not mean that multiple threads or processes writing to the same file cannot still interfere with each other by overwriting each other's output.
Yours is a somewhat special case, however, in that parent and child are writing via the same open file description in the kernel, the child having inherited an association with that from its parent. According to POSIX, then, you should see both processes' output (if any; see above) in the file. POSIX provides no way to predict the order in which those outputs will appear, however.
I ran it a few
times and sometimes output was 1) parentchild but sometimes only 2)
parent, no other cases - I do not know why child did not write as well
when parent wait()s for it. Can someone please explain why these
outputs?
The child process can terminate before its second thread performs its write. In this case you will see only the parent's output, not the child's.
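A sketch of the corrected program under the advice above: the child joins its thread before exiting, and the thread's argument lives in a real object instead of an uninitialized pointer. The names write_job and write_from_parent_and_child are hypothetical, and the sketch assumes the parent is single-threaded when it forks.

```c
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct write_job {
    int fd;
    const char *text;
    int ok;              /* set to 1 by the worker if the write succeeded */
};

static void *write_worker(void *arg)
{
    struct write_job *job = arg;
    size_t len = strlen(job->text);

    job->ok = (write(job->fd, job->text, len) == (ssize_t)len);
    return NULL;
}

/* The parent writes "parent"; a thread in the child writes "child".
 * Returns 0 if both writes succeeded and the child exited cleanly. */
int write_from_parent_and_child(int fd)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        struct write_job job = { fd, "child", 0 };
        pthread_t tid;

        if (pthread_create(&tid, NULL, write_worker, &job) != 0)
            _exit(1);
        /* Without this join the child could exit before the write runs. */
        pthread_join(tid, NULL);
        _exit(job.ok ? 0 : 1);
    }

    int parent_ok = (write(fd, "parent", 6) == 6);
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (parent_ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Both strings now always reach the file, but their order can still be either "parentchild" or "childparent".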

Why does a process create a zombie if execv fails, but not if execv is successful and terminates?

So I am confused by the behavior of my C program. I am using the construct,
int pid = fork();
if (pid == 0) {
if(file_upload_script_path) {
rc = execv(file_upload_script_path, args);
if(rc == -1) {
printf("Error has occured when starting file_upload.exp!\n");
exit(0);
}
} else {
printf("Error with memory allocation!\n");
}
}
else {
printf("pid=%d\n", pid);
}
To fork the process and run a script for doing file upload. The script will by itself terminate safely, either by finishing the upload or failing.
Now, there was a problem with the script path, causing execv to fail. Here I noted that the child process terminates successfully if execv works, but if it fails (rc == -1) and I exit the process, it becomes a zombie. Does anyone know why this happens?
Note that I know why the child process becomes a zombie. What I am confused about is why the process does not become a zombie if execv works.
EDIT:
I got a question about errno and the cause of the error. The cause of the error is known. There was a problem with the build process, so the path of the script was different from the one expected.
However, this may happen again and I want to make sure my program does not start spawning zombies when it does. The behavior where zombies are created in some situations and not in others is very confusing.
BR
Patrik
If you don't want to create zombies, your program has to reap its child processes whether or not they call execv, and whether or not the execv call succeeds. To reap zombie processes "automagically", handle the SIGCHLD signal:
void handle_sigchld(int sig) {
int saved_errno = errno;
while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {}
errno = saved_errno;
}
int main() {
signal(SIGCHLD, handle_sigchld);
// rest of your program....
}
Inspired (no... ripped off) from: this link.
Or maybe you want to reap only this specific child, because later you want to call fork() again and handle that child's return value. Then pass the pid returned from fork() in your parent to the signal handler and wait on this pid in the SIGCHLD handler if needed (with some checking, e.g. if the pid has already finished then ignore future SIGCHLD, etc.).
In this scenario, when the execv fails, the child process is killed. The fun part, I think is what happens when you call exec family of functions.
The exec family of functions replaces the current image of the process with the new image of the binary you are about to exec.
So, whatever code was will not remain - and the error in your script would cause its death.
Here, the parent needs to listen on the death of the child process using wait flavour of functions (read: waitpid).
When you say that there's problem in the script, it means that the execv actually succeeded in creating the new image; but the latter failed of its own accord.
This is what I think is happening...
If the printf of if (rc==-1) is being executed, then perhaps changing exit(0) to _exit(0) should take care of it.

How to wait for 2 types of events in a loop (C)?

I am trying to wait on waitpid() and read() in a while-true loop. Specifically, I am waiting for either one of these two events and then processing it in each iteration of the loop. Currently, I have the following implementation (which is not what I want).
while (true) {
pid_t pid = waitpid(...);
process_waitpid_event(...);
ssize_t sz = read(socket, ....);
process_read_event(...);
}
The problem with this implementation is that the processing of the second event depends on the completion of the first event. Instead of processing these two events sequentially, I wish to process whichever event that comes first in each iteration of the loop. How should I do this?
If you don't want to touch threading, you can include this in the options of the call to waitpid:
pid_t pid = waitpid(-1, &status, WNOHANG); /* -1: any child; status is an int */
As from the manpage for waitpid:
WNOHANG - return immediately if no child has exited.
As such, if waitpid isn't ready, it won't block and the program will just keep going to the next line.
As for the read, if it is blocking you might want to have a look at poll(2). You can essentially check to see if your socket is ready every set interval, e.g. 250ms, and then call read when it is. This will allow it to not block.
Your code might look a bit like this:
// Creating the struct for file descriptors to be polled.
struct pollfd poll_list[1];
poll_list[0].fd = socket_fd;
poll_list[0].events = POLLIN|POLLPRI;
// POLLIN There is data to be read
// POLLPRI There is urgent data to be read
/* poll_res > 0: Something ready to be read on the target fd/socket.
** poll_res == 0: Nothing ready to be read on the target fd/socket.
** poll_res < 0: An error occurred. */
int poll_res = poll(poll_list, 1, POLL_INTERVAL); /* POLL_INTERVAL: timeout in milliseconds */
This is just assuming that you're reading from a socket, judging from the variable names in your code. As others have said, your problem might require something a bit more heavy duty like threading.
The answer of @DanielPorteous should work too if you don't want to use threads in your program.
The idea is simple: do not let waitpid or read block unless they actually have work to do. With a timeout mechanism, waitpid returns immediately when no child has changed state, and the same goes for the read operation.
If the read function takes a very long time to read the whole buffer, you can limit how much it reads per call so that it doesn't read everything at once; rather, it reads for about 2 milliseconds and then passes control back to waitpid.
But it's safe to use threading for your purpose and it's pretty easy to implement. Here's a nice guideline about how you can implement threading.
In your case you need to declare two threads.
pthread_t readThread;
pthread_t waitpidThread;
Now you need to create the thread and pass specific function as their parameter.
pthread_create(&(waitpidThread), NULL, &waitpidFunc, NULL);
pthread_create(&(readThread), NULL, &readFunc, NULL);
Now you may have to write your waitpidFunc and readFunc function. They might look like this.
void* waitpidFunc(void *arg)
{
while(true) {
pid_t pid = waitpid(...);
// This is to put an exit condition somewhere.
// So that you can finish the thread
int keep_going = process_waitpid_event(...); /* not named exit, which would shadow exit() */
if(keep_going == 0) break;
}
return NULL;
}
I think that the right tool in this situation is select or poll. Both do essentially the same job: they let you pick out the descriptors on which input is available. Hence you can wait simultaneously on two sockets, for example. However, this is not directly usable in your case, as you want to wait for a process and a socket. The solution is to create a pipe which will receive something when the waitpid finishes.
You can launch a new thread and connect it with the original one with a pipe. The new thread will invoke waitpid and when it finished it will write its result to the pipe. The main thread will wait either for the socket or pipe using select.

No blocking thread

I have read this and this post on stackoverflow, but no one of them give me what I want to do.
In my case, I want to create a thread, launch it, and let it run without blocking for as long as the main process runs. This thread has no communication or synchronization with the main process; it does its job fully independently.
Consider this code:
#define DAY_PERIOD 86400 /* 3600*24 seconds */
int main() {
char wDir[255] = "/path/to/log/files";
compress_logfiles(wDir);
// do other things, this things let the main process runs all the time.
// just segmentation fault, stackoverflow, memory overwrite or
// somethings like that stop it.
return 0;
}
/* Create and launch thread */
void compress_logfiles(char *wDir)
{
pthread_t compressfiles_th;
if (pthread_create(&compressfiles_th, NULL, compress, wDir))
{
fprintf(stderr, "Error create compressfiles thread\n");
return;
}
if (pthread_join(compressfiles_th, NULL))
{
//fprintf(stderr, "Error joining thread\n");
return;
}
return;
}
void *compress(void *wDir)
{
while(1)
{
// Do job to compress files
// and sleep for one day
sleep(DAY_PERIOD); /* sleep one day*/
}
return NULL;
}
With pthread_join in the compress_logfiles function, the thread compresses all files successfully and never returns because it is in an infinite while loop, so the main process stays blocked all the time. If I remove pthread_join from the compress_logfiles function, the main process is not blocked because it doesn't wait for the thread to return, but the thread compresses one file and exits (there are a lot of files, around one hundred).
So, is there a way to let the main process launch the compressfiles_th thread and let it do its job without waiting for it to finish or exit?
I found pthread_tryjoin_np and pthread_timedjoin_np in the Linux Programmer's Manual. It seems that pthread_tryjoin_np does the job if I don't care about the returned value; is it a good idea to use it?
Thank you.
Edit 1:
Please note that the main process is daemonized after the call to compress_logfiles(wDir). Perhaps the problem is that daemonizing kills the main process and relaunches it?
Edit 2: the solution
Credit to dbush
Yes, fork causes the problem, and pthread_atfork() solves it. I made this change to run the compressfiles_th without blocking main process:
#define DAY_PERIOD 86400 /* 3600*24 seconds */
char wDir[255] = "/path/to/log/files"; // global now
// function added
void child_handler(){
compress_logfiles(wDir); // wDir is global variable now
}
int main()
{
pthread_atfork(NULL, NULL, child_handler);
// Daemonize the process.
becomeDaemon(BD_NO_CHDIR & BD_NO_CLOSE_FILES & BD_NO_REOPEN_STD_FDS & BD_NO_UMASK0 & BD_MAX_CLOSE);
// do other things, this things let the main process runs all the time.
// just segmentation fault, stackoverflow, memory overwrite or
// somethings like that stop it.
return 0;
}
The child_handler() function is called in the child process after fork; see pthread_atfork.
When you fork a new process, only the calling thread is duplicated, not all threads.
If you wish to daemonize, you need to fork first, then create your threads.
From the man page for fork:
The child process is created with a single thread--the one that
called fork(). The entire virtual address space of the parent is
replicated in the child, including the states of mutexes, condition
variables, and other pthreads objects; the use of pthread_atfork(3)
may be helpful for dealing with problems that this can cause.
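A sketch of how the thread could be started once the daemonizing fork is done, assuming the goal is simply "start it and never join it". The thread is created detached, so main is not blocked and no pthread_join is needed. The names start_detached, compress_worker, and passes_done are hypothetical, and the worker body is a stand-in for the real compression loop.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Illustrative counter so the caller can observe that the worker ran. */
static atomic_int passes_done = 0;

/* Stand-in for the compress() thread function from the question. */
static void *compress_worker(void *arg)
{
    (void)arg;
    /* ... compress the log files here, then sleep and repeat ... */
    atomic_fetch_add(&passes_done, 1);
    return NULL;
}

/* Start fn(arg) in a detached thread. Returns 0 on success, or the
 * error number reported by the pthread call that failed. */
int start_detached(void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Calling start_detached(compress_worker, NULL) after becomeDaemon() has returned in the surviving process keeps the thread in the process that actually keeps running.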

At what point does a fork() child process actually begin?

Does the process begin when fork() is declared? Is anything being killed here?
pid_t child;
child = fork();
kill (child, SIGKILL);
Or do you need to declare actions for the fork process in order for it to actually "begin"?
pid_t child;
child = fork();
if (child == 0) {
// do something
}
kill (child, SIGKILL);
I ask because what I am trying to do is create two children, wait for the first to complete, and then kill the second before exiting:
pid_t child1;
pid_t child2;
child1 = fork();
child2 = fork();
int status;
if (child1 == 0) { //is this line necessary?
}
waitpid(child1, &status, 0);
kill(child2, SIGKILL);
The C function fork is defined in the standard C library (glibc on linux). When you call it, it performs an equivalent system call (on linux its name is clone) by the means of a special CPU instruction (on x86 sysenter). This causes the CPU to switch to a privileged mode and start executing instructions of the kernel. The kernel then creates a new process (a record in a list and accompanying structures), which inherits a copy of memory mappings of the original process (text, heap, stack, and others), file descriptors and more.
The memory areas are marked as non-writable, so that when the new or the original process tries to overwrite them, the kernel gets to handle a CPU exception and perform a copy-on-write (therefore delaying the need to copy a memory page until absolutely necessary). That's because the mappings initially point to the same pages (pieces of physical memory) in both processes.
The kernel then gives execution to the scheduler, which decides which process to run next. It could be the original process, the child process, or any other process running in the system.
Note: Older Linux kernels put the child process in front of the parent process in the run queue, so that it ran before the parent. This was deemed to give better performance when the child calls exec right after forking. The ordering is not guaranteed and has changed between kernel versions, so do not rely on it.
When execution is given to the original process, the CPU is switched back to nonprivileged mode and starts executing the next instruction. In this case it continues with the fork function of the standard library, which returns the PID of the child process (as returned by the clone system call).
Similarly, the child process continues execution in the fork function, but here it returns 0 to the calling function.
After that, the program continues in both cases normally. The child process has the original process as the parent (this is noted in a structure in the kernel). When it exits, the parent process is supposed to do the cleanup (receiving the exit status of the child) by calling wait.
Note: The clone system call is rather complicated, because it unifies fork with the creation of threads, as well as linux namespaces. Other operating systems have different implementation of fork, e.g. FreeBSD has fork system call by itself.
Disclaimer: I am not a kernel developer. If you know better, please correct the answer.
See Also
clone (2)
The Design and Implementation of the FreeBSD Operating System (Google Books)
Understanding the Linux Kernel (Google Books)
Is it true that fork() calls clone() internally?
"Declare" is the wrong word to use in this context; C uses that word to talk about constructs that merely assert the existence of something, e.g.
extern int fork(void);
is a declaration of the function fork. Writing that in your code (or having it written for you as a consequence of #include <unistd.h>) does not cause fork to be called.
Now, the statement in your sample code, child = fork(); when written inside a function body, does (generate code to) make a call to the function fork. That function, assuming it is in fact the system primitive fork(2) on your operating system, and assuming it succeeds, has the special behavior of returning twice, once in the original process and once in a new process, with different return values in each so you can tell which is which.
So the answer to your question is that in both of the code fragments you showed, assuming the things I mentioned in the previous paragraph, all of the code after the child = fork(); line is at least potentially executed twice, once by the child and once by the parent. The if (child == 0) { ... } construct (again, this is not a "declaration") is the standard idiom for making parent and child do different things.
EDIT: In your third code sample, yes, the child1 == 0 block is necessary, but not to ensure that the child is created. Rather, it is there to ensure that whatever you want child1 to do is done only in child1. Moreover, as written (and, again, assuming all calls succeed) you are creating three child processes, because the second fork call will be executed by both parent and child! You probably want something like this instead:
pid_t child1, child2;
int status;
child1 = fork();
if (child1 == -1) {
perror("fork");
exit(1);
}
else if (child1 == 0) {
execlp("program_to_run_in_child_1", "program_to_run_in_child_1", (char *)0); /* second argument is argv[0] */
/* if we get here, exec failed */
_exit(127);
}
child2 = fork();
if (child2 == -1) {
perror("fork");
kill(child1, SIGTERM);
exit(1);
}
else if (child2 == 0) {
execlp("program_to_run_in_child_2", "program_to_run_in_child_2", (char *)0); /* second argument is argv[0] */
/* if we get here, exec failed */
_exit(127);
}
/* control reaches this point only in the parent and only when
both fork calls succeeded */
if (waitpid(child1, &status, 0) != child1) {
perror("waitpid");
kill(child1, SIGTERM);
}
/* only use SIGKILL as a last resort */
kill(child2, SIGTERM);
FYI, this is only a skeleton. If I were writing code to do this for real (which I have: see for instance https://github.com/zackw/tbbscraper/blob/master/scripts/isolate.c ) there would be a whole bunch more code just to comprehensively detect and report errors, plus the additional logic required to deal with file descriptor management in the children and a few other wrinkles.
The fork call spawns a new process identical to the old one and returns in both processes.
This happens automatically so you don't have to take any actions.
But nevertheless, it is cleaner to check if the call indeed succeeded:
A value below 0 indicates failure. In this case, it is not good to call kill().
A value == 0 indicates that we are the child process. In this case, do not call kill(): kill(0, sig) sends the signal to every process in the caller's process group.
A value > 0 indicates that we are the parent process. In this case, the return value is our child. Here it is safe to call kill().
In your case, you even end up with 4 processes:
Your parent calls fork(), being left with 2 processes.
Both of them call fork() again, resulting in a new child process for each of them.
You should move the 2nd fork() call into the branch where the parent code runs.
The child process begins some time after fork() has been called (there is some setup which happens in the context of the child).
You can be sure that the child exists when fork() returns successfully in the parent, although it may not have been scheduled yet.
So the code
pid_t child = fork();
kill (child, SIGKILL);
will kill the child. If the child gets to run first, it executes kill(0, SIGKILL), which sends SIGKILL to every process in its own process group, including the parent, so this code is dangerous as written.
There is no way to tell whether the child will ever live long enough to execute its kill. Most likely it won't, since the Linux kernel will set up the process structure for the child and let the parent continue. The child will just be waiting in the ready list of the processes. The kill will then remove it again.
EDIT If fork() returns a value <= 0, then you shouldn't wait or kill.
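A small sketch of that rule, assuming the child has nothing to do but wait: the parent calls kill() only when fork() returned a positive PID, and the child branch never reaches the kill. The function name fork_then_kill is hypothetical.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that just sleeps, send it sig from the parent, and reap it.
 * Returns the number of the signal that terminated the child, or -1 on error. */
int fork_then_kill(int sig)
{
    pid_t child = fork();

    if (child == -1)
        return -1;            /* fork failed: there is nothing to kill or wait for */

    if (child == 0) {
        /* Child: never falls through to the parent's kill() below. */
        for (;;)
            pause();
    }

    /* Parent only: child > 0 is a real PID here. */
    if (kill(child, sig) == -1)
        return -1;

    int status;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The child exists as soon as fork() returns in the parent, so the signal is delivered even if the child has not yet been scheduled.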

Resources