I tried implementing programs that calculate a sort of integral. To speed up the computation, one creates multiple processes and the other uses multiple threads. In my program, each process adds a double value into shared memory, and each thread adds a double value through a pointer.
Here's my question. The add operation obviously loads the value from memory, adds a value to it, and stores the result back to memory. So it seems my code is quite prone to the producer-consumer problem, as many processes/threads access the same memory area. However, I couldn't find a case where somebody used semaphores or mutexes to implement a simple accumulator either.
// creating processes
while (whatever)
{
pid = fork();
if (pid == 0)
{
res = integralproc(clist, m, tmpcnt, tmpleft, tmpright);
*(createshm(shm_key)) += res;
exit(1);
}
}
// creating or retrieving shared memory
long double* createshm(int key)
{
int shm_id = -1;
void* shm_ptr = (void*)-1;
while (shm_id == -1)
{
shm_id = shmget((key_t)key, sizeof(long double), IPC_CREAT | 0777);
}
while (shm_ptr == (void*)-1)
{
shm_ptr = shmat(shm_id, (void*)0, 0);
}
return (long double*)shm_ptr;
}
// creating threads
while (whatever)
{
threadres = pthread_create(&(targs[i]->thread_handle), NULL, integral_thread, (void*)targs[i]);
}
// thread function. targ->resptr is pointer that we add the result to.
void *integral_thread(void *arg)
{
threadarg *targ = (threadarg*)arg;
long double res = integralproc(targ->clist, targ->m, targ->n, targ->left, targ->right);
*(targ->resptr) += res;
//printf("thread %ld calculated %Lf\n", targ->i, res);
pthread_exit(NULL);
}
So I implemented it this way, and so far, no matter how many processes/threads I create, the result has been correct, as if the problem never happened.
I'm concerned that my code may still be dangerous in a way I just haven't noticed.
Is this code truly safe from these problems? Or am I overlooking something, and should the code be revised?
If your threads are all racing to update the same object (ie, the targ->resptr for each thread points at the same thing), then yes - you do have a data race and you can see incorrect results (likely, "lost updates" where two threads that happen to finish at the same time try to update the sum, and only one of them is effective).
You probably haven't seen this because the execution time of your integralproc() function is long, so the chances of multiple threads simultaneously getting to the point of updating *targ->resptr is low.
Nonetheless, you should still fix the problem. You can either add a mutex lock/unlock around the sum update:
pthread_mutex_lock(&result_lock);
*(targ->resptr) += res;
pthread_mutex_unlock(&result_lock);
(This shouldn't affect the efficiency of the solution, since you are only locking and unlocking once in the lifetime of each thread).
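As a minimal sketch of the mutex approach, assuming a single shared accumulator: the names total, total_lock, add_partial and sum_with_mutex are made up for illustration, and the worker simply adds a value handed to it instead of calling integralproc().

```c
#include <pthread.h>

/* Shared accumulator and the lock that protects it (illustrative names). */
static long double total = 0.0L;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker: stands in for a thread that has finished its expensive
   computation and now adds its partial result to the shared sum. */
static void *add_partial(void *arg)
{
    long double res = *(long double *)arg;

    pthread_mutex_lock(&total_lock);
    total += res;               /* the read-modify-write is now exclusive */
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Starts up to 64 workers that each add `value`, waits for them,
   and returns the accumulated sum. */
long double sum_with_mutex(int n, long double value)
{
    pthread_t tid[64];
    int started = 0;

    total = 0.0L;               /* no other thread exists yet */
    for (int i = 0; i < n && i < 64; i++) {
        if (pthread_create(&tid[started], NULL, add_partial, &value) == 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return total;               /* safe: all workers have been joined */
}
```

Calling sum_with_mutex(8, 1.5L) should give 12 regardless of how the threads interleave, because each update happens entirely inside the lock.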
Alternatively, you can have each thread record its own partial result in its own thread argument structure:
targ->result = res;
Then, once the worker threads have all been pthread_join()ed, the parent thread that created them can just go through all the thread argument structures and add up the partial results.
No extra locking is needed here because the worker threads don't access each other's result variable, and the pthread_join() provides the necessary synchronisation between the worker setting the result and the parent thread reading it.
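A sketch of that second approach might look like the following. The worker_arg layout, NWORKERS and integrate_split are assumptions for the example, and the worker integrates f(x) = x in closed form as a placeholder for integralproc().

```c
#include <pthread.h>

#define NWORKERS 4

typedef struct {
    pthread_t handle;
    int started;               /* nonzero if a thread was created for this slot */
    long double left, right;   /* sub-interval assigned to this worker */
    long double result;        /* written only by the owner of this struct */
} worker_arg;

/* Placeholder for the real integration routine:
   the integral of x over [left, right] is (right^2 - left^2) / 2. */
static void *worker(void *p)
{
    worker_arg *w = p;
    w->result = (w->right * w->right - w->left * w->left) / 2.0L;
    return NULL;
}

/* Splits [a, b] into NWORKERS pieces, integrates each in its own thread,
   and sums the partial results after joining. */
long double integrate_split(long double a, long double b)
{
    worker_arg w[NWORKERS];
    long double width = (b - a) / NWORKERS;
    long double sum = 0.0L;

    for (int i = 0; i < NWORKERS; i++) {
        w[i].left = a + i * width;
        w[i].right = a + (i + 1) * width;
        w[i].started =
            pthread_create(&w[i].handle, NULL, worker, &w[i]) == 0;
        if (!w[i].started)
            worker(&w[i]);     /* fall back to computing this piece inline */
    }
    for (int i = 0; i < NWORKERS; i++) {
        if (w[i].started)
            pthread_join(w[i].handle, NULL);  /* orders the worker's write before our read */
        sum += w[i].result;
    }
    return sum;
}
```

No mutex appears anywhere: each result field has exactly one writer, and the parent reads it only after the corresponding join.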
Related
Consider the following section of a C function:
for (int i = 0; i < n; ++i) {
thread_arg *arg = (thread_arg *) malloc(sizeof(thread_arg));
arg->random_value = random_value;
arg->message = &(message[i * 10]);
if (pthread_create(NULL, NULL, thread_start, (void *) &arg)) {
perror("pthread_create");
exit(EXIT_FAILURE);
}
}
In this for loop, I create n threads which all perform a common routine with different parameters. This for loop is part of a bigger function which returns a data structure which gets modified by all threads in parallel. Thus, it is important that this bigger function won't return before all threads are done.
I was hoping to find a simpler way than giving an individual ID to all these threads and joining afterwards with pthread_join. Is there any general approach to say to a function something like "hey, don't return until all threads you've created returned"?
There are at least two other ways:
Use pthread barriers. The name barrier is used in a completely different sense than you usually hear it when talking about concurrency. Here, it's a synchronization primitive that lets each of a set of threads (waiters on it) block until all of them have reached it, then unblocks them all together. You'd first initialize the barrier in some shared location with n+1 as the count, then have both the function itself and all the n threads it created call pthread_barrier_wait before finishing. Assuming you do it this way, after returning from the wait, the n threads can no longer access the shared state; they need to immediately return.
Create the same thing (or a simplified version of it) with a condvar and mutex. Have a count, protected by a mutex, of how many of the n threads are still working. The function that created them can then do:
pthread_mutex_lock(&cnt_mtx);
while (count > 0) pthread_cond_wait(&cnt_cv, &cnt_mtx);
pthread_mutex_unlock(&cnt_mtx);
Generally, though, I'd use pthread_join here. That's what it's for.
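For completeness, here is a rough sketch of the condvar-and-mutex variant with both sides filled in. The worker body, the finished counter and the run_and_wait name are assumptions added for illustration; threads are created detached precisely because nothing will join them.

```c
#include <pthread.h>

static pthread_mutex_t cnt_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cnt_cv = PTHREAD_COND_INITIALIZER;
static int count;      /* workers still running; protected by cnt_mtx */
static int finished;   /* illustrative: how many workers completed */

static void *worker(void *arg)
{
    (void)arg;
    /* ... the real work would happen here ... */

    pthread_mutex_lock(&cnt_mtx);
    finished++;
    if (--count == 0)
        pthread_cond_signal(&cnt_cv);   /* last one out wakes the waiter */
    pthread_mutex_unlock(&cnt_mtx);
    return NULL;
}

/* Starts n detached workers and blocks until all of them have finished.
   Returns the number of workers that ran. */
int run_and_wait(int n)
{
    pthread_attr_t attr;
    int done;

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    pthread_mutex_lock(&cnt_mtx);
    count = n;
    finished = 0;
    pthread_mutex_unlock(&cnt_mtx);

    for (int i = 0; i < n; i++) {
        pthread_t tid;
        if (pthread_create(&tid, &attr, worker, NULL) != 0) {
            /* This worker never started, so it will never decrement. */
            pthread_mutex_lock(&cnt_mtx);
            count--;
            pthread_mutex_unlock(&cnt_mtx);
        }
    }
    pthread_attr_destroy(&attr);

    pthread_mutex_lock(&cnt_mtx);
    while (count > 0)
        pthread_cond_wait(&cnt_cv, &cnt_mtx);
    done = finished;
    pthread_mutex_unlock(&cnt_mtx);
    return done;
}
```

The count is set before any thread is created, so a fast worker cannot drive it below zero, and the waiter re-checks the predicate in a loop, which covers spurious wakeups.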
I'm creating a multi-threaded program in C and I'm having some trouble.
Here is the function which creates the threads:
void create_thread(t_game_data *game_data)
{
size_t i;
t_args *args = malloc(sizeof(t_args));
i = 0;
args->game = game_data;
while (i < 10)
{
args->initialized = 0;
args->id = i;
printf("%zu CREATION\n", i);//TODO: Debug
pthread_create(&game_data->object[i]->thread_id, NULL, &do_action, args);
i++;
while (args->initialized == 0)
continue;
}
}
Here you have my args struct :
typedef struct s_args
{
t_game_data *game;
size_t id;
int initialized;
} t_args;
And finally, the function which handle the created threads
void *do_action(void *v_args)
{
t_args *args;
t_game_data *game;
size_t id;
args = v_args;
game = args->game;
id = args->id;
args->initialized = 1;
[...]
return (NULL);
}
The problem is :
The main thread will create a new thread faster than the previous new thread can initialize its variables:
args = v_args;
game = args->game;
id = args->id;
So, sometime, 2 different threads will get same id from args->id.
To solve that, I use an variable initialized as a bool so make "sleep" the main thread during the new thread's initialization.
But I think that is really sinful.
Maybe there is a way to do that with a mutex? But I heard it wasn't "legal" to unlock a mutex which does not belong his thread.
Thanks for your answers!
The easiest solution to this problem would be to pass a different t_args object to each new thread. To do that, move the allocation inside the loop, and make each thread responsible for freeing its own argument struct:
void create_thread(t_game_data *game_data) {
for (size_t i = 0; i < 10; i++) {
t_args *args = malloc(sizeof(t_args));
if (!args) {
/* ... handle allocation error ... */
} else {
args->game = game_data;
args->id = i;
printf("%zu CREATION\n", i);//TODO: Debug
if (pthread_create(&game_data->object[i]->thread_id, NULL,
&do_action, args) != 0) {
// thread creation failed
free(args);
// ...
}
}
}
}
// ...
void *do_action(void *v_args) {
t_args *args = v_args;
t_game_data *game = args->game;
size_t id = args->id;
free(v_args);
args = v_args = NULL;
// ...
return (NULL);
}
But you also write:
To solve that, I use an variable initialized as a bool so make "sleep"
the main thread during the new thread's initialization.
But I think that is really sinful. Maybe there is a way to do that
with a mutex? But I heard it wasn't "legal" to unlock a mutex which
does not belong his thread.
If you nevertheless wanted one thread to wait for another thread to modify some data, as your original strategy requires, then you must employ either atomic data or some kind of synchronization object. Your code otherwise contains a data race, and therefore has undefined behavior. In practice, you cannot assume in your original code that the main thread will ever see the new thread's write to args->initialized. "Sinful" is an unusual way to describe that, but maybe appropriate if you belong to the Church of the Holy C.
You could solve that problem with a mutex by protecting just the test of args->initialized in your loop -- not the whole loop -- with a mutex, and protecting the threads' write to that object with the same mutex, but that's nasty and ugly. It would be far better to wait for the new thread to increment a semaphore (not a busy wait, and the initialized variable is replaced by the semaphore), or to set up and wait on a condition variable (again not a busy wait, but the initialized variable or an equivalent is still needed).
The problem is that in create_thread you are passing the same t_args structure to each thread. In reality, you probably want to create your own t_args structure for each thread.
What's happening is your 1st thread is starting up with the args passed to it. Before that thread can run do_action the loop is modifying the args structure. Since thread2 and thread1 will both be pointing to the same args structure, when they run do_action they will have the same id.
Oh, and don't forget to not leak your memory
Your solution should work in theory except for a couple of major problems.
The main thread will sit spinning in the while loop that checks the flag using CPU cycles (this is the least bad problem and can be OK if you know it won't have to wait long)
Compiler optimisers can get trigger happy with respect to empty loops. They are also often unaware that a variable may get modified by other threads and can make bad decisions on that basis.
On multi-core systems, the main thread may never see the change to args->initialized, or at least not until much later, if the change is in the cache of another core that hasn't been flushed back to main memory yet.
You can use John Bollinger's solution that mallocs a new set of args for each thread and it is fine. The only down side is a malloc/free pair for each thread creation. The alternative is to use "proper" synchronisation functions like Santosh suggests. I would probably consider this except I would use a semaphore as being a bit simpler than a condition variable.
A semaphore is an atomic counter with two operations: wait and signal. The wait operation decrements the semaphore if its value is greater than zero, otherwise it puts the thread into a wait state. The signal operation increments the semaphore, unless there are threads waiting on it. If there are, it wakes one of the threads up.
The solution is therefore to create a semaphore with an initial value of 0, start the thread and wait on the semaphore. The thread then signals the semaphore when it is finished with the initialisation.
#include <semaphore.h>
// other stuff
sem_t semaphore;
void create_thread(t_game_data *game_data)
{
size_t i;
t_args args;
i = 0;
if (sem_init(&semaphore, 0, 0) == -1) // third arg is initial value
{
// error
}
args.game = game_data;
while (i < 10)
{
args.id = i;
printf("%zu CREATION\n", i);//TODO: Debug
pthread_create(&game_data->object[i]->thread_id, NULL, &do_action, &args); // pass the address of the local struct
sem_wait(&semaphore);
i++;
}
sem_destroy(&semaphore);
}
void *do_action(void *v_args) {
t_args *args = v_args;
t_game_data *game = args->game;
size_t id = args->id;
sem_post(&semaphore);
// Rest of the thread work
return NULL;
}
Because of the synchronisation, I can reuse the args struct safely, in fact, I don't even need to malloc it - it's small so I declare it local to the function.
Having said all that, I still think John Bollinger's solution is better for this use-case but it's useful to be aware of semaphores generally.
You should consider using a condition variable for this. You can find an example here: http://maxim.int.ru/bookshelf/PthreadsProgram/htm/r_28.html.
Basically wait in the main thread and signal in your other threads.
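A minimal sketch of that handshake, under the assumption that a single argument struct is reused for every thread: the start_args type, its copied flag, the seen array and start_threads are all invented for this example and are not part of the asker's program.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    size_t id;             /* rewritten by the creator for each new thread */
    size_t *seen;          /* hypothetical output: seen[id] = id + 1 */
    int copied;            /* predicate: has the new thread taken its copy? */
    pthread_mutex_t mtx;   /* protects id and copied */
    pthread_cond_t cv;
} start_args;

static void *do_action(void *v)
{
    start_args *a = v;

    pthread_mutex_lock(&a->mtx);
    size_t id = a->id;             /* take private copies while locked */
    size_t *seen = a->seen;
    a->copied = 1;
    pthread_cond_signal(&a->cv);   /* tell the creator it may continue */
    pthread_mutex_unlock(&a->mtx);

    seen[id] = id + 1;             /* rest of the thread uses only its copies */
    return NULL;
}

/* Starts up to 16 threads, reusing one start_args; returns how many started. */
int start_threads(size_t n, size_t *seen)
{
    start_args a = { .seen = seen, .copied = 0 };
    pthread_t tid[16];
    size_t started = 0;

    pthread_mutex_init(&a.mtx, NULL);
    pthread_cond_init(&a.cv, NULL);

    for (size_t i = 0; i < n && i < 16; i++) {
        pthread_mutex_lock(&a.mtx);
        a.id = i;
        a.copied = 0;
        if (pthread_create(&tid[started], NULL, do_action, &a) == 0) {
            started++;
            while (!a.copied)      /* sleeps instead of spinning */
                pthread_cond_wait(&a.cv, &a.mtx);
        }
        pthread_mutex_unlock(&a.mtx);
    }
    for (size_t i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    pthread_cond_destroy(&a.cv);
    pthread_mutex_destroy(&a.mtx);
    return (int)started;
}
```

The flag is still needed alongside the condition variable: the wait is always wrapped in a loop on copied, so neither a spurious wakeup nor a signal sent before the wait begins can cause trouble.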
I'd like to create multi-threads program in C (Linux) with:
Infinite loop with infinite number of tasks
One thread per one task
Limit the total number of threads, so if, for instance, the total number of threads is more than MAX_THREADS_NUMBER, do sleep() until the total number of threads becomes less than MAX_THREADS_NUMBER, then continue.
Summary: I need to do an infinite number of tasks (one task per thread) and I'd like to know how to implement it using pthreads in C.
Here is my code:
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_THREADS 50
pthread_t thread[MAX_THREADS];
int counter;
pthread_mutex_t lock;
void* doSomeThing(void *arg)
{
pthread_mutex_lock(&lock);
counter += 1;
printf("Job %d started\n", counter);
pthread_mutex_unlock(&lock);
return NULL;
}
int main(void)
{
int i = 0;
int ret;
if (pthread_mutex_init(&lock, NULL) != 0)
{
printf("\n mutex init failed\n");
return 1;
}
for (i = 0; i < MAX_THREADS; i++) {
ret = pthread_create(&(thread[i]), NULL, &doSomeThing, NULL);
if (ret != 0)
printf("\ncan't create thread :[%s]", strerror(ret));
}
// Wait all threads to finish
for (i = 0; i < MAX_THREADS; i++) {
pthread_join(thread[i], NULL);
}
pthread_mutex_destroy(&lock);
return 0;
}
How to make this loop infinite?
for (i = 0; i < MAX_THREADS; i++) {
ret = pthread_create(&(thread[i]), NULL, &doSomeThing, NULL);
if (ret != 0)
printf("\ncan't create thread :[%s]", strerror(ret));
}
I need something like this:
while (1) {
if (thread_number > MAX_THREADS_NUMBER)
sleep(1);
ret = pthread_create(...);
if (ret != 0)
printf("\ncan't create thread :[%s]", strerror(ret));
}
Your current program is based on a simple dispatch design: the initial thread creates worker threads, assigning each one a task to perform. Your question is, how you make this work for any number of tasks, any number of worker threads. The answer is, you don't: your chosen design makes such a modification basically impossible.
Even if I were to answer your stated questions, it would not make the program behave the way you'd like. It might work after a fashion, but it'd be like a bicycle with square wheels: not very practical, nor robust -- not even fun after you stop laughing at how silly it looks.
The solution, as I wrote in a comment to the original question, is to change the underlying design: from a simple dispatch to a thread pool approach.
Implementing a thread pool requires two things: First, is to change your viewpoint from starting a thread and having it perform a task, to each thread in the "pool" grabbing a task to perform, and returning to the "pool" after they have performed it. Understanding this is the hard part. The second part, implementing a way for each thread to grab a new task, is simple: this typically centers around a data structure, protected with locks of some sort. The exact data structure does depend on what the actual work to do is, however.
Let's assume you wanted to parallelize the calculation of the Mandelbrot set (or rather, the escape time, or the number of iterations needed before a point can be ruled to be outside the set; the Wikipedia page contains pseudocode for exactly this). This is one of the "embarrassingly parallel" problems; those where the sub-problems (here, each point) can be solved without any dependencies.
Here's how I'd do the core of the thread pool in this case. First, the escape time or iteration count needs to be recorded for each point. Let's say we use an unsigned int for this. We also need the number of points (it is a 2D array), a way to calculate the complex number that corresponds to each point, plus some way to know which points have either been computed, or are being computed. Plus mutually exclusive locking, so that only one thread will modify the data structure at once. So:
typedef struct {
int x_size, y_size;
size_t stride;
double r_0, i_0;
double r_dx, i_dx;
double r_dy, i_dy;
unsigned int *iterations;
sem_t done;
pthread_mutex_t mutex;
int x, y;
} fractal_work;
When an instance of fractal_work is constructed, x_size and y_size are the number of columns and rows in the iterations map. The number of iterations (or escape time) for point x,y is stored in iterations[x+y*stride]. The real part of the complex coordinate for that point is r_0 + x*r_dx + y*r_dy, and imaginary part is i_0 + x*i_dx + y*i_dy (which allows you to scale and rotate the fractal freely).
When a thread grabs the next available point, it first locks the mutex, and copies the x and y values (for itself to work on). Then, it increases x. If x >= x_size, it resets x to zero, and increases y. Finally, it unlocks the mutex, and calculates the escape time for that point.
However, if x == 0 && y >= y_size, the thread posts on the done semaphore and exits, letting the initial thread know that the fractal is complete. (The initial thread just needs to call sem_wait() once for each thread it created.)
The thread worker function is then something like the following:
void *fractal_worker(void *data)
{
fractal_work *const work = (fractal_work *)data;
int x, y;
while (1) {
pthread_mutex_lock(&(work->mutex));
/* No more work to do? */
if (work->x == 0 && work->y >= work->y_size) {
sem_post(&(work->done));
pthread_mutex_unlock(&(work->mutex));
return NULL;
}
/* Grab this task (point), advance to next. */
x = work->x;
y = work->y;
if (++(work->x) >= work->x_size) {
work->x = 0;
++(work->y);
}
pthread_mutex_unlock(&(work->mutex));
/* z.r = work->r_0 + (double)x * work->r_dx + (double)y * work->r_dy;
z.i = work->i_0 + (double)x * work->i_dx + (double)y * work->i_dy;
TODO: implement the fractal iteration,
and count the iterations (say, n)
save the escape time (number of iterations)
in the work->iterations array; e.g.
work->iterations[(size_t)x + work->stride*(size_t)y] = n;
*/
}
}
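The TODO in the worker above could be filled in with something like the following sketch of the usual escape-time iteration z = z*z + c, starting from z = 0. The function name escape_time, the max_iter cap and the bailout radius of 2 are assumptions of this example; the answer itself leaves those details open.

```c
/* Counts how many iterations of z = z*z + c (starting from z = 0) it takes
   for |z| to exceed 2, giving up after max_iter iterations.
   A return value equal to max_iter means the point never escaped. */
unsigned int escape_time(double cr, double ci, unsigned int max_iter)
{
    double zr = 0.0, zi = 0.0;
    unsigned int n = 0;

    /* |z| <= 2 is tested as |z|^2 <= 4 to avoid a square root. */
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double next_zr = zr * zr - zi * zi + cr;   /* real part of z*z + c */
        zi = 2.0 * zr * zi + ci;                   /* imaginary part */
        zr = next_zr;
        n++;
    }
    return n;
}
```

Inside fractal_worker(), after the mutex is released, the worker would compute the complex coordinate from x and y as described in the comment and store escape_time(...) into work->iterations[(size_t)x + work->stride*(size_t)y]. No locking is needed for that store, since each point is handed to exactly one thread.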
The program first creates the fractal_work data structure for the worker threads to work on, initializes it, then creates some number of threads giving each thread the address of that fractal_work structure. It can then call fractal_worker() itself too, to "join the thread pool". (This pool automatically "drains", i.e. threads will return/exit, when all points in the fractal are done.)
Finally, the main thread calls sem_wait() on the done semaphore, as many times as it created worker threads, to ensure all the work is done.
The exact fields in the fractal_work structure above do not matter. However, it is at the very core of the thread pool. Typically, there is at least one mutex or rwlock protecting the work details, so that each worker thread gets unique work details, as well as some kind of flag or condition variable or semaphore to let the original thread know that the task is now complete.
In a multithreaded server, there is usually only one instance of the structure (or variables) describing the work queue. It may even contain things like minimum and maximum number of threads, allowing the worker threads to control their own number to dynamically respond to the amount of work available. This sounds magical, but is actually simple to implement: when a thread has completed its work, or is woken up in the pool with no work, and is holding the mutex, it first examines how many queued jobs there are, and what the current number of worker threads is. If there are more than the minimum number of threads, and no work to do, the thread reduces the number of threads, and exits. If there are less than the maximum number of threads, and there is a lot of work to do, the thread first creates a new thread, then grabs the next task to work on. (Yes, any thread can create new threads into the process. They are all on equal footing, too.)
A lot of the code in a practical multithreaded application using one or more thread pools to do work is some sort of bookkeeping. Thread pool approaches concentrate very much on the data, and the computation that needs to be performed on the data. I'm sure there are much better examples of thread pools out there somewhere; the hard part is to think of a good task for the application to perform, as the data structures are so task-dependent, and many computations are so simple that parallelizing them makes no sense (since creating new threads does have a small computational cost, it'd be silly to waste time creating threads when a single thread does the same work in the same or less time).
Many tasks that benefit from parallelization, on the other hand, require information to be shared between workers, and that requires a lot of thinking to implement correctly. (For example, although solutions exist for parallelizing molecular dynamics simulations efficiently, most simulators still calculate and exchange data in separate steps, rather than at the same time. It's just that hard to do right, you see.)
All this means that you cannot expect to be able to write the code unless you understand the concept. Indeed, truly understanding the concepts is the hard part: writing the code is comparatively easy.
Even in the above example, there are certain tripping points: Does the order of posting the semaphore and releasing the mutex matter? (Well, it depends on what the thread that is waiting for the fractal to complete does -- and indeed, if it is waiting yet.) If it was a condition variable instead of a semaphore, it would be essential that the thread that is interested in the fractal completion is waiting on the condition variable, otherwise it would miss the signal/broadcast. (This is also why I used a semaphore.)
I have a question about synchronizing 4 processes in a UNIX environment. It is very important that no process runs their main functionality without first waiting for the others to "be on the same page", so to speak.
Specifically, they should all not go into their loops without first synchronizing with each other. How do I synchronize 4 processes in a 4 way situation, so that none of them get into their first while loop without first waiting for the others? Note that this is mainly a logic problem, not a coding problem.
To keep things consistent between environments let's just say we have a pseudocode semaphore library with the operations semaphore_create(int systemID), semaphore_open(int semaID), semaphore_wait(int semaID), and semaphore_signal(int semaID).
Here is my attempt and subsequent thoughts:
Process1.c:
int main() {
//Synchronization area (relevant stuff):
int sem1 = semaphore_create(123456); //123456 is an arbitrary ID for the semaphore.
int sem2 = semaphore_create(78901); //78901 is an arbitrary ID for the semaphore.
semaphore_signal(sem1);
semaphore_wait(sem2);
while(true) {
//...do main functionality of process, etc (not really relevant)...
}
}
Process2.c:
int main() {
//Synchronization area (relevant stuff):
int sem1 = semaphore_open(123456);
int sem2 = semaphore_open(78901);
semaphore_signal(sem1);
semaphore_wait(sem2);
while(true) {
//...do main functionality of process etc...
}
}
Process3.c:
int main() {
//Synchronization area (relevant stuff):
int sem1 = semaphore_open(123456);
int sem2 = semaphore_open(78901);
semaphore_signal(sem1);
semaphore_wait(sem2);
while(true) {
//...do main functionality of process etc...
}
}
Process4.c:
int main() {
//Synchronization area (relevant stuff):
int sem1 = semaphore_open(123456);
int sem2 = semaphore_open(78901);
semaphore_signal(sem2);
semaphore_signal(sem2);
semaphore_signal(sem2);
semaphore_wait(sem1);
semaphore_wait(sem1);
semaphore_wait(sem1);
while(true) {
//...do main functionality of process etc...
}
}
We run Process1 first, and it creates all of the semaphores used by the other processes in system memory (the other processes simply call semaphore_open to gain access to those semaphores). Then, all 4 processes have a signal operation, and then a wait. The signal operation causes process1, process2, and process3 to increment the value of sem1 by 1, so its resultant maximum value is 3 (depending on what order the operating system decides to run these processes in). Process1, 2, and 3 are then all waiting on sem2, and process4 is waiting on sem1 as well. Process 4 then signals sem2 3 times to bring its value back up to 0, and waits on sem1 3 times. Since sem1 was a maximum of 3 from the signalling in the other processes (depending on what order they ran in, again), it will bring its value back to 0 and continue running. Thus, all processes will be synchronized.
So yea, not super confident on my answer. I feel that it depends heavily on what order the processes ran in, which is the whole point of synchronization -- that it shouldn't matter what order they run in, they all synchronize correctly. Also, I am doing a lot of work in Process4. Maybe it would be better to solve this using more than 2 semaphores? Wouldn't this also allow for more flexibility within the loops in each process, if I want to do further synchronization?
My question: Please explain why the above logic will or will not work, and/or a solution on how to solve this problem of 4 way synchronization. I'd imagine this is a very common thing to have to think about depending on the industry (eg. banking and synching up bank accounts). I know it is not very difficult, but I have never worked with semaphores before, so I'm kind of confused on how they work.
The precise semantics of your model semaphore library are not clear enough to answer your question definitively. However, if the difference between semaphore_create() and semaphore_open() is that the latter requires the specified semaphore to already exist, whereas the former requires it to not exist, then yes, the whole thing will fall down if process1 does not manage to create the needed semaphores before any of the other processes attempt to open them. (Probably it falls down in different ways if other semantics hold.)
That sort of issue can be avoided in a threading scenario because with threads there is necessarily an initial single-threaded segment wherein the synchronization structures can be initialized. There is also shared memory by which the various threads can communicate with one another. The answer @Dark referred to depends on those characteristics.
The essential problem with a barrier for multiple independent processes -- or for threads that cannot communicate via shared memory and that are not initially synchronized -- is that you cannot know which process needs to erect the barrier. It follows that each one needs to be prepared to do so. That can work in your model library if semaphore_create() can indicate to the caller which result was achieved, one of
semaphore successfully created
semaphore already exists
(or error)
In that case, all participating processes (whose number you must know) can execute the same procedure, maybe something like this:
void process_barrier(int process_count) {
sem_t *sem1, *sem2, *sem3;
int result = semaphore_create(123456, &sem1);
int counter;
switch (result) {
case SEM_SUCCESS:
/* I am the controlling process */
/* Finish setting up the barrier */
semaphore_create(78901, &sem2);
semaphore_create(23432, &sem3);
/* let (n - 1) other processes enter the barrier... */
for (counter = 1; counter < process_count; counter += 1) {
semaphore_signal(sem1);
}
/* ... and wait for those (n - 1) processes to do so */
for (counter = 1; counter < process_count; counter += 1) {
semaphore_wait(sem2);
}
/* let all the (n - 1) waiting processes loose */
for (counter = 1; counter < process_count; counter += 1) {
semaphore_signal(sem3);
}
/* and I get to continue, too */
break;
case SEM_EXISTS_ERROR:
/* I am NOT the controlling process */
semaphore_open(123456, &sem1);
/* wait, if necessary, for the barrier to be initialized */
semaphore_wait(sem1);
semaphore_open(78901, &sem2);
semaphore_open(23432, &sem3);
/* signal the controlling process that I have reached the barrier */
semaphore_signal(sem2);
/* wait for the controlling process to allow me to continue */
semaphore_wait(sem3);
break;
}
}
Obviously, I have taken some minor liberties with your library interface, and I have omitted error checks except where they bear directly on the barrier's operation.
The three semaphores involved in that example serve distinct, well-defined purposes. sem1 guards the initialization of the synchronization constructs and allows the processes to choose which among them takes responsibility for controlling the barrier. sem2 serves to count how many processes have reached the barrier. sem3 blocks the non-controlling processes that have reached the barrier until the controlling process releases them all.
Pretty much as the title says. I have a snippet of code that looks like this:
pid_t p;
p = fork();
if (p == 0) {
childfn();
} else if (p > 0) {
parentfn();
} else {
// error
}
I want to ensure that either the parent or the child executes (but not returns from) their respective functions before the other.
Something like a call to sleep() would probably work, but is not guaranteed by any standard, and would just be exploiting an implementation detail of the OS's scheduler...is this possible? Would vfork work?
edit: Both of the functions find their way down to a system() call, one of which will not return until the other is started. So to re-iterate: I need to ensure that either the parent or the child only calls their respective functions (but not returns, cause they won't, which is what all of the mutex based solutions below offer) before the other. Any ideas? Sorry for the lack of clarity.
edit2: Having one process call sched_yield and sleep, I seem to be getting pretty reliable results. vfork does provide the semantics I am looking for, but comes with too many restrictions on what I can do in the child process (I can pretty much only call exec). So, I have found some work-arounds that are good enough, but no real solution. vfork is probably the closest thing to what I was looking for, but all the solutions presented below would work more or less.
This problem would normally be solved by a mutex or a semaphore. For example:
// Get a page of shared memory
int pagesize = getpagesize();
void *mem = mmap(NULL, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
if (mem == MAP_FAILED) // mmap reports failure with MAP_FAILED, not NULL
{
perror("mmap");
return 1;
}
// Put the semaphore at the start of the shared page. The rest of the page
// is unused.
sem_t *sem = mem;
sem_init(sem, 1, 1);
pid_t p = fork();
if (p == 0) {
sem_wait(sem);
childfn();
sem_post(sem);
} else if (p > 0) {
sem_wait(sem);
parentfn();
sem_post(sem);
int status;
wait(&status);
sem_destroy(sem);
} else {
// error
}
// Clean up
munmap(mem, pagesize);
You could also use a mutex in a shared memory region, but you need to make sure to create it with non-default attributes, with the process-shared attribute set to shared (via pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) on the attribute object passed to pthread_mutex_init()), in order for it to work.
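A sketch of setting up such a process-shared mutex, assuming the same anonymous shared mapping technique as the semaphore example; create_shared_mutex is a hypothetical helper name, and it must be called before fork() so that both processes inherit the mapping.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc with strict -std */
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Maps shared memory and initializes a process-shared mutex in it.
   Returns NULL on failure. */
pthread_mutex_t *create_shared_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m;
    int rc;

    m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(m, sizeof *m);
        return NULL;
    }
    /* The pshared setting lives on the attribute object, not the mutex. */
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);

    if (rc != 0) {
        munmap(m, sizeof *m);
        return NULL;
    }
    return m;
}
```

After fork(), parent and child can both call pthread_mutex_lock() and pthread_mutex_unlock() on the returned pointer. A default (process-private) mutex placed in the same memory would not be guaranteed to work across processes.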
This ensures that only one of childfn or parentfn will execute at any given time, but they could run in either order. If you need to have a particular one run first, start the semaphore off with a count of 0 instead of 1, and have the function that needs to run first not wait for the semaphore (but still post to it when finished). You might also be able to use a condition variable, which has different semantics.
A mutex should be able to solve this problem. Lock the mutex before the call to fork and have the 1st function execute as normal, while the second tries to claim the mutex. The 1st should unlock the mutex when it is done and the second will wait until it is free.
EDIT: Mutex must be in a shared memory segment for the two processes
Safest way is to use a (named) pipe or socket. One side writes to it, the other reads. The reader cannot read what has not been written yet.
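A rough sketch of the pipe idea, with the child as the side that must start first: run_child_first is a made-up name, the comments mark where childfn() and parentfn() from the question would be called, and retrying read() on EINTR is left out for brevity.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child, and makes the parent block until the child announces
   that it is about to start. Returns 0 on success, -1 on error. */
int run_child_first(void)
{
    int fd[2];
    char c = 0;
    ssize_t n;
    int status;
    pid_t p;

    if (pipe(fd) == -1)
        return -1;

    p = fork();
    if (p == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (p == 0) {
        /* Child: announce, then start the real work. */
        char go = 'x';
        close(fd[0]);
        if (write(fd[1], &go, 1) != 1)
            _exit(1);
        close(fd[1]);
        /* childfn() would be called here */
        _exit(0);
    }

    /* Parent: read() cannot return data that has not been written yet. */
    close(fd[1]);
    n = read(fd[0], &c, 1);
    close(fd[0]);
    /* parentfn() would be called here */

    if (waitpid(p, &status, 0) == -1)
        return -1;
    return (n == 1 && c == 'x') ? 0 : -1;
}
```

Because the parent closes its own write end, read() returns 0 instead of hanging if the child dies before writing, which the function reports as an error.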
Use a semaphore to ensure that one starts before the other.
You could use an atomic variable. Set it to zero before you fork/thread/exec, have the first process set it to one just before (or better, after) it enters the function, and have the second wait while(flag == 0).