This isn't a technical question, but a conceptual one. My program needs to handle several tasks in the background. In my case, I consider threads more appropriate than processes for several reasons:
Background tasks aren't heavy, but they have to be processed regularly.
All threads need to manipulate a shared resource. Complete processes would require setting up a shared memory segment, which isn't appropriate in my case (the resource doesn't have a fixed size). Of course, this resource is protected by a mutex.
Another thing I take into consideration is that the main() function needs to be able to end all background tasks when it wants to (which means joining threads).
Now, here are two implementations :
1 thread, looping inside.
void *my_thread_func(void* shared_ressource)
{
    while (1) {
        do_the_job();
        sleep(5);
    }
}

// main()
pthread_create(&my_thread, NULL, my_thread_func, (void*)&shared_ressource);
pthread_kill(my_thread, 15);
// pthread_cancel(my_thread);
pthread_join(my_thread, NULL);
Note: In this case, main() needs to signal (or cancel) the thread before joining, otherwise it'll hang. This can be dangerous if the thread doesn't get time to sem_post before it gets terminated.
n threads, looping outside.
void *my_thread_func(void* shared_ressource)
{
    do_the_job();
    return NULL;
}

// main()
while (1) {
    pthread_create(&my_thread, NULL, my_thread_func, (void*)&shared_ressource);
    pthread_join(my_thread, NULL);
    sleep(5);
}
Note: In this case, main() wouldn't naturally hang on pthread_join; it would just have to break out of its own continuous loop (using a "boolean", for instance).
Now, I would like some help comparing these two. Threads are lightweight structures, but is repeatedly spawning them too heavy a cost for the second implementation? Or is the infinite loop holding the thread when it shouldn't? At the moment, I prefer the second implementation because it protects the semaphore: threads do not terminate before they sem_post it. My concern here is optimisation, not functionality.
Having your background threads continuously spawning and dying tends to be inefficient. It is usually much better to have some number of threads stay alive, servicing the background work as it becomes available.
However, it's often better to avoid thread cancellation, too. Instead, I advise using a condition variable and exit flag:
void *my_thread_func(void *shared_resource)
{
    struct timespec timeout;

    pthread_mutex_lock(&exit_mutex);
    do {
        pthread_mutex_unlock(&exit_mutex);

        do_the_job();

        clock_gettime(CLOCK_REALTIME, &timeout);
        timeout.tv_sec += 5;

        pthread_mutex_lock(&exit_mutex);
        if (!exit_flag)
            pthread_cond_timedwait(&exit_cond, &exit_mutex, &timeout);
    } while (!exit_flag);
    pthread_mutex_unlock(&exit_mutex);
    return NULL;
}
When the main thread wants the background thread to exit, it sets the exit flag and signals the condition variable:
pthread_mutex_lock(&exit_mutex);
exit_flag = 1;
pthread_cond_signal(&exit_cond);
pthread_mutex_unlock(&exit_mutex);
pthread_join(my_thread, NULL);
(You should actually strongly consider using CLOCK_MONOTONIC instead of the default CLOCK_REALTIME, because the former isn't affected by changes to the system clock. This requires using pthread_condattr_setclock() and pthread_cond_init() to set the clock used by the condition variable.)
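A minimal sketch of that setup, assuming the same exit_cond / exit_mutex globals used in the loop above:

#include <pthread.h>
#include <time.h>

extern pthread_cond_t exit_cond;   /* the same global the worker loop waits on */

/* call once, before the worker thread is created */
void init_exit_cond(void)
{
    pthread_condattr_t attr;

    pthread_condattr_init(&attr);
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);  /* immune to wall-clock changes */
    pthread_cond_init(&exit_cond, &attr);
    pthread_condattr_destroy(&attr);
}

The worker's timed wait then has to compute its deadline with clock_gettime(CLOCK_MONOTONIC, &timeout) instead of CLOCK_REALTIME.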
I have a program in which multiple threads are in a loop where they acquire a binary semaphore and then increase a global counter. However, by printing out the thread IDs, I notice that only one thread ever acquires the semaphore. Here's my MRE:
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <semaphore.h>
#define NUM_THREADS 10
#define MAX_COUNTER 100
struct threadCtx {
    sem_t sem;
    unsigned int counter;
};

static void *
threadFunc(void *args)
{
    struct threadCtx *ctx = args;
    pthread_t self;
    bool done = false;

    self = pthread_self();
    while (!done) {
        sem_wait(&ctx->sem);
        if ( ctx->counter == MAX_COUNTER ) {
            done = true;
        }
        else {
            sleep(1);
            printf("Thread %u increasing the counter to %u\n", (unsigned int)self, ++ctx->counter);
        }
        sem_post(&ctx->sem);
    }
    return NULL;
}
int main() {
    pthread_t threads[NUM_THREADS];
    struct threadCtx ctx = {.counter = 0};

    sem_init(&ctx.sem, 0, 1);
    for (int k=0; k<NUM_THREADS; k++) {
        pthread_create(threads+k, NULL, threadFunc, &ctx);
    }
    for (int k=0; k<NUM_THREADS; k++) {
        pthread_join(threads[k], NULL);
    }
    sem_destroy(&ctx.sem);
    return 0;
}
The output is
Thread 1004766976 increasing the counter to 1
Thread 1004766976 increasing the counter to 2
Thread 1004766976 increasing the counter to 3
...
If I remove the call to sleep, the behavior is closer to what I would expect (i.e., the threads being woken up in a seemingly indeterminate manner). Why would this be?
David Schwartz's answer explains what is happening at a low level. That is to say, he's looking at it from the perspective of an OS developer or a hardware designer. Nothing wrong with that, but let's look at your program from the perspective of a Software Architect:
You've got multiple threads all executing the same loop. The loop locks the mutex,* does some "work," and then releases the mutex. OK, but what does it do next? Practically the very next thing the loop does after releasing the mutex is lock it again. Your loop spends practically 100% of its time doing "work" with the mutex locked.
So, what's the point of running that same loop in multiple threads when there's never any opportunity for two or more threads to work at the same time?
If you want to use threads to do a parallel computation, you need to find/invent safe ways for the threads to do most of their work with the mutex unlocked. They should lock the mutex only for long enough to post a result or to take another assignment.
Sometimes that means writing code that is less efficient than single threaded code would be. But suppose that program (A) has a single thread that makes almost 100% use of a CPU, while program (B) uses eight CPUs but only uses them with 50% efficiency. Which program is going to win?
* I know, your example uses a sem_t (semaphore) object. But "semaphore" is what you are using. "Mutex" is the role in which you are using it.
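For instance, here is a rough sketch (reusing the struct threadCtx and MAX_COUNTER from your program and the same headers, but not meant as a drop-in answer) in which each thread takes a value while holding the semaphore and does the slow part after releasing it:

static void *threadFunc(void *args)
{
    struct threadCtx *ctx = args;

    for (;;) {
        unsigned int my_value;

        sem_wait(&ctx->sem);              /* hold the "mutex" only to take an assignment */
        if (ctx->counter == MAX_COUNTER) {
            sem_post(&ctx->sem);
            break;
        }
        my_value = ++ctx->counter;
        sem_post(&ctx->sem);              /* release before the slow part */

        sleep(1);                         /* the "work", now done in parallel */
        printf("Thread %lu took value %u\n", (unsigned long)pthread_self(), my_value);
    }
    return NULL;
}

With that shape, a sleeping thread no longer holds the semaphore, so the other threads are free to take the next values in the meantime.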
Why would this be?
Context switches are expensive, and your implementation is, wisely, minimizing them. Your threads are all fighting over the same resource; trying to schedule them in tight alternation would make performance much worse, probably for the entire system.
Since the thread that keeps getting the semaphore never uses up its timeslice, it will keep getting the resource. It is your responsibility to write code to do the work that you want done. It's the implementation's responsibility to execute your code as efficiently as it can, and that's what it's doing.
Most likely, what's going on under the hood is this:
The thread that keeps getting the semaphore can always make forward progress except when it is sleeping. But when it is sleeping, no other thread that needs the semaphore can make forward progress.
The thread that keeps getting the semaphore never exhausts its timeslice because it sleeps before that happens.
So there is no reason for the implementation to ever block this thread other than when it is sleeping, meaning that no other thread can get the semaphore. If you don't want this thread to keep sleeping with the semaphore and blocking other threads, then write different code.
My problem is that I cannot reuse a cancelled pthread. Sample code:
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

pthread_t alg;
pthread_t stop_alg;
int thread_available;

void *stopAlgorithm() {
    while (1) {
        sleep(6);
        if (thread_available == 1) {
            pthread_cancel(alg);
            printf("Now it's dead!\n");
            thread_available = 0;
        }
    }
}

void *algorithm() {
    while (1) {
        printf("I'm here\n");
    }
}

int main() {
    thread_available = 0;
    pthread_create(&stop_alg, NULL, stopAlgorithm, 0);
    while (1) {
        sleep(1);
        if (thread_available == 0) {
            sleep(2);
            printf("Starting algorithm\n");
            pthread_create(&alg, NULL, algorithm, 0);
            thread_available = 1;
        }
    }
}
This sample should create two threads: the first is created at program start and tries to cancel the second as soon as it starts; the second should be re-run as soon as it is cancelled and say "I'm here". But once the algorithm thread has been cancelled, it doesn't start again: the program prints "Starting algorithm" and does nothing, with no more "I'm here" messages. Could you please tell me how to start a cancelled (immediately stopped) thread again?
UPD: So, thanks to your help I understood what the problem is. When I re-run the algorithm thread it throws error 11: "The system lacked the necessary resources to create another thread, or the system-imposed limit on the total number of threads in a process PTHREAD_THREADS_MAX would be exceeded." Actually I have 5 threads, but only one is cancelled; the others stop via pthread_exit. After the algorithm stopped and the program went into standby mode, I checked the status of all threads with pthread_join: all of them return 0 (the cancelled one returns PTHREAD_CANCELED), which as far as I understand means all threads stopped successfully. But one more attempt to run the algorithm throws error 11 again. So I checked memory usage: in standby mode before the algorithm, 10428; during the algorithm, when all threads are used, 2026m; in standby mode after the algorithm stopped, 2019m. So even after the threads stop they still use memory, and pthread_detach didn't help with this. Are there any other ways to clean up after threads?
Also, sometimes on pthread_cancel my program crashes with "libgcc_s.so.1 must be installed for pthread_cancel to work"
Several points:
First, this is not safe:
int thread_available;

void *stopAlgorithm() {
    while (1) {
        sleep(6);
        if (thread_available == 1) {
            pthread_cancel(alg);
            printf("Now it's dead!\n");
            thread_available = 0;
        }
    }
}
It's not safe for at least two reasons. Firstly, you've not marked thread_available as volatile. This means that the compiler can optimise stopAlgorithm to read the variable once, and never reread it. Secondly, you haven't ensured access to it is atomic, or protected it by a mutex. Either declare it:
volatile sig_atomic_t thread_available;
(or similar), or better, protect it by a mutex.
But for the general case of triggering one thread from another, you are better off using a condition variable (and a mutex), using pthread_cond_wait or pthread_cond_timedwait in the listening thread, and pthread_cond_broadcast in the triggering thread.
Next, what's the point of the stopAlgorithm thread? All it does is cancel the algorithm thread after an unpredictable amount of time of up to 6 seconds. Why not just send the pthread_cancel from the main thread?
Next, do you care where your algorithm is when it is cancelled? If not, just pthread_cancel it. If so (and anyway, I think it's far nicer), regularly check a flag (either atomic and volatile as above, or protected by a mutex) and pthread_exit if it's set. If your algorithm does big chunks every second or so, then check it then. If it does lots of tiny things, check it (say) every 1,000 operations so taking the mutex doesn't introduce a performance penalty.
Lastly, if you cancel a thread (or if it pthread_exits), the way you start it again is simply to call pthread_create again. It's then a new thread running the same code.
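A rough sketch of that flag-plus-restart pattern (names like stop_requested and restart_algorithm are my own, not from your code):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t stop_mutex = PTHREAD_MUTEX_INITIALIZER;
static int stop_requested = 0;

static void *algorithm(void *arg)
{
    (void)arg;
    for (;;) {
        printf("I'm here\n");          /* one chunk of work */
        sleep(1);

        pthread_mutex_lock(&stop_mutex);
        int stop = stop_requested;
        pthread_mutex_unlock(&stop_mutex);
        if (stop)
            pthread_exit(NULL);        /* clean, cooperative exit */
    }
}

/* main thread: stop the current run, reap it, then start a fresh one */
void restart_algorithm(pthread_t *alg)
{
    pthread_mutex_lock(&stop_mutex);
    stop_requested = 1;
    pthread_mutex_unlock(&stop_mutex);
    pthread_join(*alg, NULL);          /* releases the old thread's resources */

    pthread_mutex_lock(&stop_mutex);
    stop_requested = 0;
    pthread_mutex_unlock(&stop_mutex);
    pthread_create(alg, NULL, algorithm, NULL);
}

Joining (or detaching) the old thread before creating the new one also addresses the resource leak described in your update.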
I have a question about synchronizing 4 processes in a UNIX environment. It is very important that no process runs their main functionality without first waiting for the others to "be on the same page", so to speak.
Specifically, they should all not go into their loops without first synchronizing with each other. How do I synchronize 4 processes in a 4 way situation, so that none of them get into their first while loop without first waiting for the others? Note that this is mainly a logic problem, not a coding problem.
To keep things consistent between environments let's just say we have a pseudocode semaphore library with the operations semaphore_create(int systemID), semaphore_open(int semaID), semaphore_wait(int semaID), and semaphore_signal(int semaID).
Here is my attempt and subsequent thoughts:
Process1.c:

int main() {
    //Synchronization area (relevant stuff):
    int sem1 = semaphore_create(123456); //123456 is an arbitrary ID for the semaphore.
    int sem2 = semaphore_create(78901);  //78901 is an arbitrary ID for the semaphore.

    semaphore_signal(sem1);
    semaphore_wait(sem2);

    while(true) {
        //...do main functionality of process, etc (not really relevant)...
    }
}

Process2.c:

int main() {
    //Synchronization area (relevant stuff):
    int sem1 = semaphore_open(123456);
    int sem2 = semaphore_open(78901);

    semaphore_signal(sem1);
    semaphore_wait(sem2);

    while(true) {
        //...do main functionality of process etc...
    }
}

Process3.c:

int main() {
    //Synchronization area (relevant stuff):
    int sem1 = semaphore_open(123456);
    int sem2 = semaphore_open(78901);

    semaphore_signal(sem1);
    semaphore_wait(sem2);

    while(true) {
        //...do main functionality of process etc...
    }
}

Process4.c:

int main() {
    //Synchronization area (relevant stuff):
    int sem1 = semaphore_open(123456);
    int sem2 = semaphore_open(78901);

    semaphore_signal(sem2);
    semaphore_signal(sem2);
    semaphore_signal(sem2);
    semaphore_wait(sem1);
    semaphore_wait(sem1);
    semaphore_wait(sem1);

    while(true) {
        //...do main functionality of process etc...
    }
}
We run Process1 first, and it creates all of the semaphores in system memory used by the other processes (the other processes simply call semaphore_open to gain access to those semaphores). Then, all 4 processes have a signal operation, and then a wait. The signal operation causes process1, process2, and process3 to increment the value of sem1 by 1, so its resultant maximum value is 3 (depending on what order the operating system decides to run these processes in). Process1, 2, and 3 are then all waiting on sem2, and process4 is waiting on sem1 as well. Process4 then signals sem2 3 times to bring its value back up to 0, and waits on sem1 3 times. Since sem1 reached a maximum of 3 from the signalling in the other processes (depending on what order they ran in, again), this will bring its value back down to 0, and process4 continues running. Thus, all processes will be synchronized.
So yea, not super confident on my answer. I feel that it depends heavily on what order the processes ran in, which is the whole point of synchronization -- that it shouldn't matter what order they run in, they all synchronize correctly. Also, I am doing a lot of work in Process4. Maybe it would be better to solve this using more than 2 semaphores? Wouldn't this also allow for more flexibility within the loops in each process, if I want to do further synchronization?
My question: Please explain why the above logic will or will not work, and/or give a solution to this problem of 4-way synchronization. I'd imagine this is a very common thing to have to think about depending on the industry (e.g. banking and syncing up bank accounts). I know it is not very difficult, but I have never worked with semaphores before, so I'm kind of confused about how they work.
The precise semantics of your model semaphore library are not clear enough to answer your question definitively. However, if the difference between semaphore_create() and semaphore_open() is that the latter requires the specified semaphore to already exist, whereas the former requires it to not exist, then yes, the whole thing will fall down if process1 does not manage to create the needed semaphores before any of the other processes attempt to open them. (Probably it falls down in different ways if other semantics hold.)
That sort of issue can be avoided in a threading scenario because with threads there is necessarily an initial single-threaded segment wherein the synchronization structures can be initialized. There is also shared memory by which the various threads can communicate with one another. The answer #Dark referred to depends on those characteristics.
The essential problem with a barrier for multiple independent processes -- or for threads that cannot communicate via shared memory and that are not initially synchronized -- is that you cannot know which process needs to erect the barrier. It follows that each one needs to be prepared to do so. That can work in your model library if semaphore_create() can indicate to the caller which result was achieved, one of
semaphore successfully created
semaphore already exists
(or error)
In that case, all participating processes (whose number you must know) can execute the same procedure, maybe something like this:
void process_barrier(int process_count) {
    sem_t *sem1, *sem2, *sem3;
    int result = semaphore_create(123456, &sem1);
    int counter;

    switch (result) {
    case SEM_SUCCESS:
        /* I am the controlling process */
        /* Finish setting up the barrier */
        semaphore_create(78901, &sem2);
        semaphore_create(23432, &sem3);
        /* let (n - 1) other processes enter the barrier... */
        for (counter = 1; counter < process_count; counter += 1) {
            semaphore_signal(sem1);
        }
        /* ... and wait for those (n - 1) processes to do so */
        for (counter = 1; counter < process_count; counter += 1) {
            semaphore_wait(sem2);
        }
        /* let all the (n - 1) waiting processes loose */
        for (counter = 1; counter < process_count; counter += 1) {
            semaphore_signal(sem3);
        }
        /* and I get to continue, too */
        break;
    case SEM_EXISTS_ERROR:
        /* I am NOT the controlling process */
        semaphore_open(123456, &sem1);
        /* wait, if necessary, for the barrier to be initialized */
        semaphore_wait(sem1);
        semaphore_open(78901, &sem2);
        semaphore_open(23432, &sem3);
        /* signal the controlling process that I have reached the barrier */
        semaphore_signal(sem2);
        /* wait for the controlling process to allow me to continue */
        semaphore_wait(sem3);
        break;
    }
}
Obviously, I have taken some minor liberties with your library interface, and I have omitted error checks except where they bear directly on the barrier's operation.
The three semaphores involved in that example serve distinct, well-defined purposes. sem1 guards the initialization of the synchronization constructs and allows the processes to choose which among them takes responsibility for controlling the barrier. sem2 serves to count how many processes have reached the barrier. sem3 blocks the non-controlling processes that have reached the barrier until the controlling process releases them all.
I'm getting started with C Windows programming, in particular threads, concurrency and synchronization.
To experiment, I'm writing a C program that accepts N parameters.
Each parameter indicates a path to a file system directory tree and the program has to compare the content of all directories to decide whether all directories have the same content or not.
The main runs a "reading" thread for each parameter while a single "comparison" thread compares the name of all the entries found. For each file/directory found, "reading" threads synchronize themselves by activating the "comparison" thread.
I wrote the program with Semaphore objects and now I'm trying with Event objects.
The idea is to use N Events auto-reset and a single Event manual-reset.
The N events are used by the N "reading" threads to signal the "comparison" thread, which is in WaitForMultipleObjects for an INFINITE time. When all the events are signalled, it compares the entries and then performs a SetEvent() on the manual-reset object.
The "reading" threads wait for this event to be set, then reset it and continue with the next entry.
Some code for the N reading threads:
void ReadingTraverseDirectory(LPTSTR StartPathName, DWORD i) {
    //variables and some work
    do {
        //take the next entry and put it in current_entry;
        gtParams[i].entry = current_entry;                //global var for comparison
        SetEvent(glphReadingEvent[i]);                    //signal the comparison thread
        WaitForSingleObject(ghComparisonEvent, INFINITE); //wait for the signal to restart working
        ResetEvent(ghComparisonEvent);                    //reset the event
        if (current_entry == TYPE_DIR) {
            ReadingTraverseDirectory(current_entry, i);   //recur to explore the next dir
        }
    } while (FindNextFile(SearchHandle, &FindData));      //while there are still entries
    //
    return;
}
Some code for the comparison thread:
DWORD WINAPI CompareThread(LPVOID arg) {
    while (entries are equal) {
        WaitForMultipleObjects(N, glphReadingEvent, TRUE, 1000);
        for (r = 0; r < nworkers - 1; r++) {
            if (_tcscmp(entries) != 0) {
                //entries are different. exit and close.
            }
        }
        SetEvent(ghComparisonEvent);
    }
}
The problem:
Sometimes it happens that one reading thread is able to work without respecting the synchronization with the other threads. If I put a printf() or Sleep(1) between the Wait and the Set in the comparison thread, the program works perfectly.
My opinion:
I think the manual-reset Event is not safe for this kind of (barrier) synchronization.
A reading thread may be too fast in ResetEvent(), and if the scheduler slows down other threads, it is possible that some of them stay blocked while the one which performed the Reset is able to continue its work. However, if this were the case, the comparison thread should block itself on WaitForMultipleObjects, causing a deadlock... actually there is no deadlock, but one thread is able to cycle more times than the others.
What I'm trying to understand is why a simple Sleep(1) can solve the issue. Is it a matter of scheduling, or a wrong implementation of the synchronization?
Thank you.
I'm writing a code in which I have two threads running in parallel.
The 1st is the main thread, which started the 2nd thread.
The 2nd thread is just a simple thread executing an empty while loop.
Now I want to pause / suspend the execution of the 2nd thread from the 1st thread, which created it.
And after some time I want to resume the execution of the 2nd thread (by issuing some command or function) from where it was paused / suspended.
This question is not about how to use mutexes, but how to suspend a thread.
In the Unix world there are thread functions called pthread_suspend and pthread_resume_np, but for some reason the people who make Linux, FreeBSD, NetBSD and so on have not implemented these functions.
So to understand it, the functions simply are not there. There are workarounds, but unfortunately it is just not the same as calling SuspendThread on Windows. You have to do all kinds of non-portable stuff to make a thread stop and start using signals.
Stopping and resuming threads is vital for debuggers and garbage collectors. For example, I have seen a version of Wine which is not able to properly implement the "SuspendThread" function. Thus any windows program using it will not work properly.
I thought that it was possible to do it properly using signals, based on the fact that the JVM uses this signal technique for its garbage collector, but I have also just seen some articles online where people are noticing deadlocks and so on with the JVM, sometimes unreproducible.
So to come around to answer the question, you cannot properly suspend and resume threads with Unix unless you have a nice Unix that implements pthread_suspend_np. Otherwise you are stuck with signals.
The big problem with Signals is when you have about five different libraries all linked in to the same program and all trying to use the same signals at the same time. For this reason I believe that you cannot actually use something like ValGrind and for example, the Boehm GC in one program. At least without major coding at the very lowest levels of userspace.
Another answer to this question could be: do what Linus Torvalds does to NVidia, flip the finger and get them to implement the two most critical parts missing from Linux. First, pthread_suspend, and second, a dirty bit on memory pages so that proper garbage collectors can be implemented. Start a large petition online and keep flipping that finger. Maybe by the time Windows 20 comes out, they will realise that suspending and resuming threads, and having dirty bits, is actually one of the fundamental reasons Windows and Mac are better than Linux, or than any Unix that does not implement pthread_suspend and a dirty bit on virtual pages, like VirtualAlloc does in Windows.
I do not live in hope. Actually, I spent a number of years planning my future around building stuff for Linux, but I have abandoned that hope, as reliability all seems to hinge on the availability of a dirty bit for virtual memory and on suspending threads cleanly.
As far as I know you can't really just pause some other thread using pthreads. You have to have something in your 2nd thread that checks for times it should be paused using something like a condition variable. This is the standard way to do this sort of thing.
I tried suspending and resuming a thread using signals; here is my solution. Please compile and link with -pthread.
Signal SIGUSR1 suspends the thread by calling pause() and SIGUSR2 resumes the thread.
From the man page of pause:
pause() causes the calling process (or thread) to sleep until a signal is delivered that either terminates the process or causes the invocation of a signal-catching function.
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <signal.h>
// Since I have only 2 threads, I am using two variables;
// an array of bools will be more useful for `n` number of threads.
// (volatile so the busy-wait in main() is not optimised away)
static volatile int is_th1_ready = 0;
static volatile int is_th2_ready = 0;
static void cb_sig(int signal)
{
    switch (signal) {
    case SIGUSR1:
        pause();
        break;
    case SIGUSR2:
        break;
    }
}
static void *thread_job(void *t_id)
{
    int i = 0;
    struct sigaction act;

    pthread_detach(pthread_self());

    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    act.sa_handler = cb_sig;

    if (sigaction(SIGUSR1, &act, NULL) == -1)
        printf("unable to handle siguser1\n");
    if (sigaction(SIGUSR2, &act, NULL) == -1)
        printf("unable to handle siguser2\n");

    if (t_id == (void *)1)
        is_th1_ready = 1;
    if (t_id == (void *)2)
        is_th2_ready = 1;

    while (1) {
        printf("thread id: %p, counter: %d\n", t_id, i++);
        sleep(1);
    }

    return NULL;
}
int main()
{
    int terminate = 0;
    int user_input;
    pthread_t thread1, thread2;

    pthread_create(&thread1, NULL, thread_job, (void *)1);
    // Spawned thread2 just to make sure it isn't suspended/paused
    // when thread1 received SIGUSR1/SIGUSR2 signal
    pthread_create(&thread2, NULL, thread_job, (void *)2);

    // busy-wait until *both* threads have installed their signal handlers
    while (!is_th1_ready || !is_th2_ready);

    while (!terminate) {
        // to test, I am sending signals depending on input from STDIN
        printf("0: pause thread1, 1: resume thread1, -1: exit\n");
        scanf("%d", &user_input);
        switch (user_input) {
        case -1:
            printf("terminating\n");
            terminate = 1;
            break;
        case 0:
            printf("raising SIGUSR1 to thread1\n");
            pthread_kill(thread1, SIGUSR1);
            break;
        case 1:
            printf("raising SIGUSR2 to thread1\n");
            pthread_kill(thread1, SIGUSR2);
            break;
        }
    }
    pthread_kill(thread1, SIGKILL);
    pthread_kill(thread2, SIGKILL);
    return 0;
}
There are no pthread_suspend() / pthread_resume() kinds of APIs in POSIX.
Mostly, condition variables can be used to control the execution of other threads.
The condition variable mechanism allows threads to suspend execution and relinquish the processor until some condition is true. A condition variable must always be associated with a mutex to avoid a race condition created by one thread preparing to wait and another thread which may signal the condition before the first thread actually waits on it, resulting in a deadlock.
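A minimal sketch of that idea, with names of my own choosing; the worker thread has to call check_pause() at points where it is safe to stop:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t pause_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pause_cond  = PTHREAD_COND_INITIALIZER;
static bool paused = false;

/* called by the worker thread between units of work */
void check_pause(void)
{
    pthread_mutex_lock(&pause_mutex);
    while (paused)
        pthread_cond_wait(&pause_cond, &pause_mutex);
    pthread_mutex_unlock(&pause_mutex);
}

/* called by the controlling thread */
void set_paused(bool p)
{
    pthread_mutex_lock(&pause_mutex);
    paused = p;
    pthread_cond_broadcast(&pause_cond);
    pthread_mutex_unlock(&pause_mutex);
}

set_paused(true) from the first thread parks the worker at its next check_pause() call, and set_paused(false) wakes it up again.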
For more info
Pthreads
Linux Tutorial Posix Threads
If you can use processes instead, you can send job control signals (SIGSTOP / SIGCONT) to the second process. If you still want to share the memory between those processes, you can use SysV shared memory (shmop, shmget, shmctl...).
Even though I haven't tried it myself, it might be possible to use the lower-level clone() syscall to spawn threads that don't share signals. With that, you might be able to send SIGSTOP and SIGCONT to the other thread.
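For the separate-process route, the job-control part itself is small; a sketch, assuming the controlling process already knows the child's PID:

#include <signal.h>
#include <sys/types.h>

void pause_process(pid_t pid)  { kill(pid, SIGSTOP); }  /* suspend the whole process */
void resume_process(pid_t pid) { kill(pid, SIGCONT); }  /* let it run again */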
For implementing a pause on a thread, you need to make it wait for some event to happen. Waiting on a spin lock wastes CPU cycles. IMHO, this method should not be followed, as those CPU cycles could have been used by other processes/threads.
Wait on a non-blocking descriptor (pipe, socket or some other). Example code for using pipes for inter-thread communication can be seen here
The above solution is useful if your second thread receives information from multiple sources, not just the pause and resume signals. A top-level select/poll/epoll can be used on the non-blocking descriptors. You can specify a wait time for the select/poll/epoll system calls, so at most that many milliseconds are spent waiting.
I mention this solution on the assumption that your second thread will have more things or events to handle than just being paused and resumed. Sorry if it is more detailed than what you asked for.
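A sketch of that descriptor-based variant, with hypothetical names and command bytes (the controller writes single bytes to cmd_pipe[1]; the worker polls cmd_pipe[0] between units of work):

#include <poll.h>
#include <unistd.h>

int cmd_pipe[2];   /* created once with pipe(cmd_pipe) before the threads start */

/* worker: between units of work, wait up to 100 ms for a command byte */
int poll_for_command(void)
{
    struct pollfd pfd = { .fd = cmd_pipe[0], .events = POLLIN };
    char cmd = 0;

    if (poll(&pfd, 1, 100) > 0 && (pfd.revents & POLLIN)) {
        if (read(cmd_pipe[0], &cmd, 1) != 1)   /* e.g. 'P' = pause, 'R' = resume */
            cmd = 0;
    }
    return cmd;                                /* 0 means "no command this round" */
}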
Another simpler approach can be to have a shared boolean variable between these threads.
The main thread is the writer of the variable: 0 signifies stop, 1 signifies resume.
The second thread only reads the value of the variable. To implement the '0' (paused) state, usleep for some microseconds and then check the value again, assuming a delay of a few microseconds is acceptable in your design.
To implement the '1' (running) state, check the value of the variable after doing a certain number of operations.
Otherwise, you can also implement a signal for moving from the '1' to the '0' state.
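A tiny sketch of that shared-variable approach, using a C11 atomic instead of a plain int so the cross-thread read/write is well-defined without a mutex:

#include <stdatomic.h>
#include <unistd.h>

static atomic_int run_flag = 1;          /* main thread writes: 0 = stop, 1 = resume */

/* second thread calls this between units of work */
static void wait_if_paused(void)
{
    while (atomic_load(&run_flag) == 0)
        usleep(100);                     /* poll again after a short delay, as described above */
}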
You can use a mutex to do that; pseudocode would be:
while (true) {
    /* pause / resume */
    lock(my_lock);     /* if this is locked by thread1, thread2 will wait until thread1 unlocks it */
    unlock(my_lock);   /* unlock so that on the next iteration thread2 can lock it again */

    /* do actual work here */
}
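For completeness, the controlling thread's side of that scheme (here written with a real pthread mutex standing in for lock/unlock) is just holding the lock for as long as the pause should last:

/* thread1 (controller): my_lock is the pthread_mutex_t shared with thread2 */
pthread_mutex_lock(&my_lock);    /* thread2 now blocks at the top of its next iteration */
/* ... keep it paused for as long as needed ... */
pthread_mutex_unlock(&my_lock);  /* thread2 resumes */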
You can suspend a thread simply with a signal:
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

pthread_mutex_t mutex;

static void thread_control_handler(int n, siginfo_t* siginfo, void* sigcontext) {
    // block here until the suspending thread releases the mutex
    pthread_mutex_lock(&mutex);
    pthread_mutex_unlock(&mutex);
}

// suspend a thread for some time
void thread_suspend(pthread_t tid, int time) {
    struct sigaction act;
    struct sigaction oact;
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = thread_control_handler;
    act.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&act.sa_mask);
    pthread_mutex_init(&mutex, 0);
    if (!sigaction(SIGURG, &act, &oact)) {
        pthread_mutex_lock(&mutex);    // hold the mutex so the handler blocks
        pthread_kill(tid, SIGURG);     // deliver the signal to that specific thread
        sleep(time);
        pthread_mutex_unlock(&mutex);  // handler returns, target thread resumes
    }
}
Not sure if you will like my answer or not. But you can achieve it this way.
If it is a separate process instead of a thread, I have a solution using signals (this might even work for a thread; maybe someone can share their thoughts).
There is no mechanism already in place to pause or resume the execution of processes. But surely you can build one.
Steps I would do if I want it in my project:
Register a signal handler for the second process.
Inside the signal handler, wait for a semaphore.
Whenever you want to pause the other process, just send it the signal you registered its handler for. The program will go into a sleep state.
When you want to resume the process, send a different signal. Inside that signal's handler, check whether the semaphore is locked or not. If it is locked, release the semaphore, so process 2 will continue its execution. (A rough sketch of these steps follows below.)
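Here is that sketch (signal choices and names are mine; note that this scheme, as described, sleeps inside the signal handler itself):

#include <signal.h>
#include <semaphore.h>

static sem_t pause_sem;

static void on_pause(int sig)      /* handler for the "pause" signal */
{
    (void)sig;
    while (sem_wait(&pause_sem) == -1)
        ;                          /* sleep here until the semaphore is released */
}

static void on_resume(int sig)     /* handler for the "resume" signal */
{
    (void)sig;
    sem_post(&pause_sem);          /* release the handler waiting above */
}

void install_pause_handlers(void)  /* run this in process 2 at startup */
{
    sem_init(&pause_sem, 0, 0);    /* starts "locked", so on_pause blocks immediately */
    signal(SIGUSR1, on_pause);     /* SIGUSR1 pauses this process */
    signal(SIGUSR2, on_resume);    /* SIGUSR2 resumes it */
}

The controlling process then pauses with kill(pid, SIGUSR1) and resumes with kill(pid, SIGUSR2).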
If you can implement this, please do share your feedback on whether it worked for you or not. Thanks.