Implementing a join function in a user-level thread library - C

I am trying to implement a user-level thread library as part of a project. My focus is on the join function. Let's say that Thread1 calls join on Thread2. What I need to do is get the return value supplied to pthread_exit() by Thread2 and store it in the memory location specified by the join function's argument.
But how do I get the return value of another thread?
Any help will be appreciated. Thanks.

Here is an example taken from the GNU Pth (pth_lib.c) user-level threading library, which shows the implementation of the exit and join functions. I simplified the code to highlight the return-value handling.
void pth_exit(void *value)
{
    pth_debug2("pth_exit: marking thread \"%s\" as dead", pth_current->name);

    /* the main thread is special, because its termination
       would terminate the whole process, so we have to delay
       its termination until it is really the last thread */

    /* execute cleanups */

    /*
     * Now mark the current thread as dead, explicitly switch into the
     * scheduler and let it reap the current thread structure; we can't
     * free it here, or we'd be running on a stack which malloc() regards
     * as free memory, which would be a somewhat perilous situation.
     */
    pth_current->join_arg = value;
    pth_current->state = PTH_STATE_DEAD;

    pth_debug2("pth_exit: switching from thread \"%s\" to scheduler", pth_current->name);
    // return (for ever) to the scheduler
}
and the corresponding pth_join:
/* waits for the termination of the specified thread */
int pth_join(pth_t tid, void **value)
{
    // validate thread situation
    // wait until thread death

    // save returned value for the caller
    if (value != NULL)
        *value = tid->join_arg;

    // remove thread from the thread queue
    // free its memory space
    return TRUE;
}

You could create storage for the return value in the application or process context, initialized when your thread library initializes. Your pthread_exit would fill in the value, and your join would use the thread ID to retrieve it. I think this will only work for non-dynamically allocated return values, though.
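As a rough illustration of that approach (not from any particular library; MAX_THREADS and the function names are made up, and thread IDs are assumed to be small integers):

#define MAX_THREADS 128                    /* hypothetical library-wide limit */

static void *retval_table[MAX_THREADS];    /* set up when the library initializes */

/* called on the exit path: record the value under the thread's ID */
void mythread_exit_store(int tid, void *value)
{
    retval_table[tid] = value;
}

/* called from join, once the target thread is known to be dead */
void mythread_join_fetch(int tid, void **value)
{
    if (value != NULL)
        *value = retval_table[tid];
}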

I'm curious about the per-thread metadata in your implementation. The return value could be one of the items stored in a thread's control block, alongside its ID, stack pointer, and so on.
When pthread_exit() is called, that information should be written by the scheduler into one or more data structures, so that other threads can consult them when needed.
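For a concrete sketch, the thread control block in such a library might carry the return value alongside the other per-thread state (the field names here are illustrative, modeled loosely on the Pth fields shown above):

typedef struct tcb {
    int         id;        /* thread identifier */
    void       *stack;     /* base of the thread's private stack */
    void       *sp;        /* saved stack pointer for context switches */
    int         state;     /* READY, RUNNING, DEAD, ... */
    void       *join_arg;  /* value passed to the library's exit function */
    struct tcb *next;      /* link in the scheduler's queues */
} tcb;

Your exit function writes join_arg and marks the thread DEAD; the scheduler keeps the dead TCB around until some thread joins it and copies join_arg out.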

Related

How to solve pthread_create error (11) resource temporarily unavailable?

I'm building a project in C (using OpenWrt as the OS) to upload files to an FTP server, and I'm using MQTT for the incoming data. For each topic I subscribe to, I save the data and then upload it to the FTP server; to keep things moving smoothly, each time I need to upload a file I just use a thread to do the job.
To make sure the program doesn't run too many threads, each topic is allowed to create only one thread, and I'm using a flag variable (like a mutex, but it's not a pthread_mutex_t, because I don't want to block a thread; I want to skip that step and upload the next file). I thought this technique made me safe, but after running the program for 15 minutes I get error 11, "resource temporarily unavailable", when the program tries to create a thread with pthread_create.
One of my attempts to figure out the problem:
I used pthread_join(), which is not an option in my situation, just to make sure that each thread finishes and is not running in a permanent loop. The program ran for over an hour and the error didn't show up again, and of course each thread finished as intended.
I'm 90% sure that each topic is creating only one thread, and the next one is created only after the previous one has finished (I was tracking the status of the flag before and after thread creation).
I set the maximum thread count in /proc/sys/kernel/threads-max to 2000 (2000 is more than enough, since I don't have that many topics).
upload function (this will create the thread):
void uploadFile(<args...>, bool *locker_p){
    *locker_p = true;
    args->uploadLocker_p = locker_p;
    <do something here>
    pthread_t tid;
    int error = pthread_create(&tid, NULL, uploadFileThread, (void *)args);
    if(0 != error){
        printf("Couldn't run thread, (%d) => %s\n", error, strerror(error));
    }
    else{
        printf("Thread %lu\n", (unsigned long)tid);  /* pthread_t is opaque; cast for printing */
    }
}
upload thread:
void *uploadFileThread(void *arg){
    typeArgs *args = (typeArgs *)arg;
    <do something like upload the file>
    *(args->uploadLocker_p) = false;
    free(args);
    return NULL;
    //pthread_exit(0);
}
The default stack size for the created threads is eating too much virtual memory.
Essentially, the kernel is telling your process that it has so much virtual memory already in use that it doesn't dare give it any more, because there isn't enough RAM and swap to back it up if the process were to suddenly use it all.
To fix, create an attribute that limits the per-thread stack to something sensible. If your threads do not use large arrays as local variables and do not recurse deeply, then 2*PTHREAD_STACK_MIN (from <limits.h>) is a good size.
The attribute is not consumed by the pthread_create() call, it is just a configuration block, and you can use the same one for any number of threads you create, or create a new one for each thread.
Example:
pthread_attr_t attrs;
pthread_t      tid;
int            err;

pthread_attr_init(&attrs);
pthread_attr_setstacksize(&attrs, 2 * PTHREAD_STACK_MIN);

err = pthread_create(&tid, &attrs, uploadFileThread, (void *)args);

pthread_attr_destroy(&attrs);

if (err) {
    /* Failed, error code in err; use strerror(err) */
} else {
    /* Succeeded */
}
Also remember that if your uploadFileThread() allocates memory, it will not be freed automatically when the thread exits. It looks like OP already knows this (as they have the function free the argument structure when it's ready to exit), but I thought it a good idea to point it out.
Personally, I like to use a thread pool instead. The idea is that the upload workers are created beforehand, and they'll wait for a new job. Here is an example:
pthread_mutex_t workers_lock;
pthread_cond_t  workers_wait;
volatile struct work *workers_work;
volatile int workers_idle;
volatile sig_atomic_t workers_exit = 0;
where struct work is a singly-linked list protected by workers_lock, workers_idle is initialized to zero and incremented when waiting for new work, workers_wait is a condition variable signaled when new work arrives under workers_lock, and workers_exit is a counter that when nonzero, tells that many workers to exit.
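The answer never shows struct work itself; a minimal definition consistent with the description above might be (the payload field is a placeholder):

struct work {
    struct work *next;   /* next job in the singly-linked chain */
    char        *path;   /* illustrative payload: the file to upload */
};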
A worker would be basically something along these lines:
void worker_do(struct work *job)
{
    /* Whatever handling a struct work needs ... */
}

void *worker_function(void *payload __attribute__((unused)))
{
    /* Grab the lock. */
    pthread_mutex_lock(&workers_lock);

    /* Job loop. */
    while (!workers_exit) {
        if (workers_work) {
            /* Detach first work in chain. */
            struct work *job = (struct work *)workers_work;
            workers_work = job->next;
            job->next = NULL;
            /* Work is done without holding the mutex. */
            pthread_mutex_unlock(&workers_lock);
            worker_do(job);
            pthread_mutex_lock(&workers_lock);
            continue;
        }
        /* We're idle, holding the lock. Wait for new work. */
        ++workers_idle;
        pthread_cond_wait(&workers_wait, &workers_lock);
        --workers_idle;
    }

    /* This worker exits. */
    --workers_exit;
    pthread_mutex_unlock(&workers_lock);
    return NULL;
}
The connection handling process can use idle_workers() to check the number of idle workers, and either grow the worker thread pool, or reject the connection as being too busy. The idle_workers() is something like
static inline int idle_workers(void)
{
    int result;
    pthread_mutex_lock(&workers_lock);
    result = workers_idle;
    pthread_mutex_unlock(&workers_lock);
    return result;
}
Note that each worker only holds the lock for very short durations, so the idle_workers() call won't block for long. (pthread_cond_wait() atomically releases the lock when it starts waiting for a signal, and only returns after it re-acquires the lock.)
When waiting for a new connection in accept(), set the socket nonblocking and use poll() to wait for new connections. If the timeout passes, examine the number of workers, and reduce them if necessary by calling reduce_workers(1) or similar:
void reduce_workers(int number)
{
    pthread_mutex_lock(&workers_lock);
    if (workers_exit < number) {
        workers_exit = number;
        pthread_cond_broadcast(&workers_wait);
    }
    pthread_mutex_unlock(&workers_lock);
}
To avoid having to call pthread_join() for each thread – and we really don't even know which threads have exited here! – to reap/free the kernel and C library metadata related to the thread, the worker threads need to be detached. After creating a worker thread tid successfully, just call pthread_detach(tid);.
When a new connection arrives and it is determined to be one that should be delegated to the worker threads, you can, but do not have to, check the number of idle threads, create new worker threads, reject the upload, or just append the work to the queue, so that it will "eventually" be handled.
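To round out the sketch, here is one hedged way to start a detached worker and to append a job to the queue; worker_start() and submit_work() are my names, not part of the answer above:

/* start one detached worker; returns 0 on success, an errno-style code otherwise */
int worker_start(void)
{
    pthread_t tid;
    int err = pthread_create(&tid, NULL, worker_function, NULL);
    if (err)
        return err;
    pthread_detach(tid);                      /* no pthread_join() needed later */
    return 0;
}

/* push a job onto the chain and wake one idle worker */
void submit_work(struct work *job)
{
    pthread_mutex_lock(&workers_lock);
    job->next = (struct work *)workers_work;  /* LIFO push, for brevity;
                                                 keep a tail pointer for FIFO order */
    workers_work = job;
    pthread_cond_signal(&workers_wait);       /* wake one waiter, if any */
    pthread_mutex_unlock(&workers_lock);
}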

Understanding pthreads locks and condition variables

I had an exercise about threads, locks, and condition variables in C. I needed to write a program that gets data, turns it into a linked list, starts 3 threads, each calculating the result for each node in the list, with the main thread printing the results after everyone has finished.
This is the main function:
int thread_finished_count;
// Lock and Conditional variable
pthread_mutex_t list_lock;
pthread_mutex_t thread_lock;
pthread_cond_t thread_cv;
int main(int argc, char const *argv[])
{
    node *list;
    int pairs_count, status;
    thread_finished_count = 0;

    /* get the data and start the threads */
    node *head = create_numbers(argc, argv, &pairs_count);
    list = head; // backup head for results
    pthread_t *threads = start_threads(&list);

    /* wait for threads and destroy lock */
    status = pthread_cond_wait(&thread_cv, &list_lock);
    chcek_status(status);
    status = pthread_mutex_destroy(&list_lock);
    chcek_status(status);
    status = pthread_mutex_destroy(&thread_lock);
    chcek_status(status);

    /* print result in original list */
    print_results(head);

    /* cleanup */
    wait_for_threads(threads, NUM_THREADS);
    free_list(head);
    free(threads);
    return EXIT_SUCCESS;
}
Please note that the create_numbers function is working properly, and the list is working as intended.
Here is the start_thread and thread_function code:
pthread_t *start_threads(node **list)
{
    int status;
    pthread_t *threads = (pthread_t *)malloc(sizeof(pthread_t) * NUM_THREADS);
    check_malloc(threads);

    for (int i = 0; i < NUM_THREADS; i++)
    {
        status = pthread_create(&threads[i], NULL, thread_function, list);
        chcek_status(status);
    }
    return threads;
}
void *thread_function(node **list)
{
    int status, self_id = pthread_self();
    printf("im in %u\n", self_id);
    node *currentNode;

    while (1)
    {
        if (!(*list))
            break;
        status = pthread_mutex_lock(&list_lock);
        chcek_status(status);
        printf("list location %p thread %u\n", *list, self_id);
        if (!(*list))
        {
            status = pthread_mutex_unlock(&list_lock);
            chcek_status(status);
            break;
        }
        currentNode = (*list);
        (*list) = (*list)->next;
        status = pthread_mutex_unlock(&list_lock);
        chcek_status(status);

        currentNode->gcd = gcd(currentNode->num1, currentNode->num2);
        status = usleep(10);
        chcek_status(status);
    }

    status = pthread_mutex_lock(&thread_lock);
    chcek_status(status);
    thread_finished_count++;
    status = pthread_mutex_unlock(&thread_lock);
    chcek_status(status);

    if (thread_finished_count != 3)
        return NULL;
    status = pthread_cond_signal(&thread_cv);
    chcek_status(status);
    return NULL;
}
void chcek_status(int status)
{
    if (status != 0)
    {
        fputs("pthread_function() error\n", stderr);
        exit(EXIT_FAILURE);
    }
}
Note that self_id is used for debugging purposes.
My problem
My main problem is about splitting the work. Each thread should get an element from the global linked list, calculate its gcd, then go on and take the next element. I get this effect only if I add usleep(10) after I unlock the mutex in the while loop. If I don't add the usleep, the FIRST thread goes in and does all the work while the other threads just wait and come in after all the work has been done.
Please note: I considered the possibility that the first thread is created and finishes all the jobs before the second thread is even created. This is why I added the "im in #threadID" print with usleep(10) when every thread is created. They all come in, but only the first one does all the jobs.
Here is the output example if I do the usleep after the mutex unlock (notice the different thread IDs):
with usleep
./v2 nums.txt
im in 1333593856
list location 0x7fffc4fb56a0 thread 1333593856
im in 1316685568
im in 1325139712
list location 0x7fffc4fb56c0 thread 1333593856
list location 0x7fffc4fb56e0 thread 1316685568
list location 0x7fffc4fb5700 thread 1325139712
list location 0x7fffc4fb5720 thread 1333593856
list location 0x7fffc4fb5740 thread 1316685568
list location 0x7fffc4fb5760 thread 1325139712
list location 0x7fffc4fb5780 thread 1333593856
list location 0x7fffc4fb57a0 thread 1316685568
list location 0x7fffc4fb57c0 thread 1325139712
list location 0x7fffc4fb57e0 thread 1333593856
list location 0x7fffc4fb5800 thread 1316685568
list location (nil) thread 1325139712
list location (nil) thread 1333593856
...
normal result output
...
And that's the output if I comment out the usleep (notice the same thread ID):
without usleep
./v2 nums.txt
im in 2631730944
list location 0x7fffe5b946a0 thread 2631730944
list location 0x7fffe5b946c0 thread 2631730944
list location 0x7fffe5b946e0 thread 2631730944
list location 0x7fffe5b94700 thread 2631730944
list location 0x7fffe5b94720 thread 2631730944
list location 0x7fffe5b94740 thread 2631730944
list location 0x7fffe5b94760 thread 2631730944
list location 0x7fffe5b94780 thread 2631730944
list location 0x7fffe5b947a0 thread 2631730944
list location 0x7fffe5b947c0 thread 2631730944
list location 0x7fffe5b947e0 thread 2631730944
list location 0x7fffe5b94800 thread 2631730944
im in 2623276800
im in 2614822656
...
normal result output
...
My second question is about the order in which the threads work. My exercise asks me not to use join to synchronize the threads (only at the end, to free resources), but instead to use the condition variable.
My goal is that each thread takes an element and does the calculation while, in the meantime, another thread goes in and takes another element, with a new thread taking each element (or at least close to that).
Thanks for reading and I appreciate your help.
First, you are doing the gcd() work while holding the lock... so (a) only one thread will do any work at any one time, though (b) that does not entirely explain why only one thread appears to do (nearly) all the work -- as KamilCuk says, it may be that there is so little work to do, that it's (nearly) all done before the second thread wakes up properly. [More exotic, there may be some latency between thread 'a' unlocking the mutex and another thread starting to run, such that thread 'a' can acquire the mutex before another thread gets there.]
POSIX says that when a mutex is unlocked, if there are waiters then "the scheduling policy shall determine which thread shall acquire the mutex". The default "scheduling policy" is (to the best of my knowledge) implementation defined.
You could try a couple of things: (1) use a pthread_barrier_t to hold all the threads at the start of thread_function() until they are all running; (2) call sched_yield() after pthread_mutex_unlock() to prompt the system into running the newly runnable thread.
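The first suggestion would look something like this (a minimal sketch; start_barrier is my name, initialized once before the threads are created):

pthread_barrier_t start_barrier;  /* pthread_barrier_init(&start_barrier, NULL, NUM_THREADS); */

void *thread_function(node **list)
{
    /* nobody proceeds until all NUM_THREADS threads exist and are running */
    pthread_barrier_wait(&start_barrier);
    /* ... the original work loop ... */
    return NULL;
}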
Second, you should not under any circumstance treat a 'condition variable' as a signal. For main() to know that all threads have finished you need a count -- which could be a pthread_barrier_t; or it could be simple integer, protected by a mutex, with a 'condition variable' to hold the main thread on while it waits; or it could be a count (in main()) and a semaphore (posted once by each thread as it exits).
Third, you show pthread_cond_wait(&cv, &lock); in main(). At that point main() must own lock... and it matters when that happened. But: as it stands, the first thread to find the list empty will kick cv, and main() will proceed, even though other threads are still running. Though once main() does re-acquire lock, any threads which are then still running will either be exiting or will be stuck on the lock. (It's a mess.)
In general, the template for using a 'condition variable' is:
pthread_mutex_lock(&...lock);
while (!(... thing we need ...))
    pthread_cond_wait(&...cond_var, &...lock);
... do stuff now we have what we need ....
pthread_mutex_unlock(&...lock);
NB: a 'condition variable' does not have a value... despite the name, it is not a flag to signal that some condition is true. A 'condition variable' is, essentially, a queue of threads waiting to be re-started. When a 'condition variable' is signaled, at least one waiting thread will be re-started -- but if there are no threads waiting, nothing happens, in particular the (so called) 'condition variable' retains no memory of the signal.
In the new code, following the above template, main() should:
/* wait for threads .... */
status = pthread_mutex_lock(&thread_lock);
chcek_status(status);
while (thread_finished_count != 3)
{
    status = pthread_cond_wait(&thread_cv, &thread_lock);
    chcek_status(status);
}
status = pthread_mutex_unlock(&thread_lock);
chcek_status(status);
So what is going on here ?
main() is waiting for thread_finished_count == 3
thread_finished_count is a shared variable "protected" by the thread_lock mutex.
...so it is incremented in thread_function() under the mutex.
...and main() must also read it under the mutex.
if main() finds thread_finished_count != 3 it must wait.
to do that it does: pthread_cond_wait(&thread_cv, &thread_lock), which:
unlocks thread_lock
places the thread on the thread_cv queue of waiting threads.
and it does those atomically.
when thread_function() does the pthread_cond_signal(&thread_cv) it wakes up the waiting thread.
when the main() thread wakes up, it will first re-acquire the thread_lock...
...so it can then proceed to re-read thread_finished_count, to see if it is now 3.
FWIW: I recommend not destroying the mutexes etc until after all the threads have been joined.
I have looked deeper into how glibc (v2.30 on Linux & x86_64, at least) implements pthread_mutex_lock() and _unlock().
It turns out that _lock() works something like this:
if (atomic_cmp_xchg(mutex->lock, 0, 1))
    return <OK>;                   // mutex->lock was 0, is now 1

while (1)
{
    if (atomic_xchg(mutex->lock, 2) == 0)
        return <OK>;               // mutex->lock was 0, is now 2
    ...do FUTEX_WAIT(2)...         // suspend thread iff mutex->lock == 2...
}
And _unlock() works something like this:
if (atomic_xchg(mutex->lock, 0) == 2)  // set mutex->lock == 0
    ...do FUTEX_WAKE(1)...             // if there may be waiter(s), start 1
Now:
mutex->lock: 0 => unlocked, 1 => locked-but-no-waiters, 2 => locked-with-waiter(s)
'locked-but-no-waiters' optimizes for the case where there is no lock contention and there is no need to do FUTEX_WAKE in _unlock().
the _lock()/_unlock() functions are in the library -- they are not in the kernel.
...in particular, the ownership of the mutex is a matter for the library, not the kernel.
FUTEX_WAIT(2) is a call to the kernel, which will place the thread on a pending queue associated with the mutex, unless mutex->lock != 2.
The kernel checks for mutex->lock == 2 and adds the thread to the queue atomically. This deals with the case of _unlock() being called after the atomic_xchg(mutex->lock, 2).
FUTEX_WAKE(1) is also a call to the kernel, and the futex man page tells us:
FUTEX_WAKE (since Linux 2.6.0)
This operation wakes at most 'val' of the waiters that are waiting ... No guarantee is provided about which waiters are awoken (e.g., a waiter with a higher scheduling priority is not guaranteed to be awoken in preference to a waiter with a lower priority).
where 'val' in this case is 1.
Although the documentation says "no guarantee about which waiters are awoken", the queue appears to be at least FIFO.
Note especially that:
_unlock() does not pass the mutex to the thread started by the FUTEX_WAKE.
once woken up, the thread will again try to obtain the lock...
...but may be beaten to it by any other running thread -- including the thread which just did the _unlock().
I believe this is why you have not seen the work being shared across the threads. There is so little work for each one to do, that a thread can unlock the mutex, do the work and be back to lock the mutex again before a thread woken up by the unlock can get going and succeed in locking the mutex.

What happens to the thread if we free its pointer

What is the impact of freeing the struct that holds the pthread_t on the thread itself?
I have a struct that represents a thread:
typedef struct car {
    int cur_place;
    pthread_t car_thread;
} car;
and I have an array that holds these cars. After some time I want to free the struct from inside the thread, i.e.:
void *car_thread(void *number) {
    int num = *(int *)number;
    free(maze[num]);
    maze[num] = NULL;
    pthread_exit(NULL);
}
Is it possible? What will happen to the thread after I free the struct that holds it? Will it run the next lines?
Thanks in advance.
Freeing car only releases the memory used to store those values. The thread itself will still be out there somewhere. Think of pthread_t as simply holding a number or address the system uses to refer to the thread, not the thread itself.
Just don't refer to the memory of car anywhere after it's freed.
You have just freed the location storing the thread's ID; the data structure which stores the thread's attributes is freed when you call pthread_exit(NULL). Therefore, the answer to your question: the thread still exists.
The thread will not exit until pthread_exit() is called or, in the case you have signal handling built in, a signal for exiting has been received. The thread is attached to the process, but different threads are isolated entities until you tie them together in some kind of thread organization. Apologies for the vague phrasing, but this is the best way I can describe it.
If you intend to signal the cars to exit when you free the data structure, you need to have signal handling built in to notify each thread to exit, or call pthread_exit() in each thread somehow.
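One safe pattern, assuming some other thread wants to join the car thread and runs this before the car thread can free its struct (for example, right after creating it): copy the pthread_t out of the struct, since it is the handle's value, not its storage, that identifies the thread. wait_for_car() is a hypothetical helper:

/* join car number i even though the car thread frees its own struct */
void wait_for_car(int i)
{
    pthread_t tid = maze[i]->car_thread;  /* copy while the struct is still valid */
    /* maze[i] may be freed by the car thread any time after this point */
    pthread_join(tid, NULL);              /* joining on our private copy is fine */
}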

How can barriers be destroyable as soon as pthread_barrier_wait returns?

This question is based on:
When is it safe to destroy a pthread barrier?
and the recent glibc bug report:
http://sourceware.org/bugzilla/show_bug.cgi?id=12674
I'm not sure about the semaphores issue reported in glibc, but presumably it's supposed to be valid to destroy a barrier as soon as pthread_barrier_wait returns, as per the above linked question. (Normally, the thread that got PTHREAD_BARRIER_SERIAL_THREAD, or a "special" thread that already considered itself "responsible" for the barrier object, would be the one to destroy it.) The main use case I can think of is when a barrier is used to synchronize a new thread's use of data on the creating thread's stack, preventing the creating thread from returning until the new thread gets to use the data; other barriers probably have a lifetime equal to that of the whole program, or controlled by some other synchronization object.
In any case, how can an implementation ensure that destruction of the barrier (and possibly even unmapping of the memory it resides in) is safe as soon as pthread_barrier_wait returns in any thread? It seems the other threads that have not yet returned would need to examine at least some part of the barrier object to finish their work and return, much like how, in the glibc bug report cited above, sem_post has to examine the waiters count after having adjusted the semaphore value.
I'm going to take another crack at this with an example implementation of pthread_barrier_wait() that uses mutex and condition variable functionality as might be provided by a pthreads implementation. Note that this example doesn't try to deal with performance considerations (specifically, when the waiting threads are unblocked, they are all re-serialized when exiting the wait). I think that using something like Linux Futex objects could help with the performance issues, but Futexes are still pretty much out of my experience.
Also, I doubt that this example handles signals or errors correctly (if at all in the case of signals). But I think proper support for those things can be added as an exercise for the reader.
My main fear is that the example may have a race condition or deadlock (the mutex handling is more complex than I like). Also note that it is an example that hasn't even been compiled. Treat it as pseudo-code. Also keep in mind that my experience is mainly in Windows - I'm tackling this more as an educational opportunity than anything else. So the quality of the pseudo-code may well be pretty low.
However, disclaimers aside, I think it may give an idea of how the problem asked in the question could be handled (i.e., how the pthread_barrier_wait() function can allow the pthread_barrier_t object it uses to be destroyed by any of the released threads without danger of one or more threads using the barrier object on their way out).
Here goes:
/*
 * Since this is a part of the implementation of the pthread API, it uses
 * reserved names that start with "__" for internal structures and functions.
 *
 * Functions such as __mutex_lock() and __cond_wait() perform the same function
 * as the corresponding pthread API.
 */
// struct __barrier_waitdata is intended to hold all the data
// that `pthread_barrier_wait()` will need after releasing
// waiting threads. This will allow the function to avoid
// touching the passed-in pthread_barrier_t object after
// the wait is satisfied (since any of the released threads
// can destroy it).
struct __barrier_waitdata {
    struct __mutex cond_mutex;
    struct __cond  cond;
    unsigned       waiter_count;
    int            wait_complete;
};

struct __barrier {
    unsigned count;
    struct __mutex waitdata_mutex;
    struct __barrier_waitdata* pwaitdata;
};

typedef struct __barrier pthread_barrier_t;
int __barrier_waitdata_init( struct __barrier_waitdata* pwaitdata)
{
    int rc;

    pwaitdata->waiter_count = 0;
    pwaitdata->wait_complete = 0;

    rc = __mutex_init( &pwaitdata->cond_mutex, NULL);
    if (rc) {
        return rc;
    }

    rc = __cond_init( &pwaitdata->cond, NULL);
    if (rc) {
        __mutex_destroy( &pwaitdata->cond_mutex);
        return rc;
    }

    return 0;
}
int pthread_barrier_init(pthread_barrier_t *barrier, const pthread_barrierattr_t *attr, unsigned int count)
{
    int rc;

    rc = __mutex_init( &barrier->waitdata_mutex, NULL);
    if (rc) return rc;

    barrier->pwaitdata = NULL;
    barrier->count = count;

    //TODO: deal with attr

    return 0;
}
int pthread_barrier_wait(pthread_barrier_t *barrier)
{
    int rc;
    struct __barrier_waitdata* pwaitdata;
    unsigned target_count;

    // potential waitdata block (only one thread's will actually be used)
    struct __barrier_waitdata waitdata;

    // nothing to do if we only need to wait for one thread...
    if (barrier->count == 1) return PTHREAD_BARRIER_SERIAL_THREAD;

    rc = __mutex_lock( &barrier->waitdata_mutex);
    if (rc) return rc;

    if (!barrier->pwaitdata) {
        // no other thread has claimed the waitdata block yet -
        // we'll use this thread's
        rc = __barrier_waitdata_init( &waitdata);
        if (rc) {
            __mutex_unlock( &barrier->waitdata_mutex);
            return rc;
        }

        barrier->pwaitdata = &waitdata;
    }

    pwaitdata = barrier->pwaitdata;
    target_count = barrier->count;
    // all data necessary for handling the return from a wait is pointed to
    // by `pwaitdata`, and `pwaitdata` points to a block of data on the stack of
    // one of the waiting threads. We have to make sure that the thread that owns
    // that block waits until all others have finished with the information
    // pointed to by `pwaitdata` before it returns. However, after the 'big' wait
    // is completed, the `pthread_barrier_t` object that's passed into this
    // function isn't used. The last operation done to `*barrier` is to set
    // `barrier->pwaitdata = NULL` to satisfy the requirement that this function
    // leaves `*barrier` in a state as if `pthread_barrier_init()` had been
    // called - and that operation is done by the thread that signals the wait
    // condition completion, before the completion is signaled.

    // note: we're still holding `barrier->waitdata_mutex`
    rc = __mutex_lock( &pwaitdata->cond_mutex);
    pwaitdata->waiter_count += 1;

    if (pwaitdata->waiter_count < target_count) {
        // need to wait for other threads
        __mutex_unlock( &barrier->waitdata_mutex);
        do {
            // TODO: handle the return code from `__cond_wait()` to break out of this
            //       if a signal makes that necessary
            __cond_wait( &pwaitdata->cond, &pwaitdata->cond_mutex);
        } while (!pwaitdata->wait_complete);
    }
    else {
        // this thread satisfies the wait - unblock all the other waiters
        pwaitdata->wait_complete = 1;

        // 'release' our use of the passed-in pthread_barrier_t object
        barrier->pwaitdata = NULL;

        // unlock the barrier's waitdata_mutex - the barrier is
        // ready for use by another set of threads
        __mutex_unlock( &barrier->waitdata_mutex);

        // finally, unblock the waiting threads
        __cond_broadcast( &pwaitdata->cond);
    }
    // at this point, barrier->waitdata_mutex is unlocked, the
    // barrier->pwaitdata pointer has been cleared, and no further
    // use of `*barrier` is permitted...
    //
    // however, each thread still has a valid `pwaitdata` pointer - the
    // thread that owns that block needs to wait until all others have
    // dropped the pwaitdata->waiter_count
    //
    // also, at this point the `pwaitdata->cond_mutex` is locked, so
    // we're in a critical section

    rc = 0;
    pwaitdata->waiter_count--;
    if (pwaitdata == &waitdata) {
        // this thread owns the waitdata block - it needs to hang around until
        // all other threads are done

        // as a convenience, this thread will be the one that returns
        // PTHREAD_BARRIER_SERIAL_THREAD
        rc = PTHREAD_BARRIER_SERIAL_THREAD;

        while (pwaitdata->waiter_count != 0) {
            __cond_wait( &pwaitdata->cond, &pwaitdata->cond_mutex);
        }

        __mutex_unlock( &pwaitdata->cond_mutex);
        __cond_destroy( &pwaitdata->cond);
        __mutex_destroy( &pwaitdata->cond_mutex);
    }
    else if (pwaitdata->waiter_count == 0) {
        __cond_signal( &pwaitdata->cond);
        __mutex_unlock( &pwaitdata->cond_mutex);
    }

    return rc;
}
17 July 2011: Update in response to a comment/question about process-shared barriers
I forgot completely about the situation with barriers that are shared between processes. And as you mention, the idea I outlined will fail horribly in that case. I don't really have experience with POSIX shared memory use, so any suggestions I make should be tempered with scepticism.
To summarize (for my benefit, if no one else's):
When any of the threads gets control after pthread_barrier_wait() returns, the barrier object needs to be in the 'init' state (however the most recent pthread_barrier_init() on that object set it). Also implied by the API is that once any of the threads return, one or more of the following things could occur:
another call to pthread_barrier_wait() to start a new round of synchronization of threads
pthread_barrier_destroy() on the barrier object
the memory allocated for the barrier object could be freed or unshared if it's in a shared memory region.
These things mean that before the pthread_barrier_wait() call allows any thread to return, it pretty much needs to ensure that all waiting threads are no longer using the barrier object in the context of that call. My first answer addressed this by creating a 'local' set of synchronization objects (a mutex and an associated condition variable) outside of the barrier object that would block all the threads. These local synchronization objects were allocated on the stack of the thread that happened to call pthread_barrier_wait() first.
I think that something similar would need to be done for barriers that are process-shared. However, in that case simply allocating those sync objects on a thread's stack isn't adequate (since the other processes would have no access). For a process-shared barrier, those objects would have to be allocated in process-shared memory. I think the technique I listed above could be applied similarly:
the waitdata_mutex that controls the 'allocation' of the local sync variables (the waitdata block) would already be in process-shared memory by virtue of being in the barrier struct. Of course, when the barrier is set to PTHREAD_PROCESS_SHARED, that attribute would also need to be applied to the waitdata_mutex
when __barrier_waitdata_init() is called to initialize the local mutex & condition variable, it would have to allocate those objects in shared memory instead of simply using the stack-based waitdata variable.
when the 'cleanup' thread destroys the mutex and the condition variable in the waitdata block, it would also need to clean up the process-shared memory allocation for the block.
in the case where shared memory is used, there needs to be some mechanism to ensure that the shared memory object is opened at least once in each process, and closed the correct number of times in each process (but not closed entirely before every thread in the process is finished using it). I haven't thought through exactly how that would be done...
I think these changes would allow the scheme to operate with process-shared barriers. The last bullet point above is a key item to figure out. Another is how to construct a name for the shared memory object that will hold the 'local' process-shared waitdata. There are certain attributes you'd want for that name:
you'd want the storage for the name to reside in the struct pthread_barrier_t structure so all process have access to it; that means a known limit to the length of the name
you'd want the name to be unique to each 'instance' of a set of calls to pthread_barrier_wait() because it might be possible for a second round of waiting to start before all threads have gotten all the way out of the first round waiting (so the process-shared memory block set up for the waitdata might not have been freed yet). So the name probably has to be based on things like process id, thread id, address of the barrier object, and an atomic counter.
I don't know whether or not there are security implications to having the name be 'guessable'. If so, some randomization needs to be added; no idea how much. Maybe you'd also need to hash the data mentioned above along with the random bits. Like I said, I really have no idea if this is important or not.
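For the second bullet, a hedged sketch of what allocating the waitdata block in POSIX shared memory could look like (error handling elided; 'seq' stands in for the atomic counter from the naming discussion above):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

struct __barrier_waitdata *waitdata_alloc_shared(unsigned long seq)
{
    char name[64];
    snprintf(name, sizeof name, "/barrier-wd-%lu", seq);

    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    ftruncate(fd, sizeof(struct __barrier_waitdata));

    void *p = mmap(NULL, sizeof(struct __barrier_waitdata),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after close */

    /* the mutex and condition variable inside must then be initialized
       with the PTHREAD_PROCESS_SHARED attribute */
    return p;
}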
As far as I can see, there is no need for pthread_barrier_destroy to be an immediate operation. You could have it wait until all threads that are still in their wakeup phase have woken up.
E.g., you could have an atomic counter awakening that is initially set to the number of threads to be woken up. It would then be decremented as the last action before pthread_barrier_wait returns. pthread_barrier_destroy could then just spin until that counter falls to 0.
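A minimal sketch of that idea using C11 atomics (names are illustrative, not from any implementation):

#include <sched.h>
#include <stdatomic.h>

atomic_uint awakening;  /* set to the waiter count when the barrier trips */

/* last action before our pthread_barrier_wait() returns */
static void wait_epilogue(void)
{
    atomic_fetch_sub(&awakening, 1);
}

/* destroy can simply wait until every waiter has left the wakeup phase */
static void barrier_destroy_wait(void)
{
    while (atomic_load(&awakening) != 0)
        sched_yield();  /* spin politely until the counter falls to 0 */
}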

How to run a thread in the main function infinitely without causing the program to terminate

I have a function, say void *WorkerThread(void *ptr).
The function WorkerThread(void *ptr) has an infinite loop which reads from and writes to the serial port continuously.
For example:
void *WorkerThread( void *ptr)
{
    while(1)
    {
        // READS AND WRITES from the serial port USING MUTEX_LOCK AND MUTEX_UNLOCK
    } //while ends
}
The other function I wrote is ThreadTest, for example:
int ThreadTest()
{
    pthread_t Worker;
    int iret1;
    pthread_mutex_init(&stop_mutex, NULL);

    if ((iret1 = pthread_create(&Worker, NULL, WorkerThread, NULL)) == 0)
    {
        pthread_mutex_lock(&stop_mutex);
        stopThread = true;
        pthread_mutex_unlock(&stop_mutex);
    }

    if (stopThread != false)
        stopThread = false;

    pthread_mutex_destroy(&stop_mutex);
    return 0;
}
In main function
I have something like
int main(int argc, char **argv)
{
    fd = OpenSerialPort();
    if( ConfigurePort(fd) < 0) return 0;

    while (true)
    {
        ThreadTest();
    }
    return 0;
}
Now, when I run this sort of code with debug statements, it runs fine for a few hours and then throws a message like "can't able to create thread" and the application terminates.
Does anyone have an idea where I am making a mistake?
Also, is there a way to run ThreadTest in main without using while(true), as I am already using while(1) in WorkerThread to read and write infinitely?
All comments and criticism are welcome.
Thanks & regards,
SamPrat.
You are creating threads continually and might be hitting the limit on the number of threads.
The pthread_create man page says:
EAGAIN Insufficient resources to create another thread, or a system-imposed
limit on the number of threads was encountered. The latter case may
occur in two ways: the RLIMIT_NPROC soft resource limit (set via
setrlimit(2)), which limits the number of process for a real user ID,
was reached; or the kernel's system-wide limit on the number of
threads, /proc/sys/kernel/threads-max, was reached.
You should rethink the design of your application. Creating an infinite number of threads is not a good design.
[UPDATE]
You are using a lock to set an integer variable:
pthread_mutex_lock(&stop_mutex);
stopThread = true;
pthread_mutex_unlock(&stop_mutex);
However, this is not required, as setting an int is atomic (on probably all architectures?). You should use a lock when you are doing non-atomic operations, e.g. test-and-set:
take_lock ();
if (a != 1)
    a = 1;
release_lock ();
You create a new thread each time ThreadTest is called, and never destroy these threads. So eventually you (or the OS) run out of thread handles (a limited resource).
Threads consume resources (memory & processing), and you're creating a thread each time your main loop calls ThreadTest(). And resources are finite, while your loop is not, so this will eventually throw a memory allocation error.
You should get rid of the main loop, and make ThreadTest return the newly created thread (pthread_t). Finally, make main wait for the thread termination using pthread_join.
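Following that suggestion, main would look something like this (a sketch, assuming ThreadTest is changed to return the pthread_t of the worker it creates):

int main(int argc, char **argv)
{
    fd = OpenSerialPort();
    if (ConfigurePort(fd) < 0)
        return 0;

    pthread_t worker = ThreadTest();  /* create the worker exactly once */
    pthread_join(worker, NULL);       /* block here instead of looping */
    return 0;
}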
Your pthreads are zombies and consume system resources. For Linux you can use ulimit -s to check your active upper limits -- but they are not infinite either. Use pthread_join() to let a thread finish and release the resources it consumed.
Do you know that select() is able to wait on multiple (device) handles? You can also define a user-defined source to stop select(), or use a timeout. With this in mind, you can start one thread and let it sleep if nothing occurs. If you intend to stop it, you can send an event (or let the timeout expire) to break out of the select() call.
An additional design concept you should consider is message queues, to share information between your main application and the pthread. select() is compatible with this technique, so you can use one mechanism for all data sources (devices and message queues).
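A compact sketch of that approach: one thread sleeping in select() on the serial port plus a self-pipe as the user-defined stop source (the self-pipe is my addition, one common way to implement such a source):

#include <sys/select.h>
#include <unistd.h>

int serial_fd;     /* the open serial port */
int wake_pipe[2];  /* pipe(wake_pipe); write a byte to wake_pipe[1] to stop */

void serve(void)
{
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(serial_fd, &rfds);
        FD_SET(wake_pipe[0], &rfds);
        int maxfd = serial_fd > wake_pipe[0] ? serial_fd : wake_pipe[0];

        struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };  /* 1 s timeout */
        int n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
        if (n <= 0)
            continue;                 /* timeout or interrupted: try again */
        if (FD_ISSET(wake_pipe[0], &rfds))
            break;                    /* stop requested */
        if (FD_ISSET(serial_fd, &rfds)) {
            /* read from the port, process, write replies ... */
        }
    }
}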
Here is a reference to a good pthreads read, and the best pthreads book available: Programming with POSIX(R) Threads, ISBN-13: 978-0201633924.
Looks like you've not called pthread_join() which cleans up state after non-detached threads are finished. I'd speculate that you've hit some per process resource limit here as a result.
As others have noted this is not great design though - why not re-use the thread rather than creating a new one on every loop?
