Why does this test_and_set function lead to deadlock?

I am using two threads to update a global variable. To achieve mutual exclusion I am using the test_and_set function below, but this code deadlocks at some random point during execution.
Please help.
#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>
int a = 0;
atomic_int lock = 0;
int test_and_set(int *lock)
{
int l = *lock;
*lock = 1;
return l;
}
void *func(void * param)
{
for(int i = 0; i < 100000; i++)
{
while(test_and_set(&lock));
a++;
lock = 0;
}
}
int main()
{
pthread_t p1, p2;
pthread_create(&p1, NULL, func, NULL);
pthread_create(&p2, NULL, func, NULL);
pthread_join(p1, NULL);
pthread_join(p2, NULL);
printf("%d\n", a);
return 0;
}

Yes, it will deadlock sooner or later.
This is how:
1. Thread A "gets the lock" and is about to execute a++ and lock = 0;.
2. Thread B takes over and executes l = *lock;. Since A holds the lock, l is now 1.
3. Thread A takes over and executes both a++ and lock = 0;. This sets the lock to zero.
4. Thread B takes over and executes *lock = 1; and returns l, which has the value 1.
Now it's a deadlock. The value of lock is 1, but no thread actually holds it. Thread B got 1 back from test_and_set and stays in the while loop. Thread A starts its next iteration and will also always read a 1. There is no thread that will set the lock back to zero. Bang...
Likewise, you can create a situation where both threads get "the lock" at the same time.
Long story short: You can't implement a lock like that.
From https://en.wikipedia.org/wiki/Test-and-set:
In computer science, the test-and-set instruction is an instruction used to write 1 (set) to a memory location and return its old value as a single atomic (i.e., non-interruptible) operation.
You don't have "a single atomic operation". Therefore thread A above can execute step 3 between thread B executing step 2 and step 4.
If your processor has a test-and-set instruction, you can try with assembler code.
Note: In the "real" world, task switches do not happen aligned with C statements. A single C statement is typically realized as a number of machine instructions, and a task switch can happen at any machine instruction. Still, the principle is the same.
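Since the question already includes <stdatomic.h>, one alternative to hand-written assembler is the C11 atomic_flag type, whose atomic_flag_test_and_set() performs the read and the write as one atomic operation. The following is only a minimal sketch under that assumption (a C11 compiler with <stdatomic.h> and pthreads); the names counter, spin, worker and run_two_workers are made up for illustration and are not from the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define ITERATIONS 100000

static int counter = 0;                      /* protected by spin */
static atomic_flag spin = ATOMIC_FLAG_INIT;  /* clear means "unlocked" */

static void *worker(void *param)
{
    (void)param;
    for (int i = 0; i < ITERATIONS; i++) {
        /* Atomically set the flag and fetch its previous value.
           Keep spinning while the previous value was already "set",
           i.e. while another thread holds the lock. */
        while (atomic_flag_test_and_set(&spin))
            ;
        counter++;
        atomic_flag_clear(&spin);            /* release the lock */
    }
    return NULL;
}

/* Hypothetical helper: runs two workers and returns the final count. */
int run_two_workers(void)
{
    pthread_t t1, t2;
    int rc;

    counter = 0;
    rc = pthread_create(&t1, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

Calling run_two_workers() from main should give 2 * ITERATIONS, because no interleaving can split the test from the set. A busy-waiting lock like this burns CPU while it spins, so for anything beyond an exercise a pthread mutex is usually the better choice.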

I haven't yet figured where your deadlock is occurring but this is not the proper way to implement a lock of data shared by threads. Suppose that lock is 0 and that both threads call test_and_set simultaneously. They will both see that lock is 0 and will both set it to 1. That 0 gets returned and so both threads think that they have the lock.
A mutex (short for "mutual exclusion") accomplishes what you're attempting here by ensuring that the aforementioned asynchronous mess can't happen.
When you call pthread_mutex_lock on a mutex, that function will not return until the calling thread has acquired the lock. That means that any other thread calling pthread_mutex_lock will block until the first thread releases the mutex by calling pthread_mutex_unlock.
Here's how you should use the mutex.
int a = 0;
pthread_mutex_t lock;
void *func(void * param)
{
for(int i = 0; i < 100000; i++)
{
pthread_mutex_lock(&lock);
a++;
pthread_mutex_unlock(&lock);
}
return NULL;
}
int main() {
pthread_t p1, p2;
pthread_mutex_init(&lock,NULL);
...
}

Related

pthreads C program hangs at execution

I'm writing a program that performs Gaussian elimination given an A and B matrix. I first grab the divisor and multipliers, then create pthreads that execute the gauss function, each performing its operations on a single 'column'. Then main generates a new divisor and multipliers and passes them back for another round of operations by the same threads. I'm using pthread condition variables to accomplish this.
The code hangs until I set a breakpoint, after which it proceeds and finishes. I'm not sure what's holding it up. Could use some help.
#include <stdio.h>
#include <pthread.h>
//Need one mutex variable and two condition variables (one c var for
//communicating between threads, and one c var for communicating with main).
pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
pthread_cond_t condM = PTHREAD_COND_INITIALIZER;
float arr[3][4] = {{2,-3,1, -22},{7,9,-3, 14},{6,7,2,91}};
float mults[3];
float divisor;
int num_items = 3;
void* gauss(void *mine)
{
int thread_count=0;
int x = *((int *)mine);
for(int i=0;i<num_items;i++)
{
/*do something*/
arr[i][x] = arr[i][x] / divisor;
for(int k=0;k<num_items;k++){
if(k!=i)
arr[k][x] -= mults[k] * arr[i][x];
}
/*lock || wait || signal*/
pthread_mutex_lock(&mut);
thread_count++;
if(thread_count < num_items)
pthread_cond_wait(&cond,&mut);
else
{
pthread_cond_signal(&condM);
pthread_cond_wait(&cond,&mut);
thread_count = 0;
}
pthread_mutex_unlock(&mut);
}
return NULL;
}
int main(int argc, const char * argv[]) {
int i, j;
pthread_t threadr[num_items+1]; /*thread id array */
int is[num_items+1];
printf("Test");
// /*input num items*/
// printf("input the number of items ");
// scanf("%d",&num_items);
//
// /*input A array*/
// printf("input A array\n");
// for(i=0;i<num_items;i++)
// for(j=0;j<num_items;j++)
// scanf("%f",&arr[i][j]);
//
// /*input B array*/
// printf("input B array\n");
// for(i=0;i<num_items;i++)
// scanf("%f",&arr[i][num_items]);
/*grab first divisor & multipliers*/
divisor = arr[0][0];
for(i=0;i<num_items;i++)
{
mults[i] = arr[i][0];
}
for(i=0;i<num_items+1;i++)
{
is[i]=i;
if(pthread_create(&threadr[i],NULL,gauss,(void *)&is[i]) != 0)
perror("Pthread_create fails");
}
for(i=1;i<num_items;i++)
{
pthread_mutex_lock(&mut);
pthread_cond_wait(&condM,&mut);
divisor = arr[i][i];
for(j=0;j<num_items;j++)
{
mults[j] = 1;
if(j != i)
mults[j] = arr[j][i];
}
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mut);
}
printf("The X values are:\n");
for(i=0;i<num_items; i++) {
printf("%0.3f \n", arr[i][num_items]);
}
/*wait for all threads*/
for(i=0;i<num_items+1; i++)
if (pthread_join(threadr[i],NULL) != 0)
perror("Pthread_join fails");
return 0;
}

You have a race condition (at least one), and your code does not use its condition variables correctly. You can probably fix the former by fixing the latter. Also, I suspect that you intend gauss()'s local variable thread_count to be shared, but variables with no linkage are not shared.
First, the race condition. Consider the main thread: it starts the three other threads, and then locks the mutex and waits for condition variable condM to be signaled. But suppose the threads all manage to signal condM before the main thread starts waiting? Condition variable operations are immediate -- any signals to condM that occur before main() is waiting on it are lost.
Now let's shift gears to talk about condition variables. As the Linux manual for pthread_cond_wait() puts it:
When using condition variables there is always a Boolean predicate involving shared variables associated with each condition wait that is true if the thread should proceed. Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur. Since the return from pthread_cond_timedwait() or pthread_cond_wait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return.
In other words, condition variables are used to suspend thread operations pending a given condition becoming true. In abstract terms, that condition is always "it's ok for this thread to proceed", but that's realized in context-specific terms. Most importantly, the fact that a thread wakes from its wait never inherently communicates that the condition is true; it merely indicates that the newly-woken thread should check whether the condition is true. Generally, the thread should also check before waiting for the first time, as the condition may already be true.
In pseudocode, that looks like this:
Thread 1:
lock mutex;
loop
if is_ok_to_proceed then exit loop;
wait on condition variable;
end loop
// ... maybe do mutex-protected work ...
unlock mutex
Thread 2:
lock mutex
// ... maybe do mutex-protected work ...
is_ok_to_proceed = true;
signal condition variable;
unlock mutex
Generally speaking, there is also (mutex-protected) code somewhere else that makes the CV predicate false, so that sometimes the threads indeed do execute their waits.
Now consider how that applies to the race condition in main(). How does main() know whether to wait on condM? There needs to be a shared variable somewhere that answers that for it, and its wait must be conditioned on the value of that variable. Any thread that means to allow the main thread to proceed must both set the appropriate value for the variable and signal condM. The main thread itself should set the variable, too, as needed, to indicate that it is not ready, at that time, to proceed.
Of course, your other CV usage suffers from the same kind of problem, too.
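For reference, the pseudocode above might translate to C roughly as follows. This is only a sketch of the general pattern, not a fix for the gauss program; the names mtx, cv, ok_to_proceed, payload, producer and wait_for_payload are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
static bool ok_to_proceed = false;  /* the predicate, protected by mtx */
static int  payload = 0;            /* example data published with it */

/* "Thread 2" of the pseudocode: do the work, set the predicate, signal. */
static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    payload = 42;
    ok_to_proceed = true;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* "Thread 1" of the pseudocode: test the predicate before the first
   wait and again after every wakeup. */
int wait_for_payload(void)
{
    pthread_t t;
    int seen;
    int rc = pthread_create(&t, NULL, producer, NULL);
    assert(rc == 0);

    pthread_mutex_lock(&mtx);
    while (!ok_to_proceed)
        pthread_cond_wait(&cv, &mtx);
    seen = payload;
    ok_to_proceed = false;          /* make the predicate false again */
    pthread_mutex_unlock(&mtx);

    pthread_join(t, NULL);
    return seen;
}
```

It does not matter whether producer runs before or after the waiter takes the mutex: if the signal comes first, the waiter sees ok_to_proceed already true and never calls pthread_cond_wait() at all, which is exactly the case the original main() gets wrong.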

Thread concurrency in linux

I am a beginner on SO, so please let me know if the question is not clear.
I am using two threads, A and B, and I have a global variable 'p'.
Thread A is looping in a while loop and incrementing the value of 'p'. At the same time, B is trying to set 'p' to some other value (these are two different thread functions).
Suppose I am using a mutex for synchronization, and thread A gets the mutex and increments 'p' in a while loop, but does not release the mutex.
So my question is: if thread A doesn't release the mutex, can thread B access the variable 'p'?
EDIT
Thread B's access to 'p' is also protected using the mutex.
If thread A locks the mutex using pthread_mutex_lock() and doesn't release it, what happens if the same thread (A) tries to take the lock again (remember that thread A is looping)?
For example
while(1)
{
pthread_mutex_lock(&mutex);
p = 10;
}
Is there any problem with this code if the mutex is never released?

You can still access the variable in thread B, as the mutex is a separate object not connected to the variable. If you call mutex lock from thread B before accessing p, then thread B will wait for the mutex to be released. In fact, thread A will execute the loop body only once, as on the second iteration it will wait for the mutex to be released before it can lock it again.
If you don't unlock the mutex, then any call to lock the same mutex will wait indefinitely, but the variable will still be writable.
In your example, the access to variable p is what is called a critical section, that is, the part of the code between the mutex lock and the mutex release.

A mutex places no restriction on the variable itself; you need to write your program so that it follows the rules for using a mutex.
Here are the basic steps for using a mutex on a shared resource:
1. Acquire the lock first
2. Do the job (increment for A, set the value for B)
3. Release the lock
If both A and B follow the rules, then B can't modify the variable while A holds the lock.
Or, if your thread B doesn't acquire the lock first, it can of course modify the variable, but that would be a bug in a concurrent program.
And, by the way, you can also use a condition variable together with the mutex, so that threads can wait for and notify each other instead of looping all the time, which wastes machine resources.
For your updated question
On Linux, in C, there are mainly 3 functions for acquiring a mutex; what happens when a thread can't get the lock depends on which one you use.
int pthread_mutex_lock(pthread_mutex_t * mutex );
If the mutex is already locked by another thread, the call blocks until the mutex is unlocked.
int pthread_mutex_trylock(pthread_mutex_t * mutex );
Similar to pthread_mutex_lock(), but it won't block; instead it returns the error EBUSY.
int pthread_mutex_timedlock(pthread_mutex_t *restrict mutex, const struct timespec *restrict abs_timeout);
Similar to pthread_mutex_lock(), but it blocks only until the absolute time abs_timeout is reached, then returns the error ETIMEDOUT.
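To make the non-blocking case concrete, here is a small sketch of the scenario from the question: one thread holds the mutex without releasing it while another tries to set p. It assumes a default (non-recursive) mutex; try_set_p, thread_b and hold_then_try are hypothetical names used only in this example.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int p = 0;

/* Hypothetical helper: sets p only if the mutex can be taken right now.
   Returns 0 on success, or EBUSY if another thread holds the mutex. */
int try_set_p(int value)
{
    int err = pthread_mutex_trylock(&m);
    if (err != 0)
        return err;              /* lock not acquired, p is left untouched */
    p = value;
    return pthread_mutex_unlock(&m);
}

/* Plays the role of thread B: reports what try_set_p() returned. */
static void *thread_b(void *out)
{
    *(int *)out = try_set_p(10);
    return NULL;
}

/* Plays the role of thread A: holds the mutex while B makes its attempt. */
int hold_then_try(void)
{
    pthread_t b;
    int result = -1;

    pthread_mutex_lock(&m);
    if (pthread_create(&b, NULL, thread_b, &result) == 0)
        pthread_join(b, NULL);   /* B runs to completion while A holds m */
    pthread_mutex_unlock(&m);
    return result;
}
```

With pthread_mutex_lock() in place of pthread_mutex_trylock(), thread_b would block until the mutex is released, so joining it while still holding the mutex, as hold_then_try() does, would hang forever.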

Simple example of statically initialized mutex
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static int p = 0;
static pthread_mutex_t locker = PTHREAD_MUTEX_INITIALIZER;
static void *
threadFunc(void *arg)
{
int err;
/* pthread functions return the error number; they do not set errno */
err = pthread_mutex_lock(&locker);
if (err != 0){
fprintf(stderr, "pthread_mutex_lock failed: %s\n", strerror(err));
exit(1);
}
p++;
err = pthread_mutex_unlock(&locker);
if (err != 0){
fprintf(stderr, "pthread_mutex_unlock failed: %s\n", strerror(err));
exit(1);
}
return NULL;
}
int
main(int argc, char *argv[])
{
pthread_t A, B;
pthread_create(&A, NULL, threadFunc, NULL);
pthread_create(&B, NULL, threadFunc, NULL);
pthread_join(A, NULL);
pthread_join(B, NULL);
printf("p = %d\n", p);
return 0;
}
Error checking in main is omitted for brevity but should be used. If a thread never releases the mutex, the program will never finish, because the other thread will never get the lock.

Synchronizing pthreads using mutex in C

I've got to write a program that computes the sum of the first 10 terms of a series (sorry for my language, this is the first time that I'm talking about math in English) given by the formula (x^i)/i!. So, basically it's trivial. BUT, there are some special requirements. Every single term has to be computed by a separate thread, all of them working concurrently. Then all of them have to save their results to a common variable named result. After that the results have to be added up by the main thread, which will display the final result. All of it using pthreads and mutexes.
That's where I have a problem. I was thinking about using an array to store the results, but I was told by my teacher that it's not a correct solution, because then I don't have to use mutexes. Any ideas what to do and how to synchronize it? I'm completely new to pthreads and mutexes.
Here's what I got till now. I'm still working on it, so it's not working at the moment, it's just a scheme of a program, where I want to add mutexes. I hope it's not all wrong. ;p
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <pthread.h>
int number = 0;
float result = 0;
pthread_mutex_t term_lock;
pthread_mutex_t main_lock;
int save = 0; //condition variable
int factorial(int x) {
if(x==0 || x==1)
return 1;
return factorial(x-1)*x;
}
void *term(void *value) {
int x = *(int *)value;
float w;
if(save == 0) {
pthread_mutex_lock(&term_lock);
w = pow(x, number)/factorial(number);
result = w;
printf("%d term of series with x: %d is: %f\n", number, x, w);
number++;
save = 1;
pthread_mutex_unlock(&term_lock);
}
return NULL;
}
int main(void) {
int x, i, err = 0;
float final = 0;
pthread_t threads[10];
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
printf("Get X: \n");
scanf("%d", &x);
for(i=0; i<10; i++)
{
err = pthread_create(&threads[i], &attr, (void *)term, &x);
if(err) {
printf("Error creating threads.\n");
exit(-1);
}
}
i = 0;
while (number <= 10) {
//printf("While Result: %f, final %f\n", result, final); - shows that it's infinite loop
if(save) {
pthread_mutex_lock(&main_lock);
final = final + result;
save = 0;
pthread_mutex_unlock(&main_lock);
printf("If Result: %f, final %f\n", result, final); //final == last result
}
}
return 0;
}
EDIT: If it's not clear - I need help with solution how to store results of all threads in common variable and synchronizing it.
EDIT2: Possible solution - a global variable result shared by all threads. Once it is returned to the main thread, it would be added to some local variable, so then I could just overwrite its value with the result from another thread. Of course it will require some synchronization, so that another thread won't overwrite it before I add it in the main thread. What do you think?
EDIT3: I've updated the code with what I have right now. The output gives me the values of 8-9 terms (printf in term), then the program keeps running, showing nothing. The commented-out printf showed me that the while loop is infinite. Also, the local variable final has just the last value of result. What am I doing wrong?

It's rather contrived that the main thread should be the one to add the terms, but the individual threads must all write their results to the same variable. I would ordinarily expect each thread to add its own term to the result (which does require mutex), or possibly to put its result in an array (as you suggested), or to add it to a shared queue (which would require mutex), or even to write it to a pipe. Nevertheless, it can be done your teacher's way.
One of the key problems to solve is that you have two distinctly different operations that you need to synchronize:
The various computational threads' writes to the shared result variable
The main thread's reads of the result variable
You cannot use just a single synchronization construct because you cannot that way distinguish between the computational threads and the main thread. One way to approach this would be to synchronize the computational threads' writes via a mutex, as required, and to synchronize those vs. the main thread's reads via semaphores or condition variables. You could also do it with one or more additional mutexes, but not cleanly.
Additional notes:
- The result variable in which your threads deposit their terms must be a global. Threads do not have access to the local variables of the function from which they are launched.
- The signature of your term() function is incorrect for a thread start function. The argument must be of type void *.
- Thread start functions are no different from other functions in that their local variables are accessible only for the duration of the function execution. In particular, returning a pointer to a local variable cannot do anything useful, as any attempt to later dereference such a pointer produces undefined behavior.
I'm not going to write your homework for you, but here's an approach that can work:
1. The main thread initializes a mutex and two semaphores, the latter with initial values zero.
2. The main thread launches all the computational threads. Although it's ugly, you can feed them their numeric arguments by casting those to void *, and then casting them back in the term() function (since its argument should be a void *).
3. The main thread then loops. At each iteration, it
   - waits for semaphore 1 (sem_wait())
   - adds the value of the global result variable to a running total
   - posts to semaphore 2 (sem_post())
   - if as many iterations have been performed as there are threads, breaks from the loop
Meanwhile, each computational thread does this:
1. Computes the value of the appropriate term
2. Locks the mutex
3. Stores the term value in the global result variable
4. Posts to semaphore 1
5. Waits for semaphore 2
6. Unlocks the mutex
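For readers who want to see the shape of that handoff, here is one possible sketch of the steps above. It assumes POSIX unnamed semaphores (sem_init(), as available on Linux), passes each thread its term index by casting through intptr_t, and computes x^i/i! with a simple loop instead of pow(). The names result_ready, result_taken, x_value and sum_series_sem are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>

#define NTERMS 10

static double result;                 /* the single shared variable */
static double x_value;                /* set by main before any thread starts */
static pthread_mutex_t result_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t result_ready;            /* "semaphore 1" */
static sem_t result_taken;            /* "semaphore 2" */

static void *term(void *arg)
{
    int i = (int)(intptr_t)arg;       /* term index, passed by value */
    double w = 1.0;
    for (int k = 1; k <= i; k++)      /* builds x^i / i! step by step */
        w *= x_value / k;

    pthread_mutex_lock(&result_lock); /* one writer at a time */
    result = w;
    sem_post(&result_ready);          /* tell main a value is available */
    sem_wait(&result_taken);          /* wait until main has consumed it */
    pthread_mutex_unlock(&result_lock);
    return NULL;
}

double sum_series_sem(double x)
{
    pthread_t threads[NTERMS];
    double total = 0.0;
    int rc;

    x_value = x;
    rc = sem_init(&result_ready, 0, 0);
    assert(rc == 0);
    rc = sem_init(&result_taken, 0, 0);
    assert(rc == 0);

    for (int i = 0; i < NTERMS; i++) {
        rc = pthread_create(&threads[i], NULL, term, (void *)(intptr_t)i);
        assert(rc == 0);
    }
    for (int i = 0; i < NTERMS; i++) {
        sem_wait(&result_ready);      /* a thread has stored its term */
        total += result;
        sem_post(&result_taken);      /* let that thread release the mutex */
    }
    for (int i = 0; i < NTERMS; i++)
        pthread_join(threads[i], NULL);

    sem_destroy(&result_ready);
    sem_destroy(&result_taken);
    return total;
}
```

The mutex keeps the writers from overwriting each other, while the two semaphores alternate ownership of result between a writer and the main thread. For x = 1 the sum of the first ten terms should come out close to e.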
Update:
To use condition variables for this job, it is essential to identify which shared state is being protected by those condition variables, as one must always protect against waking spuriously from a wait on a condition variable.
In this case, it seems natural that the shared state in question would involve the global result variable in which the computational threads return their results. There are really two general, mutually exclusive states of that variable:
Ready to receive a value from a computational thread, and
Ready for the main thread to read.
The computational threads need to wait for the first state, and the main thread needs to wait (repeatedly) for the second. Since there are two different conditions that threads will need to wait on, you need two condition variables. Here's an alternative approach using these ideas:
1. The main thread initializes a mutex and two condition variables, and sets result to -1.
2. The main thread launches all the computational threads. Although it's ugly, you can feed them their numeric arguments by casting those to void *, and then casting them back in the term() function (since its argument should be a void *).
3. The main thread locks the mutex
4. The main thread then loops. At each iteration, it
   - tests whether result is non-negative. If so, it
     - adds the value of the result variable to a running total
     - if as many terms have been added as there are threads, breaks from the loop
     - sets result to -1.
   - signals condition variable 1
   - waits on condition variable 2
5. Having broken from the loop, the main thread unlocks the mutex
Meanwhile, each computational thread does this:
1. Computes its term
2. Locks the mutex
3. Loops:
   - Checks the value of result. If it is less than zero then breaks from the loop
   - waits on condition variable 1
4. Having broken from the loop, sets result to the computed term
5. Signals condition variable 2
6. Unlocks the mutex
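The condition-variable variant could be sketched along the following lines. Using -1 as the "empty" marker only works if every term is non-negative, so this sketch assumes x >= 0. The names slot_free (condition variable 1), slot_full (condition variable 2), x_value and sum_series_cv are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NTERMS 10

static double result = -1.0;          /* negative means "ready to receive" */
static double x_value;                /* assumed non-negative */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t slot_free = PTHREAD_COND_INITIALIZER; /* CV 1 */
static pthread_cond_t slot_full = PTHREAD_COND_INITIALIZER; /* CV 2 */

static void *term(void *arg)
{
    int i = (int)(intptr_t)arg;
    double w = 1.0;
    for (int k = 1; k <= i; k++)      /* builds x^i / i! step by step */
        w *= x_value / k;

    pthread_mutex_lock(&mtx);
    while (result >= 0.0)             /* slot still holds an unread term */
        pthread_cond_wait(&slot_free, &mtx);
    result = w;
    pthread_cond_signal(&slot_full);
    pthread_mutex_unlock(&mtx);
    return NULL;
}

double sum_series_cv(double x)
{
    pthread_t threads[NTERMS];
    double total = 0.0;
    int added = 0;

    x_value = x;
    result = -1.0;
    for (int i = 0; i < NTERMS; i++) {
        int rc = pthread_create(&threads[i], NULL, term, (void *)(intptr_t)i);
        assert(rc == 0);
    }

    pthread_mutex_lock(&mtx);
    for (;;) {
        if (result >= 0.0) {          /* a term is waiting to be read */
            total += result;
            if (++added == NTERMS)
                break;
            result = -1.0;            /* mark the slot empty again */
        }
        pthread_cond_signal(&slot_free);
        pthread_cond_wait(&slot_full, &mtx);
    }
    pthread_mutex_unlock(&mtx);

    for (int i = 0; i < NTERMS; i++)
        pthread_join(threads[i], NULL);
    return total;
}
```

Both sides test the state of result before waiting and after every wakeup, so neither a spurious wakeup nor a signal sent before the other side started waiting causes a hang.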

The number is shared between all the threads, so you will need to protect that with a mutex (which is probably what your teacher is wanting to see)
pthread_mutex_t number_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t result_mutex = PTHREAD_MUTEX_INITIALIZER;
int number = 0;
float result = 0;
void *term(void *value) {
int x = *(int *)value;
float w;
// Critical zone, make sure only one thread updates `number`
pthread_mutex_lock(&number_mutex);
int mynumber = number++;
pthread_mutex_unlock(&number_mutex);
// end of critical zone
w = pow(x, mynumber)/factorial(mynumber);
printf("%d term of series with x: %d is: %f\n", mynumber, x, w);
// Critical zone, make sure only one thread updates `result`
pthread_mutex_lock(&result_mutex);
result += w;
pthread_mutex_unlock(&result_mutex);
// end of critical zone
return NULL;
}
You should also remove the DETACHED state and do a thread-join at the end of your main program before printing out the result

Here is my solution to your problem:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <pthread.h>
int number=0;
float result[10];
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
int factorial(int x) {
if(x==0 || x==1)
return 1;
return factorial(x-1)*x;
}
void *term(void *value) {
int x = *(int *)value;
float w;
pthread_mutex_lock(&lock);
w = pow(x, number)/factorial(number);
printf("%d term of series with x: %d is: %f\n", number, x, w);
result[number] = w;
number++;
pthread_mutex_unlock(&lock);
return NULL;
}
int main(void) {
int x, i, err;
pthread_t threads[10];
printf("Get X: \n");
scanf("%d", &x);
for(i=0; i<=9; i++)
{
err = pthread_create(&threads[i], NULL, term, &x);
if(err) {
printf("Error creating threads.\n");
exit(-1);
}
}
for(i = 0; i < 10; i++)
{
pthread_join(threads[i], NULL);
}
i = 0;
for(i=0; i<=9; i++)
{
printf("%f\n", result[i]);
}
return 0;
}
This code creates a global mutex pthread_mutex_t lock that (in this case) makes sure that same code is not executed by anyone at the same time: basically when one thread executes pthread_mutex_lock(&lock), it forbids any other thread from executing that part of the code until the "original" thread executes pthread_mutex_unlock(&lock).
The other important part is pthread_join: what this does is force the main thread to wait for the execution of every other thread created; this way, float result[10] is written before actually being worked on in the main thread (in this case, the last print instruction).
Other than that, I fixed a couple of bugs in your code that other users pointed out.

If result is to be a single variable, then one solution is to use an array of 20 mutexes: aMutex[20];. Main locks all 20 mutexes, then starts the pthreads. Each pthread[i] computes a local term, waits for aMutex[i], stores its value into result, then unlocks aMutex[10+i]. In main(): for(i = 0; i < 10; i++){ unlock aMutex[i] to allow pthread[i] to store its value into result, then wait for aMutex[10+i] to know that result is updated, then add result to a sum. }

The version of pthread_join() that does not block main(): POSIX

I am trying to write a code that does not block main() when pthread_join() is called:
i.e. basically trying to implement my previous question mentioned below:
https://stackoverflow.com/questions/24509500/pthread-join-and-main-blocking-multithreading
And the corresponding explanation at:
pthreads - Join on group of threads, wait for one to exit
As per suggested answer:
You'd need to create your own version of it - e.g. an array of flags (one flag per thread) protected by a mutex and a condition variable; where just before "pthread_exit()" each thread acquires the mutex, sets its flag, then does "pthread_cond_signal()". The main thread waits for the signal, then checks the array of flags to determine which thread/s to join (there may be more than one thread to join by then).
I have tried as below:
My status array which keeps a track of which threads have finished:
typedef struct {
int Finish_Status[THREAD_NUM];
int signalled;
pthread_mutex_t mutex;
pthread_cond_t FINISHED;
}THREAD_FINISH_STATE;
The thread routine, it sets the corresponding array element when the thread finishes and also signals the condition variable:
void* THREAD_ROUTINE(void* arg)
{
THREAD_ARGUMENT* temp=(THREAD_ARGUMENT*) arg;
printf("Thread created with id %d\n",temp->id);
waitFor(5);
pthread_mutex_lock(&(ThreadFinishStatus.mutex));
ThreadFinishStatus.Finish_Status[temp->id]=TRUE;
ThreadFinishStatus.signalled=TRUE;
if(ThreadFinishStatus.signalled==TRUE)
{
pthread_cond_signal(&(ThreadFinishStatus.FINISHED));
printf("Signal that thread %d finished\n",temp->id);
}
pthread_mutex_unlock(&(ThreadFinishStatus.mutex));
pthread_exit((void*)(temp->id));
}
I am not able to write the corresponding parts pthread_join() and pthread_cond_wait() functions. There are a few things which I am not able to implement.
1) How to write corresponding part pthread_cond_wait() in my main()?
2) I am trying to write it as:
pthread_mutex_lock(&(ThreadFinishStatus.mutex));
while(ThreadFinishStatus.signalled != TRUE){
pthread_cond_wait(&(ThreadFinishStatus.FINISHED), &(ThreadFinishStatus.mutex));
printf("Main Thread signalled\n");
ThreadFinishStatus.signalled==FALSE; //Reset signalled
//check which thread to join
}
pthread_mutex_unlock(&(ThreadFinishStatus.mutex));
But it does not enter the while loop.
3) How to use pthread_join() so that I can get the return value stored in my arg[i].returnStatus
i.e. where to put below statement in my main:
pthread_join(T[i],&(arg[i].returnStatus));
COMPLETE CODE
#include <stdio.h>
#include <pthread.h>
#include <time.h>
#define THREAD_NUM 5
#define FALSE 0
#define TRUE 1
void waitFor (unsigned int secs) {
time_t retTime;
retTime = time(0) + secs; // Get finishing time.
while (time(0) < retTime); // Loop until it arrives.
}
typedef struct {
int Finish_Status[THREAD_NUM];
int signalled;
pthread_mutex_t mutex;
pthread_cond_t FINISHED;
}THREAD_FINISH_STATE;
typedef struct {
int id;
void* returnStatus;
}THREAD_ARGUMENT;
THREAD_FINISH_STATE ThreadFinishStatus;
void initializeState(THREAD_FINISH_STATE* state)
{
int i=0;
state->signalled=FALSE;
for(i=0;i<THREAD_NUM;i++)
{
state->Finish_Status[i]=FALSE;
}
pthread_mutex_init(&(state->mutex),NULL);
pthread_cond_init(&(state->FINISHED),NULL);
}
void destroyState(THREAD_FINISH_STATE* state)
{
int i=0;
for(i=0;i<THREAD_NUM;i++)
{
state->Finish_Status[i]=FALSE;
}
pthread_mutex_destroy(&(state->mutex));
pthread_cond_destroy(&(state->FINISHED));
}
void* THREAD_ROUTINE(void* arg)
{
THREAD_ARGUMENT* temp=(THREAD_ARGUMENT*) arg;
printf("Thread created with id %d\n",temp->id);
waitFor(5);
pthread_mutex_lock(&(ThreadFinishStatus.mutex));
ThreadFinishStatus.Finish_Status[temp->id]=TRUE;
ThreadFinishStatus.signalled=TRUE;
if(ThreadFinishStatus.signalled==TRUE)
{
pthread_cond_signal(&(ThreadFinishStatus.FINISHED));
printf("Signal that thread %d finished\n",temp->id);
}
pthread_mutex_unlock(&(ThreadFinishStatus.mutex));
pthread_exit((void*)(temp->id));
}
int main()
{
THREAD_ARGUMENT arg[THREAD_NUM];
pthread_t T[THREAD_NUM];
int i=0;
initializeState(&ThreadFinishStatus);
for(i=0;i<THREAD_NUM;i++)
{
arg[i].id=i;
}
for(i=0;i<THREAD_NUM;i++)
{
pthread_create(&T[i],NULL,THREAD_ROUTINE,(void*)&arg[i]);
}
/*
Join only if signal received
*/
pthread_mutex_lock(&(ThreadFinishStatus.mutex));
//Wait
while(ThreadFinishStatus.signalled != TRUE){
pthread_cond_wait(&(ThreadFinishStatus.FINISHED), &(ThreadFinishStatus.mutex));
printf("Main Thread signalled\n");
ThreadFinishStatus.signalled==FALSE; //Reset signalled
//check which thread to join
}
pthread_mutex_unlock(&(ThreadFinishStatus.mutex));
destroyState(&ThreadFinishStatus);
return 0;
}

Here is an example of a program that uses a counting semaphore to watch as threads finish, find out which thread it was, and review some result data from that thread. This program is efficient with locks - waiters are not spuriously woken up (notice how the threads only post to the semaphore after they've released the mutex protecting shared state).
This design allows the main program to process the result of a thread's computation immediately after that thread completes, and does not require that the main thread wait for all threads to complete. This would be especially helpful if the running time of each thread varied by a significant amount.
Most importantly, this program does not deadlock nor race.
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <stdio.h>
#include <queue>
void* ThreadEntry(void* args );
typedef struct {
int threadId;
pthread_t thread;
int threadResult;
} ThreadState;
sem_t completionSema;
pthread_mutex_t resultMutex;
std::queue<int> threadCompletions;
ThreadState* threadInfos;
int main() {
int numThreads = 10;
int* threadResults;
void* threadResult;
int doneThreadId;
sem_init( &completionSema, 0, 0 );
pthread_mutex_init( &resultMutex, 0 );
threadInfos = new ThreadState[numThreads];
for ( int i = 0; i < numThreads; i++ ) {
threadInfos[i].threadId = i;
pthread_create( &threadInfos[i].thread, NULL, &ThreadEntry, &threadInfos[i].threadId );
}
for ( int i = 0; i < numThreads; i++ ) {
// Wait for any one thread to complete; ie, wait for someone
// to queue to the threadCompletions queue.
sem_wait( &completionSema );
// Find out what was queued; queue is accessed from multiple threads,
// so protect with a vanilla mutex.
pthread_mutex_lock(&resultMutex);
doneThreadId = threadCompletions.front();
threadCompletions.pop();
pthread_mutex_unlock(&resultMutex);
// Announce which thread ID we saw finish
printf(
"Main saw TID %d finish\n\tThe thread's result was %d\n",
doneThreadId,
threadInfos[doneThreadId].threadResult
);
// pthread_join to clean up the thread.
pthread_join( threadInfos[doneThreadId].thread, &threadResult );
}
delete[] threadInfos;
pthread_mutex_destroy( &resultMutex );
sem_destroy( &completionSema );
}
void* ThreadEntry(void* args ) {
int threadId = *((int*)args);
printf("hello from thread %d\n", threadId );
// This can safely be accessed since each thread has its own space
// and array derefs are thread safe.
threadInfos[threadId].threadResult = rand() % 1000;
pthread_mutex_lock( &resultMutex );
threadCompletions.push( threadId );
pthread_mutex_unlock( &resultMutex );
sem_post( &completionSema );
return 0;
}

Pthread conditions don't have "memory"; pthread_cond_wait doesn't return if pthread_cond_signal is called before pthread_cond_wait, which is why it's important to check the predicate before calling pthread_cond_wait, and not call it if it's true. But that means the action, in this case "check which thread to join" should only depend on the predicate, not on whether pthread_cond_wait is called.
Also, you might want to make the while loop actually wait for all the threads to terminate, which you aren't doing now.
(Also, I think the other answer about "signalled==FALSE" being harmless is wrong, it's not harmless, because there's a pthread_cond_wait, and when that returns, signalled would have changed to true.)
So if I wanted to write a program that waited for all threads to terminate this way, it would look more like
pthread_mutex_lock(&(ThreadFinishStatus.mutex));
// AllThreadsFinished would check that all of Finish_Status[] is true
// or something, or simpler, count the number of joins completed
while (!AllThreadsFinished()) {
// Wait, keeping in mind that the condition might already have been
// signalled, in which case it's too late to call pthread_cond_wait,
// but also keeping in mind that pthread_cond_wait can return spuriously,
// thus using a while loop
while (!ThreadFinishStatus.signalled) {
pthread_cond_wait(&(ThreadFinishStatus.FINISHED), &(ThreadFinishStatus.mutex));
}
printf("Main Thread signalled\n");
ThreadFinishStatus.signalled=FALSE; //Reset signalled
//check which thread to join
}
pthread_mutex_unlock(&(ThreadFinishStatus.mutex));
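Filling in the "check which thread to join" part, a complete version might look roughly like the sketch below. It departs from the snippet above in one respect: it drops the separate signalled variable and uses the per-thread flags themselves as the predicate, counting completed joins as suggested in the comment. The names finished, finished_cv, worker and join_as_finished, and the id * 10 return value, are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define THREAD_NUM 5

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t finished_cv = PTHREAD_COND_INITIALIZER;
static bool finished[THREAD_NUM];     /* protected by mtx */

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    /* ... the thread's real work would go here ... */
    pthread_mutex_lock(&mtx);
    finished[id] = true;              /* set the flag, then signal */
    pthread_cond_signal(&finished_cv);
    pthread_mutex_unlock(&mtx);
    return (void *)(intptr_t)(id * 10);   /* illustrative return value */
}

/* Joins the workers in the order they finish; returns the sum of
   their return values. */
int join_as_finished(void)
{
    pthread_t t[THREAD_NUM];
    int joined = 0, sum = 0;

    for (int i = 0; i < THREAD_NUM; i++)
        finished[i] = false;
    for (int i = 0; i < THREAD_NUM; i++) {
        int rc = pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);
        assert(rc == 0);
    }

    pthread_mutex_lock(&mtx);
    while (joined < THREAD_NUM) {
        int ready = -1;
        for (int i = 0; i < THREAD_NUM; i++)
            if (finished[i]) { ready = i; break; }
        if (ready < 0) {              /* nothing to join yet */
            pthread_cond_wait(&finished_cv, &mtx);
            continue;                 /* re-check the flags after waking */
        }
        finished[ready] = false;      /* consume the flag */
        pthread_mutex_unlock(&mtx);   /* do not hold the lock while joining */
        void *ret;
        pthread_join(t[ready], &ret);
        sum += (int)(intptr_t)ret;
        joined++;
        pthread_mutex_lock(&mtx);
    }
    pthread_mutex_unlock(&mtx);
    return sum;
}
```

Because the flags are examined before every wait, it does not matter if some or all workers finish before the main thread first takes the mutex.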

Your code is racy.
Suppose you start a thread and it finishes before you grab the mutex in main(). Your while loop will never run because signalled was already set to TRUE by the exiting thread.
I will echo @antiduh's suggestion to use a semaphore that counts the number of dead-but-not-joined threads. You then loop up to the number of threads spawned, waiting on the semaphore each time. I'd point out that the POSIX sem_t is not like a pthread_mutex in that sem_wait can be interrupted by a signal, in which case it returns -1 with errno set to EINTR and the wait should be retried.
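
A rough sketch of that semaphore-counting idea, assuming an unnamed POSIX semaphore created with sem_init (which is not available on every platform). The names finished_sem, posting_worker, sem_wait_retry, count_finished_threads and NUM_THREADS are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define NUM_THREADS 4 /* hypothetical thread count */

static sem_t finished_sem; /* counts finished-but-not-yet-collected threads */

/* Each worker posts the semaphore once, just before it exits. */
static void *posting_worker(void *arg)
{
    (void)arg;
    sem_post(&finished_sem);
    return NULL;
}

/* sem_wait fails with errno == EINTR when a signal handler
 * interrupts it, so retry in that case. */
static int sem_wait_retry(sem_t *sem)
{
    int rc;
    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Spawns the workers and waits on the semaphore once per spawned
 * thread. Returns the number of completions observed, or -1 if the
 * semaphore could not be created. */
int count_finished_threads(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;
    int completed = 0;

    if (sem_init(&finished_sem, 0, 0) != 0)
        return -1;

    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, posting_worker, NULL) != 0)
            break;
        started++;
    }

    /* A post made before we start waiting is remembered by the
     * semaphore, unlike a condition-variable signal. */
    for (int i = 0; i < started; i++) {
        if (sem_wait_retry(&finished_sem) == 0)
            completed++;
    }

    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    int rc = sem_destroy(&finished_sem);
    assert(rc == 0);
    (void)rc;
    return completed;
}
```

In a real program the worker would also record its own id somewhere (under a mutex) before posting, so the main thread knows which thread to join after each successful wait.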

Your code appears fine. You have one minor buglet:
ThreadFinishStatus.signalled==FALSE; //Reset signalled
This does nothing. It tests whether signalled is FALSE and throws away the result. That's harmless though since there's nothing you need to do. (You never want to set signalled to FALSE because that loses the fact that it was signalled. There is never any reason to unsignal it -- if a thread finished, then it's finished forever.)
Not entering the while loop means signalled is TRUE. That means the thread already set it, in which case there is no need to enter the loop because there's nothing to wait for. So that's fine.
Also:
ThreadFinishStatus.signalled=TRUE;
if(ThreadFinishStatus.signalled==TRUE)
There's no need to test the thing you just set. It's not like the set can fail.
FWIW, I would suggest re-architecting. If the existing functions like pthread_join don't do exactly what you want, just don't use them. If you're going to have structures that track what work is done, then totally separate that from thread termination. Since you will already know what work is done, what difference does it make when and how threads terminate? Don't think of this as "I need a special way to know when a thread terminates"; instead think of it as "I need to know what work is done so I can do other things".

Output in multi threading program

I'm writing basic multithreaded programs and running into several difficulties.
In the program below, if I put the sleep at position 1, the value of shared data printed is always 10, while with the sleep at position 2 the value printed is always 0.
Why does this output occur?
How do I decide where the sleep should go?
Does this mean that if we place a sleep inside the mutex, the other thread is not executed at all, so the shared data stays 0?
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
pthread_mutex_t lock;
int shared_data = 0;
void * function(void *arg)
{
int i ;
for(i =0; i < 10; i++)
{
pthread_mutex_lock(&lock);
shared_data++;
pthread_mutex_unlock(&lock);
}
pthread_exit(NULL);
}
int main()
{
pthread_t thread;
void * exit_status;
int i;
pthread_mutex_init(&lock, NULL);
i = pthread_create(&thread, NULL, function, NULL);
for(i =0; i < 10; i++)
{
sleep(1); //POSITION 1
pthread_mutex_lock(&lock);
//sleep(1); //POSITION 2
printf("Shared data value is %d\n", shared_data);
pthread_mutex_unlock(&lock);
}
pthread_join(thread, &exit_status);
pthread_mutex_destroy(&lock);
}

When you sleep before you lock the mutex, then you're giving the other thread plenty of time to change the value of the shared variable. That's why you're seeing a value of "10" with the 'sleep' in position #1.
When you grab the mutex first, you're able to lock it fast enough that you can print out the value before the other thread has a chance to modify it. The other thread sits and blocks on the pthread_mutex_lock() call until your main thread has finished sleeping and unlocked it. At that point, the second thread finally gets to run and alter the value. That's why you're seeing a value of "0" with the 'sleep' at position #2.
This is a classic case of a race condition. On a different machine, the same code might not display "0" with the sleep call at position #2. It's entirely possible that the second thread has the opportunity to alter the value of the variable once or twice before your main thread locks the mutex. A mutex can ensure that two threads don't access the same variable at the same time, but it doesn't have any control over the order in which the two threads access it.
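
If the program needs a particular order (for example, "read the value only after all increments are done"), something other than the mutex has to impose it. One possible sketch uses pthread_join as the ordering point; this is a swapped-in technique, not what the original program does, and the names counter, counter_lock, increment_ten_times and read_after_join are hypothetical:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0; /* protected by counter_lock */

/* Mirrors the worker in the question: ten locked increments. */
static void *increment_ten_times(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Reads the counter only after the worker has terminated.
 * Returns the value read, or -1 if the thread could not be created. */
int read_after_join(void)
{
    pthread_t worker;
    int value;

    pthread_mutex_lock(&counter_lock);
    counter = 0;
    pthread_mutex_unlock(&counter_lock);

    if (pthread_create(&worker, NULL, increment_ten_times, NULL) != 0)
        return -1;

    /* The join, not the mutex, guarantees that every increment
     * happened before the read below. */
    int rc = pthread_join(worker, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&counter_lock);
    value = counter;
    pthread_mutex_unlock(&counter_lock);
    return value;
}
```

Here the result no longer depends on sleep calls or scheduling, because the read cannot start until the worker has finished.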

I had a full explanation here but ended up deleting it. This is a basic synchronization problem and you should be able to trace and identify it before tackling anything more complicated.
But I'll give you a hint: It's only the sleep() in position 1 that matters; the other one inside the lock is irrelevant as long as it doesn't change the code outside the lock.