(Operating System) How can I use __asm mfence in c

I'm taking an operating systems class, and my professor gave us this homework:
"Place __asm mfence in a proper position."
The problem is about using multiple threads and the side effects that result.
The main thread increments shared_var, but thread_1 is doing the same thing at the same time.
As a result, shared_var ends up as 199048359.000 even though the code increments it 200000000 times in total (10000 x 10000 in each of the two threads).
The professor said __asm mfence will solve this issue, but I do not know where to place it.
I have searched Google, GitHub and here, but I cannot find a source.
I do not know whether this is a stupid question, because I'm not majoring in computer science.
I would also like to know why this code prints 199948358.0000 rather than 200000000.00.
Any help would be greatly appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <conio.h>
int turn;
int interested[2];
void EnterRegion(int process);
void LeaveRegion(int process);
DWORD WINAPI thread_func_1(LPVOID lpParam);
volatile double shared_var = 0.0;
volatile int job_complete[2] = {0, 0};
int main(void)
{
DWORD dwThreadId_1, dwThrdParam_1 = 1;
HANDLE hThread_1;
int i, j;
// Create Thread 1
hThread_1 = CreateThread(
NULL, // default security attributes
0, // use default stack size
thread_func_1, // thread function
&dwThrdParam_1, // argument to thread function
0, // use default creation flags
&dwThreadId_1
); // returns the thread identifier
// Check the return value for success.
if (hThread_1 == NULL)
{
printf("Thread 1 creation error\n");
exit(0);
}
else
{
CloseHandle( hThread_1 );
}
/* I am main thread */
/* Now Main Thread and Thread 1 runs concurrently */
for (i = 0; i < 10000; i++)
{
for (j = 0; j < 10000; j++)
{
EnterRegion(0);
shared_var++;
LeaveRegion(0);
}
}
printf("Main Thread completed\n");
job_complete[0] = 1;
while (job_complete[1] == 0) ;
printf("%f\n", shared_var);
_getch();
ExitProcess(0);
}
DWORD WINAPI thread_func_1(LPVOID lpParam)
{
int i, j;
for (i = 0; i < 10000; i++) {
for (j = 0; j < 10000; j++)
{
EnterRegion(1);
shared_var++;
LeaveRegion(1);
}
}
printf("Thread_1 completed\n");
job_complete[1] = 1;
ExitThread(0);
}
void EnterRegion(int process)
{
_asm mfence;
int other;
other = 1 - process;
interested[process] = TRUE;
turn = process;
while (turn == process && interested[other] == TRUE) {}
_asm mfence;
}
void LeaveRegion(int process)
{
_asm mfence;
interested[process] = FALSE;
_asm mfence;
}

The EnterRegion() and LeaveRegion() functions are implementing a critical region using a thing called "Peterson's algorithm".
Now, the key to Peterson's algorithm is that when a thread reads turn it must get the latest (most recent) value written by any thread. That is, operations on turn must be Sequentially Consistent. Also, the write to interested[] in EnterRegion() must become visible to all threads before (or at the same time as) the write to turn.
So the place to put the mfence is after the turn = process ; -- so that the thread does not proceed until its write to turn is visible to all other threads.
It is also important to persuade the compiler to read from memory every time it reads turn and interested[], so you should set them volatile.
If you are writing this for x86 or x86_64, that is sufficient -- because they are generally "well behaved", so that:
all the writes to turn and interested[process] will occur in program order
all the reads of turn and interested[other] will also occur in program order
and setting those volatile ensures that the compiler doesn't fiddle with the order, either.
The reason for using the mfence on the x86 and x86_64 in this case is to flush the write queue to memory before proceeding to read the turn value. So, all memory writes go into a queue, and at some time in the future each write will reach actual memory, and the effect of the write will become visible to other threads -- the write has "completed". Writes "complete" in the same order the program did them, but delayed. If the thread reads something it has written recently, the processor will pick the (most recent) value out of the write queue. This means that the thread does not need to wait until the write "completes", which is generally a Good Thing. However, it does mean that the thread is not reading the same value that any other thread will read, at least until the write does "complete". What the mfence does is to stall the processor until all outstanding writes have "completed" -- so any following reads will read the same thing any other thread would read.
The write to interested[] in LeaveRegion() does not (on x86/x86_64) require an mfence, which is good because mfence is a costly operation. Each thread only ever writes to its own interested[] flag and only ever reads the other's. The only constraint on this write is that it must not "complete" after the write in EnterRegion() (!). Happily the x86/x86_64 does all writes in order. [Though, of course, after the write in LeaveRegion() the write in EnterRegion() may "complete" before the other thread reads the flag.]
For other devices, you might want other fences to enforce the ordering of reads/writes of turn and interested[]. But I don't pretend to know enough to advise on ARM or PowerPC or anything else.
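As a rough, portable illustration of the placement described above (this is not the asker's MSVC code), Peterson's algorithm can be sketched with C11 atomics and pthreads. The default sequentially consistent atomic_store and atomic_load are assumed to take over the job the answer gives to mfence; on x86, compilers typically emit an mfence or an xchg for such a store. The names enter_region, leave_region and run_peterson_demo are made up for this sketch.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int interested[2];  /* interested[i] != 0: thread i wants to enter */
static atomic_int turn;           /* the thread that announced itself last */
static long shared_counter;       /* plain variable protected by the lock */

static void enter_region(int self)
{
    int other = 1 - self;
    atomic_store(&interested[self], 1);
    /* Sequentially consistent store: it must be visible to the other thread
       before the loads in the loop below are performed. This is the job the
       answer assigns to mfence after "turn = process". */
    atomic_store(&turn, self);
    while (atomic_load(&turn) == self && atomic_load(&interested[other]))
        ;  /* spin until the other thread leaves or announces itself later */
}

static void leave_region(int self)
{
    atomic_store(&interested[self], 0);
}

static void *worker(void *arg)
{
    int self = *(int *)arg;
    for (long i = 0; i < 100000; i++) {
        enter_region(self);
        shared_counter++;          /* critical region */
        leave_region(self);
    }
    return NULL;
}

/* Runs the two workers and returns the final counter value. */
long run_peterson_demo(void)
{
    int ids[2] = {0, 1};
    pthread_t threads[2];

    shared_counter = 0;
    for (int i = 0; i < 2; i++)
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

With two workers doing 100000 increments each, run_peterson_demo() is expected to return 200000. Weakening the store to turn (for example to memory_order_relaxed) would remove the ordering the algorithm depends on.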

Related

Use while loop to make a thread wait till the lock variable is set to avoid race condition in C programming

#include <stdio.h>
#include <pthread.h>
long mails = 0;
int lock = 0;
void *routine()
{
printf("Thread Start\n");
for (long i = 0; i < 100000; i++)
{
while (lock)
{
}
lock = 1;
mails++;
lock = 0;
}
printf("Thread End\n");
}
int main(int argc, char *argv[])
{
pthread_t p1, p2;
if (pthread_create(&p1, NULL, &routine, NULL) != 0)
{
return 1;
}
if (pthread_create(&p2, NULL, &routine, NULL) != 0)
{
return 2;
}
if (pthread_join(p1, NULL) != 0)
{
return 3;
}
if (pthread_join(p2, NULL) != 0)
{
return 4;
}
printf("Number of mails: %ld \n", mails);
return 0;
}
In the above code, each thread runs a for loop that increments mails 100000 times. To avoid a race condition, I use a lock variable together with a while loop. However, the while loop in the routine function does not prevent the race condition, and the program does not print the correct value of mails.

In C, the compiler can safely assume that a (global) variable is not modified by other threads, except in a few cases (e.g. volatile variables, atomic accesses). This means the compiler can assume lock is not modified inside the loop, so while (lock) {} can be replaced with an infinite loop. In fact, the unsynchronized accesses to lock from two threads are a data race, which is undefined behaviour, and a loop with no visible effect is one the compiler may assume terminates. This means the compiler can remove it (or generate wrong code). The compiler can also remove the lock = 1 statement, since it is followed by lock = 0. The resulting code is bogus. Note that even if the compiler generated correct code, some processors (e.g. AFAIK ARM and PowerPC) can reorder memory accesses, resulting in bogus behaviour.
To make sure accesses between multiple threads are correct, you need at least atomic accesses on lock. Relaxed atomic accesses must additionally be combined with proper memory barriers. The thing is, while (lock) {} will result in a spin lock. Spin locks are known to be a pretty bad solution in many cases, unless you really know what you are doing and understand all the consequences (if in doubt, don't use them).
Generally, it is better to use mutexes, semaphores and condition variables in this case. Mutexes are generally implemented using an atomic boolean flag internally (with the right memory barriers, so you do not need to care about that). When the flag is marked as locked, an OS sleeping function is called. The sleeping thread wakes up when the lock has been released by another thread. This is possible because the thread releasing a lock can send a wake-up signal. In older C, you can use pthreads for that (pthread_mutex_t; do not forget the initialization). Since C11, you can do it directly with the standard <threads.h> API (mtx_t).
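For the mutex route recommended above, a minimal sketch could look like the following. It assumes POSIX threads; mails_lock and count_mails are names invented for this example, and the static initializer stands in for a call to pthread_mutex_init().

```c
#include <pthread.h>

static long mails;
static pthread_mutex_t mails_lock = PTHREAD_MUTEX_INITIALIZER;

static void *routine(void *arg)
{
    (void)arg;
    for (long i = 0; i < 100000; i++) {
        pthread_mutex_lock(&mails_lock);    /* sleeps instead of spinning if contended */
        mails++;
        pthread_mutex_unlock(&mails_lock);
    }
    return NULL;
}

/* Starts two threads and returns the final count, or -1 on error. */
long count_mails(void)
{
    pthread_t p1, p2;

    mails = 0;
    if (pthread_create(&p1, NULL, routine, NULL) != 0)
        return -1;
    if (pthread_create(&p2, NULL, routine, NULL) != 0) {
        pthread_join(p1, NULL);
        return -1;
    }
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    return mails;
}
```

Because every increment happens with the mutex held, count_mails() is expected to return 200000 on every run.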

If you really want a spinlock, you need something like:
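#include <stdatomic.h>
#include <stdio.h>
long mails = 0;
atomic_flag lock = ATOMIC_FLAG_INIT;
void *routine(void *arg)
{
    (void)arg;
    printf("Thread Start\n");
    for (long i = 0; i < 100000; i++)
    {
        /* test-and-set returns the previous value: spin until it was clear */
        while (atomic_flag_test_and_set(&lock)) {}
        mails++;
        atomic_flag_clear(&lock); /* release the lock */
    }
    printf("Thread End\n");
    return NULL;
}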
However, since you are already using pthreads, you're better off using a pthread_mutex.

Jérôme Richard told you about ways in which the compiler could optimize the sense out of your code, but even if you turned all the optimizations off, you still would be left with a race condition. You wrote
while (lock) { }
lock=1;
...critical section...
lock=0;
The problem with that is, suppose lock==0. Two threads racing toward that critical section at the same time could both test lock, and they could both find that lock==0. Then they both would set lock=1, and they both would enter the critical section...
...at the same time.
In order to implement a spin lock,* you need some way for one thread to prevent other threads from accessing the lock variable in between when the first thread tests it, and when the first thread sets it. You need an atomic (i.e., indivisible) "test and set" operation.
Most computer architectures have some kind of specialized op-code that does what you want. It has names like "test and set," "compare and exchange," "load-linked and store-conditional," etc. Chris Dodd's answer shows you how to use a standard C library function that does the right thing on whatever CPU you happen to be using...
...But don't forget what Jérôme said.*
* Jérôme told you that spin locks are a bad idea.
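To make the "indivisible test and set" idea concrete, here is a small sketch built on compare-and-exchange from C11 <stdatomic.h>. The names lock_word, spin_trylock, spin_lock and spin_unlock are hypothetical, and the same caveat about spin locks applies.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int lock_word;   /* 0 = free, 1 = held */

/* Tries once to take the lock; returns true on success. */
bool spin_trylock(void)
{
    int expected = 0;
    /* Performed as one indivisible step:
       if (lock_word == expected) { lock_word = 1; return true; }
       else { expected = lock_word; return false; } */
    return atomic_compare_exchange_strong(&lock_word, &expected, 1);
}

void spin_lock(void)
{
    while (!spin_trylock())
        ;  /* busy-wait until the holder releases the lock */
}

void spin_unlock(void)
{
    atomic_store(&lock_word, 0);
}
```

Unlike the separate test (while (lock) {}) and set (lock = 1) in the question, no second thread can slip in between the comparison and the write.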

Reader Writer Problem With Writer Priority Problem

I came across this problem while learning more about operating systems. I first wrote a version in which readers have priority, and it worked, so I then modified it to give writers priority. When I ran the code, the output was exactly the same, and it seemed that the writers did not have priority. Here is the code with comments. I am not sure what I've done wrong: I modified a lot of the code, but the output is the same as if I had not changed it at all.
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
/*
This program provides a possible solution for first readers writers problem using mutex and semaphore.
I have used 10 readers and 5 producers to demonstrate the solution. You can always play with these values.
*/
// Semaphore initialization for writer and reader
sem_t wrt;
sem_t rd;
// Mutex 1 blocks other readers, mutex 2 blocks other writers
pthread_mutex_t mutex1;
pthread_mutex_t mutex2;
// Value the writer is changing, we are simply multiplying this value by 2
int cnt = 2;
int numreader = 0;
int numwriter = 0;
void *writer(void *wno)
{
pthread_mutex_lock(&mutex2);
numwriter++;
if(numwriter == 1){
sem_wait(&rd);
}
pthread_mutex_unlock(&mutex2);
sem_wait(&wrt);
// Writing Section
cnt = cnt*2;
printf("Writer %d modified cnt to %d\n",(*((int *)wno)),cnt);
sem_post(&wrt);
pthread_mutex_lock(&mutex2);
numwriter--;
if(numwriter == 0){
sem_post(&rd);
}
pthread_mutex_unlock(&mutex2);
}
void *reader(void *rno)
{
sem_wait(&rd);
pthread_mutex_lock(&mutex1);
numreader++;
if(numreader == 1){
sem_wait(&wrt);
}
pthread_mutex_unlock(&mutex1);
sem_post(&rd);
// Reading Section
printf("Reader %d: read cnt as %d\n",*((int *)rno),cnt);
pthread_mutex_lock(&mutex1);
numreader--;
if(numreader == 0){
sem_post(&wrt);
}
pthread_mutex_unlock(&mutex1);
}
int main()
{
pthread_t read[10],write[5];
pthread_mutex_init(&mutex1, NULL);
pthread_mutex_init(&mutex2, NULL);
sem_init(&wrt,0,1);
sem_init(&rd,0,1);
int a[10] = {1,2,3,4,5,6,7,8,9,10}; //Just used for numbering the writer and reader
for(int i = 0; i < 5; i++) {
pthread_create(&write[i], NULL, (void *)writer, (void *)&a[i]);
}
for(int i = 0; i < 10; i++) {
pthread_create(&read[i], NULL, (void *)reader, (void *)&a[i]);
}
for(int i = 0; i < 5; i++) {
pthread_join(write[i], NULL);
}
for(int i = 0; i < 10; i++) {
pthread_join(read[i], NULL);
}
pthread_mutex_destroy(&mutex1);
pthread_mutex_destroy(&mutex2);
sem_destroy(&wrt);
sem_destroy(&rd);
return 0;
}
Output (the same for both versions; I think that if the writers had priority, the value would be changed first and only then read):

Alternative Semantics
Much of what you want to do can probably be accomplished with less overhead. For example, in the classic reader-writer problem, readers shouldn’t need to block other readers.
You might be able to replace the reader-writer pattern with a publisher-consumer pattern that manages pointers to blocks of data with release-acquire (or release-consume) memory ordering. You only need locking at all if one thread needs to update the same block of memory after it was originally written.
POSIX and Linux have an implementation of reader-writer locks in the system library, which were designed to avoid starvation. This is most likely the high-level construct you want.
If you still want to implement your own, one implementation would use a count of current readers, a count of pending writers and a flag that indicates whether a write is in progress. It packs all these values into an atomic bitfield that it updates with a compare-and-swap.
Reader threads would retrieve the value, check whether there are any starving writers waiting, and if not, increment the count of readers. If there are writers, it backs off (perhaps spinning and yielding the CPU, perhaps sleeping on a condition variable). If there is a write in progress, it waits for that to complete. If it sees only other reads in progress, it goes ahead.
Writer threads would check if there are any reads or writes in progress. If so, they increment the count of waiting writers, and wait. If not, they set the write-in-progress bit and proceed.
Packing all these fields into the same atomic bitfield guarantees that no thread will think it’s safe to use the buffer while another thread thinks it’s safe to write: if two threads try to update the state at the same time, one will always fail.
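The packed-state idea can be sketched as follows. This is only an illustration under assumptions that are not in the answer: the field widths are arbitrary, counter overflow is ignored, a writer always announces itself before trying to enter, and the functions are non-blocking "try" operations that the caller would retry or sleep around.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define WRITE_ACTIVE 0x1u          /* bit 0: a write is in progress */
#define READER_ONE   0x2u          /* bits 1..15: count of active readers */
#define READER_MASK  0xFFFEu
#define WAITER_ONE   0x10000u      /* bits 16..31: count of waiting writers */
#define WAITER_MASK  0xFFFF0000u

static _Atomic uint32_t rw_state;

/* A reader enters only if no write is active and no writer is waiting. */
bool try_begin_read(void)
{
    uint32_t old = atomic_load(&rw_state);
    do {
        if (old & (WRITE_ACTIVE | WAITER_MASK))
            return false;                      /* back off: writers go first */
    } while (!atomic_compare_exchange_weak(&rw_state, &old, old + READER_ONE));
    return true;
}

void end_read(void)
{
    atomic_fetch_sub(&rw_state, READER_ONE);
}

/* A writer registers as waiting; from now on new readers back off. */
void writer_announce(void)
{
    atomic_fetch_add(&rw_state, WAITER_ONE);
}

/* Precondition: the caller has called writer_announce().
   Succeeds only when there are no active readers and no active writer. */
bool try_begin_write(void)
{
    uint32_t old = atomic_load(&rw_state);
    do {
        if (old & (WRITE_ACTIVE | READER_MASK))
            return false;
    } while (!atomic_compare_exchange_weak(&rw_state, &old,
                                           (old - WAITER_ONE) | WRITE_ACTIVE));
    return true;
}

void end_write(void)
{
    atomic_fetch_and(&rw_state, ~WRITE_ACTIVE);
}
```

Because every transition is a single compare-and-swap on one word, a reader and a writer can never both conclude that the buffer is theirs: one of the two updates fails and is retried against the new state.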
If You Stick With Semaphores
You can still have reader threads check sem_getvalue() on the writer semaphore, and back off if they see any starved writers are waiting. One method would be to wait on a condition variable that threads signal when they are done with the buffer. A reader that sees that it holds the mutex while writers are waiting can try to wake up one writer thread and go back to sleep, and a reader that sees only other readers are waiting can wake up a reader, which will wake up the next reader, and so on.

Readers-Writers problem writers-preference (readers may starve)

I have a problem with the readers-writers problem. I want to write a writers-preference solution using mutexes. So far I have written this:
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <pthread.h>
#include <memory.h>
#include <stdbool.h>
#include <stdint.h>
#include<unistd.h>
int NO_READERS;
int NO_WRITERS;
int NO_READERS_READING = 0; // How many readers need shared resources
int NO_WRITERS_WRITING = 0; // How many writers need shared resources
pthread_mutex_t resourceMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t tryResourceMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t readerMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t writerMutex = PTHREAD_MUTEX_INITIALIZER;
void *readerJob(void *arg) {
int *id = (int*)arg;
while (1) {
pthread_mutex_lock(&tryResourceMutex); // Indicate reader is trying to enter
pthread_mutex_lock(&readerMutex);
NO_READERS_READING++; // Indicate that you are needing the shared resource (one more reader)
if (NO_READERS_READING == 1) {
pthread_mutex_lock(&resourceMutex);
}
pthread_mutex_unlock(&readerMutex);
pthread_mutex_unlock(&tryResourceMutex);
printf("READER ID %d WALKED IN \n",*id);
printf("ReaderQ: %d , WriterQ: %d [in: R:%d W:%d]\n",
NO_READERS - NO_READERS_READING,
NO_WRITERS - NO_WRITERS_WRITING,
NO_READERS_READING,
NO_WRITERS_WRITING);
sleep(1);
pthread_mutex_lock(&readerMutex);
NO_READERS_READING--;
if (NO_READERS_READING == 0) { // Check if you are the last reader
pthread_mutex_unlock(&resourceMutex);
}
pthread_mutex_unlock(&readerMutex);
}
return 0;
}
void *writerJob(void *arg) {
int *id = (int*)arg;
while (1) {
pthread_mutex_lock(&writerMutex);
NO_WRITERS_WRITING++;
if (NO_WRITERS_WRITING == 1) {
pthread_mutex_lock(&tryResourceMutex); // If there are no other writers lock the readers out
}
pthread_mutex_unlock(&writerMutex);
pthread_mutex_lock(&resourceMutex);
printf("WRITER ID %d WALKED IN \n",*id);
printf("ReaderQ: %d , WriterQ: %d [in: R:%d W:%d]\n",
NO_READERS - NO_READERS_READING,
NO_WRITERS - NO_WRITERS_WRITING,
NO_READERS_READING,
NO_WRITERS_WRITING);
sleep(1);
pthread_mutex_unlock(&resourceMutex);
pthread_mutex_lock(&writerMutex);
NO_WRITERS_WRITING--;
if (NO_WRITERS_WRITING == 0) {
pthread_mutex_unlock(&tryResourceMutex); // If there are no writers left unlock the readers
}
pthread_mutex_unlock(&writerMutex);
}
return 0;
}
int main(int argc, char *argv[]) {
NO_READERS = atoi(argv[1]);
NO_WRITERS = atoi(argv[2]);
// Initialize arrays of threads IDs
pthread_t *readersThreadsIds = malloc(NO_READERS * sizeof(pthread_t));
pthread_t *writersThreadsIds = malloc(NO_WRITERS * sizeof(pthread_t));
// Initialize shared memory (array) with random numbers
// Create readers threads
for (int i = 0; i < NO_READERS; ++i) {
int* id = (int*)(malloc(sizeof(int)));
*id = i;
pthread_create(&readersThreadsIds[i], NULL, readerJob,(void*)id);
}
// Create writers threads
for (int i = 0; i < NO_WRITERS; ++i) {
int* id = (int*)(malloc(sizeof(int)));
*id = i;
pthread_create(&writersThreadsIds[i], NULL, writerJob, (void*)id);
}
// Wait for readers to finish
for (int i = 0; i < NO_READERS; ++i) {
pthread_join(readersThreadsIds[i], NULL);
}
// Wait for writers to finish
for (int i = 0; i < NO_WRITERS; ++i) {
pthread_join(writersThreadsIds[i], NULL);
}
free(readersThreadsIds);
free(writersThreadsIds);
pthread_mutex_destroy(&resourceMutex);
pthread_mutex_destroy(&tryResourceMutex);
pthread_mutex_destroy(&readerMutex);
pthread_mutex_destroy(&writerMutex);
return 0;
}
And I'm not sure if this should be working like this. Can anyone check this for me? I want to have information about which reader or writer is going in or out. It seems to get stuck at some point, but I don't know why.

It seems to do what you want, that is, give preference to the writers. Because your threads loop, acquiring and releasing the lock, if you have more than one writer the writers will take turns passing it between themselves and starve the readers. That is, every time one releases the resourceMutex, there is another writer waiting on it, so NO_WRITERS_WRITING will never hit zero.
To see it operating as intended, add a delay at the top of the while loop of each thread:
usleep((rand() % 1000) * 1000); /* random delay of up to about one second */
That will permit the readers to periodically get access, whenever all the writers are in the usleep().

"At the beginning all readers are coming in,"
By "coming in", I take you to mean executing the printf() calls in the readerJob loop. It's not surprising that the readers all come in first, because you start them first, and in the likely event that the first reader thread to attempt to lock tryResourceMutex does so before any writer thread does, it will then lock resourceMutex, too, preventing any writer from "coming in". But that does not prevent writers from incrementing NO_WRITERS_WRITING. And it also does not prevent one of them from locking tryResourceMutex and holding it locked.
The sleep() call in the reader will then (probably) cause resourceMutex to be held continuously long enough that all the readers come in before any of the writers do, since each writer needs to acquire resourceMutex to come in.
"then also writers which shouldn't be possible at the same time."
I don't see that in my tests. But I do see what I already described: the writer count increases from zero, even though they are prevented from coming in while any readers are inside. In effect, the name of your variable NO_WRITERS_WRITING is inconsistent with your actual usage -- it indicates how many writers are writing or waiting to write.
When the readers leave they are blocked from reentering right away because one of the writers holds tryResourceMutex. Eventually, then, the last reader will exit and release the resourceMutex. This will allow the writers to proceed, one at a time, but with the sleep() call positioned where it is in the writer loop, it is extremely unlikely that the number of writers will ever fall to zero to allow any of the readers to re-enter. If it did, however, then very likely the same cycle would repeat: all of the readers would enter, once, while all the writers queue up.
"Then all readers are gone but there are more than one writer at the same time in library."
Again, no. Only one writer is inside at a time, but the others are queued most of the time, so NO_WRITERS_WRITING will almost always be equal to NO_WRITERS.
Bottom line, then: you have confused yourself. You are using variable NO_WRITERS_WRITING primarily to represent the number of writers that are ready to write, but your messaging uses it as if it were the number actually writing. The same does not apply to NO_READERS_READING because once a thread acquires the mutex needed to modify that variable, nothing else prevents it from proceeding on into the room.
One more thing: to make the simulation interesting -- i.e. to keep the writers from taking permanent control -- you should implement a delay, preferably a random one, after each thread leaves the room, before it tries to reenter. And the delay for writers should probably be substantially longer than the delay for readers.
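The random delay suggested above could be wrapped in a small helper. This is a sketch assuming POSIX rand_r() and nanosleep(); random_pause_ms is a hypothetical name, and each thread would keep its own seed so the threads do not share rand() state.

```c
#include <stdlib.h>
#include <time.h>

/* Sleeps for a random time in [0, max_ms) milliseconds and returns the
   delay that was chosen. seed is per-thread state for rand_r();
   max_ms must be greater than zero. */
long random_pause_ms(unsigned int *seed, long max_ms)
{
    long ms = rand_r(seed) % max_ms;
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
    return ms;
}
```

A reader loop might call random_pause_ms(&seed, 200) after leaving and a writer loop random_pause_ms(&seed, 2000), so that writers stay away longer than readers; those two limits are arbitrary values chosen for illustration.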

Multithreaded program with mutex on mutual resource [duplicate]

This question already has an answer here:
Pthread_create() incorrect start routine parameter passing
(1 answer)
Closed 3 years ago.
I tried to build a program which should create threads and assign a Print function to each one of them, while the main process should use printf function directly.
Firstly, I made it without any synchronization means and expected to get a randomized output.
Later I tried to add a mutex to the Print function which was assigned to the threads and expected to get a chronological output, but it seems the mutex had no effect on the output.
Should I use a mutex on the printf function in the main process as well?
Thanks in advance
My code:
#include <stdio.h>
#include <pthread.h>
#include <errno.h>
pthread_t threadID[20];
pthread_mutex_t lock;
void* Print(void* _num);
int main(void)
{
int num = 20, indx = 0, k = 0;
if (pthread_mutex_init(&lock, NULL))
{
perror("err pthread_mutex_init\n");
return errno;
}
for (; indx < num; ++indx)
{
if (pthread_create(&threadID[indx], NULL, Print, &indx))
{
perror("err pthread_create\n");
return errno;
}
}
for (; k < num; ++k)
{
printf("%d from main\n", k);
}
indx = 0;
for (; indx < num; ++indx)
{
if (pthread_join(threadID[indx], NULL))
{
perror("err pthread_join\n");
return errno;
}
}
pthread_mutex_destroy(&lock);
return 0;
}
void* Print(void* _indx)
{
pthread_mutex_lock(&lock);
printf("%d from thread\n", *(int*)_indx);
pthread_mutex_unlock(&lock);
return NULL;
}

All questions of program bugs notwithstanding, pthreads mutexes provide only mutual exclusion, not any guarantee of scheduling order. This is typical of mutex implementations. Similarly, pthread_create() only creates and starts threads; it does not make any guarantee about scheduling order, such as would justify an assumption that the threads reach the pthread_mutex_lock() call in the same order that they were created.
Overall, if you want to order thread activities based on some characteristic of the threads, then you have to manage that yourself. You need to maintain a sense of which thread's turn it is, and provide a mechanism sufficient to make a thread notice when its turn arrives. In some circumstances, with some care, you can do this by using semaphores instead of mutexes. The more general solution, however, is to use a condition variable together with your mutex, and some shared variable that serves to indicate whose turn it currently is.
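The condition-variable approach can be sketched like this. The names (turn_lock, turn_changed, current_turn, run_in_order) are invented for the example, and the threads are deliberately started in reverse order to show that creation order does not matter.

```c
#include <pthread.h>

#define NUM_THREADS 4

static pthread_mutex_t turn_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t turn_changed = PTHREAD_COND_INITIALIZER;
static int current_turn;              /* index of the thread allowed to run */
static int run_order[NUM_THREADS];    /* records who actually ran, in order */
static int ran_count;

static void *ordered_worker(void *arg)
{
    int my_index = *(int *)arg;

    pthread_mutex_lock(&turn_lock);
    while (current_turn != my_index)              /* also handles spurious wakeups */
        pthread_cond_wait(&turn_changed, &turn_lock);
    run_order[ran_count++] = my_index;            /* the "work", done in turn */
    current_turn++;                               /* hand over to the next thread */
    pthread_cond_broadcast(&turn_changed);
    pthread_mutex_unlock(&turn_lock);
    return NULL;
}

/* Starts the threads in reverse order; returns 1 if they still ran 0, 1, 2, ... */
int run_in_order(void)
{
    pthread_t tid[NUM_THREADS];
    int index[NUM_THREADS];                       /* one argument slot per thread */

    current_turn = 0;
    ran_count = 0;
    for (int i = NUM_THREADS - 1; i >= 0; i--) {
        index[i] = i;
        pthread_create(&tid[i], NULL, ordered_worker, &index[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        if (run_order[i] != i)
            return 0;
    return 1;
}
```

pthread_cond_broadcast() is used rather than pthread_cond_signal() because only one specific waiter can make progress, and a single signal might wake a thread whose turn has not come yet.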

The code passes the address of the same local variable to all threads. Meanwhile, this variable gets updated by the main thread.
Instead, pass it by value, cast to void* (going through intptr_t from <stdint.h> avoids integer-to-pointer size warnings).
Fix:
pthread_create(&threadID[indx], NULL, Print, (void*)(intptr_t)indx)
// ...
printf("%d from thread\n", (int)(intptr_t)_indx);
Now, since there is no data shared between the threads, you can remove that mutex.

Every thread created in the for loop is handed a pointer to indx. Because of the operating system scheduler, you can never be sure which thread will run when. Therefore, the values are printed in an unpredictable order that depends on the scheduler. The second for loop, running in the parent thread, starts immediately after the child threads have been created. Again, the scheduler decides which thread runs next.
The major operating systems are preemptive, driven by interrupts. While the for loop in the parent thread is running, an interrupt may occur and let the scheduler decide which thread to run. Therefore, the numbers printed by the parent for loop are interleaved unpredictably with the others, because all threads run "concurrently".
Joining a thread means waiting for that thread to finish. If you want to make sure all numbers in the parent for loop are printed in order, without a child thread's output appearing in between, then move that for loop to after the thread joins.

Linux - force single-core execution and debug multi-threading with pthread

I'm debugging a multi-threaded problem with C, pthread and Linux. On my MacOS 10.5.8, C2D, it runs fine; on my Linux computers (2-4 cores) it produces undesired outputs.
I'm not experienced, therefore I attached my code. It's rather simple: each new thread creates two more threads until a maximum is reached. So no big deal... as I thought until a couple of days ago.
Can I force single-core execution to prevent my bugs from occurring?
I profiled the program execution, instrumenting with Valgrind:
valgrind --tool=drd --read-var-info=yes --trace-mutex=no ./threads
I get a couple of conflicts in the BSS segment, which are caused by my global structs and thread counter variables. However, I think I could mitigate these conflicts with forced single-core execution, because I believe the concurrent scheduling on my 2-4 core test systems is responsible for my errors.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_THR 12
#define NEW_THR 2
int wait_time = 0; // log global wait time
int num_threads = 0; // how many threads there are
pthread_t threads[MAX_THR]; // global array to collect threads
pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER; // sync
struct thread_data
{
int nr; // nr of thread, serves as id
int time; // wait time from rand()
};
struct thread_data thread_data_array[MAX_THR+1];
void
*PrintHello(void *threadarg)
{
if(num_threads < MAX_THR){
// using the argument
pthread_mutex_lock(&mut);
struct thread_data *my_data;
my_data = (struct thread_data *) threadarg;
// updates
my_data->nr = num_threads;
my_data->time= rand() % 10 + 1;
printf("Hello World! It's me, thread #%d and sleep time is %d!\n",
my_data->nr,
my_data->time);
pthread_mutex_unlock(&mut);
// counter
long t = 0;
for(t = 0; t < NEW_THR; t++){
pthread_mutex_lock(&mut);
num_threads++;
wait_time += my_data->time;
pthread_mutex_unlock(&mut);
pthread_create(&threads[num_threads], NULL, PrintHello, &thread_data_array[num_threads]);
sleep(1);
}
printf("Bye from %d thread\n", my_data->nr);
pthread_exit(NULL);
}
return 0;
}
int
main (int argc, char *argv[])
{
long t = 0;
// srand(time(NULL));
if(num_threads < MAX_THR){
for(t = 0; t < NEW_THR; t++){
// -> 2 threads entry point
pthread_mutex_lock(&mut);
// rand time
thread_data_array[num_threads].time = rand() % 10 + 1;
// update global wait time variable
wait_time += thread_data_array[num_threads].time;
num_threads++;
pthread_mutex_unlock(&mut);
pthread_create(&threads[num_threads], NULL, PrintHello, &thread_data_array[num_threads]);
pthread_mutex_lock(&mut);
printf("In main: creating initial thread #%ld\n", t);
pthread_mutex_unlock(&mut);
}
}
for(t = 0; t < MAX_THR; t++){
pthread_join(threads[t], NULL);
}
printf("Bye from program, wait was %d\n", wait_time);
pthread_exit(NULL);
}
I hope that code isn't too bad. I didn't do too much C for a rather long time. :) The problem is:
printf("Bye from %d thread\n", my_data->nr);
my_data->nr sometimes resolves high integer values:
In main: creating initial thread #0
Hello World! It's me, thread #2 and sleep time is 8!
In main: creating initial thread #1
[...]
Hello World! It's me, thread #11 and sleep time is 8!
Bye from 9 thread
Bye from 5 thread
Bye from -1376900240 thread
[...]
I don't know any more ways to profile and debug this.
If I debug this, it works - sometimes. Sometimes it doesn't :(
Thanks for reading this long question. :) I hope I didn't share too much of my currently unresolvable confusion.

Since this program seems to be just an exercise in using threads, with no actual goal, it is difficult to suggest how to treat your problem rather than treat the symptom. I believe you can actually pin a process or thread to a processor in Linux, but doing so for all threads removes most of the benefit of using threads, and I don't actually remember how to do it. Instead I'm going to talk about some things wrong with your program.
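For reference, pinning on Linux can be done from the shell with taskset -c 0 ./threads, or from inside the program with sched_setaffinity(). The following is a sketch that assumes Linux with glibc; pin_to_single_cpu is a hypothetical helper name. Note that running on one core only makes the races rarer, because threads can still be preempted at any point.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for sched_setaffinity() and the CPU_* macros */
#endif
#include <sched.h>

/* Restricts the caller to the first CPU it is currently allowed to use.
   Returns that CPU's number, or -1 on failure. Threads created afterwards
   inherit the restricted mask, so call this before pthread_create(). */
int pin_to_single_cpu(void)
{
    cpu_set_t allowed, one;

    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof one, &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}
```

Picking the first CPU from the current mask, rather than hard-coding CPU 0, keeps the call working inside containers or cpusets where CPU 0 may not be available.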
C compilers often make a lot of assumptions when they are doing optimizations. One of the assumptions is that unless the current code being examined looks like it might change some variable that variable does not change (this is a very rough approximation to this, and a more accurate explanation would take a very long time).
In this program you have variables which are shared and changed by different threads. If a variable is only read by threads (either const or effectively const after threads that look at it are created) then you don't have much to worry about (and in "read by threads" I'm including the main original thread) because since the variable doesn't change if the compiler only generates code to read that variable once (remembering it in a local temporary variable) or if it generates code to read it over and over the value is always the same so that calculations based on it always come out the same.
To force the compiler not to do this you can use the volatile keyword. It is affixed to variable declarations just like the const keyword, and tells the compiler that the value of that variable can change at any instant, so reread it every time its value is needed, and rewrite it every time a new value for it is assigned.
NOTE that for pthread_mutex_t (and similar) variables you do not need volatile. If it were needed on the type(s) that make up pthread_mutex_t on your system, volatile would have been used within the definition of pthread_mutex_t. Additionally, the functions that access this type take the address of it and are specially written to do the right thing.
I'm sure now you are thinking that you know how to fix your program, but it is not that simple. You are doing math on a shared variable. Doing math on a variable using code like:
x = x + 1;
requires that you know the old value to generate the new value. If x is global then you have to conceptually load x into a register, add 1 to that register, and then store that value back into x. On a RISC processor you actually have to do all 3 of those instructions, and being 3 instructions I'm sure you can see how another thread accessing the same variable at nearly the same time could end up storing a new value in x just after we have read our value -- making our value old, so our calculation and the value we store will be wrong.
If you know any x86 assembly then you probably know that it has instructions that can do math on values in RAM (both getting from and storing the result in the same location in RAM all in one instruction). You might think that this instruction could be used for this operation on x86 systems, and you would almost be right. The problem is that this instruction is still executed in the steps that the RISC instruction would be executed in, and there are several opportunities for another processor to change this variable at the same time as we are doing our math on it. To get around this on x86 there is a lock prefix that may be applied to some x86 instructions, and I believe that glibc header files include atomic macro functions to do this on architectures that can support it, but this can't be done on all architectures.
To work right on all architectures you are going to need to:
int local_thread_count;
/* count_lock is a pthread_mutex_t that guards num_threads */
pthread_mutex_lock(&count_lock);
local_thread_count = num_threads;
if (local_thread_count < MAX_THR) {
    num_threads = local_thread_count + 1;
    pthread_mutex_unlock(&count_lock);
    thread_data_array[local_thread_count].nr = local_thread_count;
    /* moved this into the creator
     * since getting it in the
     * child will likely get the
     * wrong value. */
    pthread_create(&threads[local_thread_count], NULL, PrintHello,
                   &thread_data_array[local_thread_count]);
} else {
    pthread_mutex_unlock(&count_lock);
}
Now, because num_threads is tested and incremented while count_lock is held, the test and increment are atomic with respect to every other thread that takes the same lock (it is the mutex, rather than volatile, that provides this). At the end of this, local_thread_count should be usable as an index into the array of threads. Note that I created only 1 thread in this code, while yours was supposed to create several. I did this to make the example clearer, but it should not be too difficult to change it to add NEW_THR to num_threads; if NEW_THR is 2 and MAX_THR - num_threads is 1, though, you have to handle that case correctly.
Now, all of that being said, there may be another way to accomplish similar things by using semaphores. Semaphores are like mutexes, but they have a count associated with them. You would not get a value to use as the index into the array of threads (the function to read a semaphore count won't really give you this), but I thought that it deserved to be mentioned since it is very similar.
man 3 semaphore.h
will tell you a little bit about it.
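The lost-update problem described above for x = x + 1 can also be avoided with C11 atomics. This is a minimal sketch assuming <stdatomic.h> is available; claim_thread_slot is a hypothetical helper that plays the role of the mutex-protected test and increment, and on x86 a compiler will typically implement it with a lock-prefixed instruction. For an unconditional increment, atomic_fetch_add(&counter, 1) would be enough.

```c
#include <stdatomic.h>

#define MAX_THR 12

static atomic_int num_threads;   /* starts at 0 */

/* Atomically claims the next free thread slot.
   Returns the claimed index, or -1 when all MAX_THR slots are taken. */
int claim_thread_slot(void)
{
    int current = atomic_load(&num_threads);

    while (current < MAX_THR) {
        /* Succeeds only if nobody changed num_threads since it was read;
           on failure, current is refreshed with the latest value. */
        if (atomic_compare_exchange_weak(&num_threads, &current, current + 1))
            return current;
    }
    return -1;
}
```

Each successful call returns a distinct index, so the result can be used directly for both threads[] and thread_data_array[] without reading num_threads again outside the atomic operation.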

num_threads should at least be marked volatile, and preferably marked atomic too (although I believe that the int is practically fine), so that at least there is a higher chance that the different threads are seeing the same values. You might want to view the assembler output to see when the writes of num_threads to memory are actually taking place.

https://computing.llnl.gov/tutorials/pthreads/#PassingArguments
That seems to be the problem. You need to malloc the thread_data struct.
