I'm running a highly threaded application (500+ threads). I need to trace some data from the threads, and to do so I print from each thread, but the output appears to get cut off. I've made sure to flush stdout often, and I've also tried using a mutex to coordinate output. Neither of those has worked.
This is the thread in question:
void* troutine(void* tmp) {
a = RDTSC();
chance = Park(state);
b = RDTSC();
printf("%s.%i.%c : %lli\n", IMPLEMENTATION, *(int*)tmp, 'T', b-a);
usleep(RAND(50));
a = RDTSC();
Leave(chance, state);
b = RDTSC();
printf("%s.%i.%c : %lli\n", IMPLEMENTATION, *(int*)tmp, 'T', b-a);
fflush(stdout);
pthread_exit(NULL);
}
Only about half of the print statements actually print, which is the problem. I need all of them to print; the order doesn't matter, but none of the output should be interleaved.
EDIT main.c
for(i = 0; i < 4000; i++)
while(!pthread_create(&tmp, NULL, &troutine, (void*)&testNum));
The while loop is there to ensure the creation of 4k threads, since pthread_create sometimes fails with that many threads active. Also, even when I set the loop to create only 4 threads (i < 4), I still get ~300 lines of output (as opposed to 8).
(1) You will fall out of main() before your threads finish. Either join the threads or call pthread_exit() in main() so that returning from main() doesn't kill your running threads.
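For example, a sketch of either option (tids[] is a hypothetical array here, since the code above reuses a single tmp handle):
/* Option A: join every thread you created */
for (i = 0; i < 4000; i++)
    pthread_join(tids[i], NULL);

/* Option B: end main() with pthread_exit() instead of return,
   so the process keeps running until the remaining threads finish */
pthread_exit(NULL);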
for(i = 0; i < 4000; i++)
while(!pthread_create(&tmp, NULL, &troutine, (void*)&testNum));
(2) pthread_create() returns 0 on success, so the while loop above says "while successful, keep creating threads". That would explain so much output when i only goes up to 4.
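A sketch of what the retry loop presumably meant to say (note the inverted test):
for(i = 0; i < 4000; i++)
    while(pthread_create(&tmp, NULL, &troutine, (void*)&testNum) != 0)
        ;   /* retry only while pthread_create FAILS (returns non-zero) */
In practice you would probably want a short sleep between retries rather than spinning.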
Edit 2: Another possibility is that your problem is outside of this code and that something is calling exit (if not crashing) so that half of your threads never finish. It would really help to know more about what you mean by "cut-off".
[As R mentions, this shouldn't be necessary. Only leaving it so the comment thread makes sense.]
When you say you're using a lock, are you using some kind of global mutex like:
pthread_mutex_lock(mutex);
printf("%s.%i.%c : %lli\n", IMPLEMENTATION, *(int*)tmp, 'T', b-a);
pthread_mutex_unlock(mutex);
because I don't see that in your example. Note that mutex needs to be defined in the above example, and that what you pass to the lock/unlock calls must be a pointer to it.
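Something along these lines, with a shared mutex at file scope (a sketch; the name print_lock is mine):
static pthread_mutex_t print_lock = PTHREAD_MUTEX_INITIALIZER;  /* shared by all threads */

/* inside the thread routine */
pthread_mutex_lock(&print_lock);     /* the lock/unlock calls take a pointer */
printf("%s.%i.%c : %lli\n", IMPLEMENTATION, *(int*)tmp, 'T', b-a);
pthread_mutex_unlock(&print_lock);
fflush(stdout);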
Related
I have the following program that spawns two threads running print_something(), and each repeatedly prints a specific string: thread 1 prints "Hi\n" and thread 2 prints "Bye\n":
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
/* pthread start routines must take and return void* */
void *print_something(void *arg)
{
    int *k = arg;
    int i;
    if (*k) {
        for (i = 0; i < 100; i++) {
            printf("Hi\n");
        }
    } else {
        for (i = 0; i < 100; i++) {
            printf("Bye\n");
        }
    }
    return NULL;
}
int main()
{
int x = 1, y = 0;
pthread_t t1, t2;
pthread_create(&t1, NULL, print_something, &x);
pthread_create(&t2, NULL, print_something, &y);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
printf("End of program.\n");
return 0;
}
I expected them to run concurrently, with the output in the terminal interleaved at random, such as:
Hi
Hi
Bye
Hi
Bye
...
But instead I always get thread 1 to finish its printing first before thread 2 will start printing:
Hi
Hi
...
Hi
Hi
Bye
Bye
...
Bye
Bye
End of program.
Why is the first thread blocking the second thread from printing?
Why is the first thread blocking the second thread from printing?
Who says it's blocking? Maybe starting a new thread takes long enough that the first additional thread (running in parallel with the original thread) finishes its printing (to stdout's buffer) before the second additional thread arrives at the point of trying to print anything.
On the other hand, POSIX does specify that the stdio functions perform operations on streams as if there were a lock associated with each stream that a thread must obtain upon entry to the function and release upon exit. Thus, the first thread may indeed be blocking the second via the lock associated with stdout.
Moreover, when a thread unlocks a lock and then immediately tries to re-acquire the same lock, there is a high probability that it will succeed immediately despite other threads contending for the lock. As a result, when an entire loop body starts with acquiring a lock and ends with releasing that lock -- as is the case in your code for the lock associated with stdout -- it is common for one thread to be able to monopolize the lock for many loop iterations.
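For what it's worth, POSIX also exposes that per-stream lock directly via flockfile()/funlockfile(), so you can hold stdout across several calls on purpose. A sketch (not part of the question's code):
#include <stdio.h>

/* group a few lines under stdout's own lock so no other thread's
   output can land between them */
void print_batch(const char *tag)
{
    flockfile(stdout);
    printf("%s: first line\n", tag);
    printf("%s: second line\n", tag);
    funlockfile(stdout);
}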
I expected them to run concurrently, with the output in the terminal interleaved at random, such as:
That's an unreasonable expectation. If two people each need to put in a hundred screws and are sharing a screwdriver, do you think they should hand off the screwdriver after each screw? It only makes sense to hand off the screwdriver when the one holding the screwdriver is tired.
Each thread spends the vast majority of its time accessing the console output stream. It can only do this by excluding the other thread. The behavior you expect would be atrocious.
Would they run on the same core? That would require a context switch after every line of output -- the worst performance possible for this code. Would they run on two cores? That would mean each core is waiting for the other core to finish with the console for about half the time -- also horrible performance.
Simply put, you expected your system to find a terrible way to do what you asked it to do. It found a much more efficient way -- letting one thread keep the console, finish what it was doing, and then letting the other go.
This question already has an answer here:
Pthread_create() incorrect start routine parameter passing
(1 answer)
Closed 3 years ago.
I tried to build a program that creates threads and assigns a Print function to each of them, while the main thread uses printf directly.
First, I wrote it without any synchronization and expected to get randomized output.
Later I added a mutex to the Print function that the threads run, and expected to get chronological output, but it seems like the mutex had no effect on the output.
Should I use the mutex around the printf calls in the main thread as well?
Thanks in advance
My code:
#include <stdio.h>
#include <pthread.h>
#include <errno.h>
pthread_t threadID[20];
pthread_mutex_t lock;
void* Print(void* _num);
int main(void)
{
int num = 20, indx = 0, k = 0;
if (pthread_mutex_init(&lock, NULL))
{
perror("err pthread_mutex_init\n");
return errno;
}
for (; indx < num; ++indx)
{
if (pthread_create(&threadID[indx], NULL, Print, &indx))
{
perror("err pthread_create\n");
return errno;
}
}
for (; k < num; ++k)
{
printf("%d from main\n", k);
}
indx = 0;
for (; indx < num; ++indx)
{
if (pthread_join(threadID[indx], NULL))
{
perror("err pthread_join\n");
return errno;
}
}
pthread_mutex_destroy(&lock);
return 0;
}
void* Print(void* _indx)
{
pthread_mutex_lock(&lock);
printf("%d from thread\n", *(int*)_indx);
pthread_mutex_unlock(&lock);
return NULL;
}
All questions of program bugs notwithstanding, pthreads mutexes provide only mutual exclusion, not any guarantee of scheduling order. This is typical of mutex implementations. Similarly, pthread_create() only creates and starts threads; it does not make any guarantee about scheduling order, such as would justify an assumption that the threads reach the pthread_mutex_lock() call in the same order that they were created.
Overall, if you want to order thread activities based on some characteristic of the threads, then you have to manage that yourself. You need to maintain a sense of which thread's turn it is, and provide a mechanism sufficient to make a thread notice when its turn arrives. In some circumstances, with some care, you can do this by using semaphores instead of mutexes. The more general solution, however, is to use a condition variable together with your mutex, plus some shared variable that serves to indicate whose turn it currently is.
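Here is a minimal sketch of that condition-variable pattern, reusing the question's lock and adding a turn counter (the extra names are mine, and it assumes each thread is given its own distinct number, e.g. via the by-value fix in another answer):
pthread_mutex_t lock    = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  turn_cv = PTHREAD_COND_INITIALIZER;
int current_turn = 0;                  /* whose turn it is to print */

void* Print(void* _num)
{
    int my_turn = (int)(intptr_t)_num; /* indx passed by value; needs <stdint.h> */

    pthread_mutex_lock(&lock);
    while (current_turn != my_turn)    /* sleep until it is this thread's turn */
        pthread_cond_wait(&turn_cv, &lock);

    printf("%d from thread\n", my_turn);

    current_turn++;                    /* hand the turn to the next thread */
    pthread_cond_broadcast(&turn_cv);  /* wake the others so they re-check */
    pthread_mutex_unlock(&lock);
    return NULL;
}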
The code passes the address of the same local variable to all threads. Meanwhile, this variable gets updated by the main thread.
Instead pass it by value cast to void*.
Fix:
pthread_create(&threadID[indx], NULL, Print, (void*)(intptr_t)indx)
// ...
printf("%d from thread\n", (int)(intptr_t)_indx);   // needs <stdint.h> for intptr_t
Now, since there is no data shared between the threads, you can remove that mutex.
All the threads created in the for loop have different values of indx. Because of the operating system scheduler, you can never be sure which thread will run next; therefore, the values are printed in an order that depends on the scheduler. The second for loop, running in the parent thread, runs immediately after the child threads are created, and again the scheduler decides which thread runs next.
Every major OS schedules preemptively using timer interrupts. While the for loop in the parent thread is running, an interrupt may occur and let the scheduler pick a different thread to run. Therefore, the numbers printed by the parent's for loop appear in a random order, because all threads run "concurrently".
Joining a thread means waiting for it. If you want to make sure the parent's for loop prints all its numbers in chronological order, without child threads interleaving their output, relocate that for loop to after the thread joining, as sketched below.
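In other words, roughly this ordering in main (a sketch of the relocation only; error checks omitted):
/* 1. create all child threads */
for (indx = 0; indx < num; ++indx)
    pthread_create(&threadID[indx], NULL, Print, &indx);  /* see the other answers
                                                             about what to pass here */

/* 2. wait for every child thread to finish */
for (indx = 0; indx < num; ++indx)
    pthread_join(threadID[indx], NULL);

/* 3. only now print from main, with no child threads left to interleave */
for (k = 0; k < num; ++k)
    printf("%d from main\n", k);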
I'm learning thread synchronization and this is the demo to show how to lock critical data when a thread is executing:
http://ideone.com/7Do0l
(To run this code, compile it with the -pthread parameter in Linux/MacOS environment)
The program works as expected, but the sleep() function doesn't pause the execution between threads. My idea is to have one thread do the calculation at a time, then 1 second later another thread comes into play. Here is the code segment I'm fighting with:
while(1) {
//sleep(1); //(1) (Sleep for one second)
sem_wait(&mutex);
//sleep(1); //(2)
printf("Thread #%d is doing math. %d + 1 = %d.\n", (int) id, s, s+1);
s++;
//sleep(1); //(3)
sem_post(&mutex);
//sleep(1); //(4)
}
There are four positions where I have tried to put the sleep(). (1) and (4) result in pauses not between individual threads but between two batches of ten threads. (2) and (3) result in one thread being executed repeatedly for a very long time before another gets a turn.
Is there a remedy to this?
Update
There is a trick that makes the program produce the desired result: generate the sleep time randomly for each thread. But it's not consistent, since two random numbers could turn out the same by accident.
Put it in the 3rd position, since you want a one second delay between printf messages.
If you want to make sure that all threads are initialized before any of them can enter into the critical section, modify the main function of the linked code to this
int main() {
    pthread_t thread[10];
    int i;
    sem_init(&mutex, 0, 1);
    sem_wait(&mutex);                /* hold the semaphore while creating the threads */
    for (i = 0; i < 10; ++i)
        pthread_create(&(thread[i]), NULL, job, (void*) i);
    sem_post(&mutex);                /* now let the first thread into the critical section */
    sleep(100);
    return 0;
}
That's not really the kind of problem threads are designed to solve. You'd have to have a separate semaphore for each thread, have one thread loop through those, calling sem_post on a different one each second, and the rest just calling sem_wait. May as well just use the one thread.
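For completeness, here is a sketch of what I mean (the names are mine; each turn[i] must be sem_init'ed to 0 before the threads start, and error handling is omitted):
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 10
sem_t turn[NTHREADS];              /* one semaphore per worker, each sem_init'ed to 0 */

void *worker(void *arg)            /* each worker waits for "its" semaphore */
{
    int id = (int)(intptr_t)arg;
    while (1) {
        sem_wait(&turn[id]);       /* blocked until the pacer posts us */
        printf("Thread #%d is doing math.\n", id);
    }
}

void *pacer(void *arg)             /* one thread hands out turns, one per second */
{
    int next = 0;
    (void)arg;
    while (1) {
        sleep(1);
        sem_post(&turn[next]);     /* wake exactly one worker */
        next = (next + 1) % NTHREADS;
    }
}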
I did some research and found that the only way to produce the desired output is the one I mentioned in the Update part. That is, instead of hard-coding the sleep timer, give each thread a random sleep time:
// Sleep time in microseconds.
int st = rand() % 500000;
usleep(st);
And actually I have been over-worrying about two threads doing the same thing at once. Even if two adjacent random timers happen to be the same by accident, two threads never execute on the same core at the same instant, and on a multi-core CPU the semaphore already ensures that no two threads modify the shared counter at the same time.
I am working on a program with a fixed number of threads in C using posix threads.
How can I be notified when a thread has been terminated due to some error?
Is there a signal to detect it?
If so, can the signal handler create a new thread to keep the number of threads the same?
Make the threads detached.
Get them to handle errors gracefully, i.e. close mutexes, files etc...
Then you will have no problems.
Perhaps fire a USR1 signal to the main thread to tell it that things have gone pear-shaped (I was going to say tits up!)
Create your threads by passing the function pointers to an intermediate function. Start that intermediate function asynchronously and have it synchronously call the passed function. When the function returns or throws an exception, you can handle the results in any way you like.
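In C, that intermediate function might look roughly like this (a sketch; wrapper_arg and notify_manager() are made-up names, not a real API):
#include <pthread.h>

struct wrapper_arg {
    void *(*real_start)(void *);       /* the routine you actually wanted to run */
    void  *real_arg;
};

void notify_manager(void *result);     /* hypothetical: however you report back */

static void *wrapper(void *p)
{
    struct wrapper_arg *w = p;
    void *result = w->real_start(w->real_arg);  /* run the real routine synchronously */

    /* the real routine has returned (or reported failure): tell the manager,
       which can respawn a replacement thread to keep the count constant */
    notify_manager(result);
    return result;
}

/* create threads through the wrapper instead of directly:
   pthread_create(&tid, NULL, wrapper, &warg); */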
With the latest inputs you've provided, I suggest you do something like this to get the number of threads a particular process has started:
#include <stdio.h>
#define THRESHOLD 50

int main ()
{
    unsigned count = 0;
    FILE *a;

    /* "ps H" prints one line per thread (plus a header line) */
    a = popen ("ps H `ps -A | grep a.out | awk '{print $1}'` | wc -l", "r");
    if (a == NULL)
    {
        printf ("Error in executing command\n");
        return 1;
    }
    fscanf(a, "%u", &count);
    pclose(a);

    if (count < THRESHOLD)
    {
        printf("Number of threads = %u\n", count-1);
        // count - 1 in order to eliminate the header line.
        // count - 2 if you don't want to include the main thread
        /* Take action. Maybe start a new thread etc. */
    }
    return 0;
}
Notes:
ps H displays all threads.
$1 prints the first column, which is where the PID is displayed on my Ubuntu system. The column number might change depending on the system.
Replace a.out with your process name.
The backticks will evaluate the expression within them and give you the PID of your process. We are taking advantage of the fact that all POSIX threads will have same PID.
I doubt Linux would signal you when a thread dies or exits for any reason. You can do so manually though.
First, let's consider 2 ways for the thread to end:
It terminates itself
It dies
In the first method, the thread itself can tell someone (say the thread manager) that it is being terminated. The thread manager will then spawn another thread.
In the second method, a watchdog thread can keep track of whether the threads are alive or not. This is done more or less like this:
Thread:
while (do stuff)
this_thread->is_alive = true
work
Watchdog:
for all threads t
t->timeout = 0
while (true)
for all threads t
if t->is_alive
t->timeout = 0
t->is_alive = false
else
++t->timeout
if t->timeout > THRESHOLD
Thread has died! Tell the thread manager to respawn it
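Here is a rough C sketch of that watchdog (all the names and the threshold are mine; the actual "tell the manager" step is left as a comment):
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS   8
#define DEAD_AFTER 5                     /* watchdog ticks with no sign of life */

struct watched {
    volatile int is_alive;               /* set by the worker, cleared by the watchdog */
    int          timeout;
};
struct watched table[NTHREADS];

void *worker_body(void *arg)
{
    struct watched *me = arg;
    for (;;) {                           /* "while (do stuff)" */
        me->is_alive = 1;                /* I'm still here */
        /* ... real work ... */
    }
    return NULL;
}

void *watchdog(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(1);
        for (int i = 0; i < NTHREADS; i++) {
            if (table[i].is_alive) {
                table[i].timeout  = 0;
                table[i].is_alive = 0;
            } else if (++table[i].timeout > DEAD_AFTER) {
                printf("thread %d has died!\n", i);
                /* tell the thread manager to respawn it */
                table[i].timeout = 0;
            }
        }
    }
    return NULL;
}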
If for any reason one could not go for Ed Heal's "just work properly" approach (which is my favorite answer to the OP's question, btw), the lazy fox might take a look at the pthread_cleanup_push() and pthread_cleanup_pop() macros, and think about wrapping the whole thread function's body between those two macros.
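The shape of that approach, for reference (a sketch; report_gone() is a hypothetical handler):
#include <pthread.h>
#include <stdio.h>

static void report_gone(void *arg)
{
    /* runs when the thread is cancelled, calls pthread_exit(), or pops the
       handler with a non-zero argument: a good place to notify a manager */
    fprintf(stderr, "thread %s is going away\n", (const char *)arg);
}

void *thread_body(void *arg)
{
    pthread_cleanup_push(report_gone, arg);

    /* ... the whole original thread function body goes here ... */

    pthread_cleanup_pop(1);   /* 1 = also run the handler on normal completion */
    return NULL;
}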
The clean way to know whether a thread is done is to call pthread_join() against that thread.
// int pthread_join(pthread_t thread, void **retval);
void *retval = NULL;
int r = pthread_join(that_thread_id, &retval);
... here you know that_thread_id returned ...
The problem with pthread_join() is, if the thread never returns (continues to run as expected) then you are blocked. That's therefore not very useful in your case.
However, you may actually check whether you can join (tryjoin) as follow:
//int pthread_tryjoin_np(pthread_t thread, void **retval);
void *retval = NULL;
int r = pthread_tryjoin_np(that_thread_id, &retval);
// here 'r' tells you whether the thread returned (joined) or not.
if(r == 0)
{
// that_thread_id is done, create new thread here
...
}
else if(r != EBUSY)   /* pthread_tryjoin_np() returns the error code; it does not set errno */
{
// react to "weird" errors... (maybe print strerror(r) at least?)
}
// else -- thread is still running
There is also a timed join which will wait for the amount of time you specify, like a few seconds. Depending on the number of threads to check, and whether your main process just sits around otherwise, it could be a solution: block on thread 1 for 5 seconds, then thread 2 for 5 seconds, etc., which would be 5,000 seconds per loop for 1,000 threads (about 85 minutes to go around all the threads, including the time it takes to manage things...).
There is sample code in the man page which shows how to use the pthread_timedjoin_np() function. All you would have to do is put a for loop around it to check each one of your threads.
struct timespec ts;
int s;
...
if (clock_gettime(CLOCK_REALTIME, &ts) == -1) {
/* Handle error */
}
ts.tv_sec += 5;
s = pthread_timedjoin_np(thread, NULL, &ts);
if (s != 0) {
/* Handle error */
}
If your main process has other things to do, I would suggest you do not use the timed version and just go through all the threads as fast as you can.
I'm debugging a multi-threaded problem with C, pthreads and Linux. On my MacOS 10.5.8, C2D, it runs fine; on my Linux computers (2-4 cores) it produces undesired output.
I'm not experienced, therefore I attached my code. It's rather simple: each new thread creates two more threads until a maximum is reached. So no big deal... as I thought until a couple of days ago.
Can I force single-core execution to prevent my bugs from occurring?
I profiled the program execution, instrumenting with Valgrind:
valgrind --tool=drd --read-var-info=yes --trace-mutex=no ./threads
I get a couple of conflicts in the BSS segment, which are caused by my global structs and thread counter variables. However, I think I could mitigate these conflicts with forced single-core execution, because I believe the concurrent scheduling on my 2-4 core test systems is responsible for my errors.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_THR 12
#define NEW_THR 2
int wait_time = 0; // log global wait time
int num_threads = 0; // how many threads there are
pthread_t threads[MAX_THR]; // global array to collect threads
pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER; // sync
struct thread_data
{
int nr; // nr of thread, serves as id
int time; // wait time from rand()
};
struct thread_data thread_data_array[MAX_THR+1];
void
*PrintHello(void *threadarg)
{
if(num_threads < MAX_THR){
// using the argument
pthread_mutex_lock(&mut);
struct thread_data *my_data;
my_data = (struct thread_data *) threadarg;
// updates
my_data->nr = num_threads;
my_data->time= rand() % 10 + 1;
printf("Hello World! It's me, thread #%d and sleep time is %d!\n",
my_data->nr,
my_data->time);
pthread_mutex_unlock(&mut);
// counter
long t = 0;
for(t = 0; t < NEW_THR; t++){
pthread_mutex_lock(&mut);
num_threads++;
wait_time += my_data->time;
pthread_mutex_unlock(&mut);
pthread_create(&threads[num_threads], NULL, PrintHello, &thread_data_array[num_threads]);
sleep(1);
}
printf("Bye from %d thread\n", my_data->nr);
pthread_exit(NULL);
}
return 0;
}
int
main (int argc, char *argv[])
{
long t = 0;
// srand(time(NULL));
if(num_threads < MAX_THR){
for(t = 0; t < NEW_THR; t++){
// -> 2 threads entry point
pthread_mutex_lock(&mut);
// rand time
thread_data_array[num_threads].time = rand() % 10 + 1;
// update global wait time variable
wait_time += thread_data_array[num_threads].time;
num_threads++;
pthread_mutex_unlock(&mut);
pthread_create(&threads[num_threads], NULL, PrintHello, &thread_data_array[num_threads]);
pthread_mutex_lock(&mut);
printf("In main: creating initial thread #%ld\n", t);
pthread_mutex_unlock(&mut);
}
}
for(t = 0; t < MAX_THR; t++){
pthread_join(threads[t], NULL);
}
printf("Bye from program, wait was %d\n", wait_time);
pthread_exit(NULL);
}
I hope that code isn't too bad; I haven't done much C in a rather long time. :) The problem is:
printf("Bye from %d thread\n", my_data->nr);
my_data->nr sometimes comes out as a garbage value (a very large or negative integer):
In main: creating initial thread #0
Hello World! It's me, thread #2 and sleep time is 8!
In main: creating initial thread #1
[...]
Hello World! It's me, thread #11 and sleep time is 8!
Bye from 9 thread
Bye from 5 thread
Bye from -1376900240 thread
[...]
I don't know any more ways to profile and debug this.
If I debug this, it works - sometimes. Sometimes it doesn't :(
Thanks for reading this long question. :) I hope I didn't share too much of my currently unresolvable confusion.
Since this program seems to be just an exercise in using threads, with no actual goal, it is difficult to suggest how to treat your problem rather than the symptom. I believe you can actually pin a process or thread to a processor in Linux, but doing so for all threads removes most of the benefit of using threads, and I don't actually remember how to do it. Instead I'm going to talk about some things wrong with your program.
C compilers often make a lot of assumptions when they are optimizing. One of those assumptions is that unless the code currently being examined looks like it might change some variable, that variable does not change (this is a very rough approximation; a more accurate explanation would take a very long time).
In this program you have variables which are shared and changed by different threads. If a variable is only read by threads (either const, or effectively const after the threads that look at it are created), then you don't have much to worry about (and by "read by threads" I'm including the main, original thread): since the variable doesn't change, it doesn't matter whether the compiler generates code that reads it once (remembering it in a local temporary variable) or reads it over and over; the value is always the same, so calculations based on it always come out the same.
To force the compiler not do this you can use the volatile keyword. It is affixed to variable declarations just like the const keyword, and tells the compiler that the value of that variable can change at any instant, so reread it every time its value is needed, and rewrite it every time a new value for it is assigned.
NOTE that for pthread_mutex_t (and similar) variables you do not need volatile. If it were needed on the type(s) that make up pthread_mutex_t on your system, volatile would have been used within the definition of pthread_mutex_t. Additionally, the functions that access this type take its address and are specially written to do the right thing.
I'm sure now you are thinking that you know how to fix your program, but it is not that simple. You are doing math on a shared variable. Doing math on a variable using code like:
x = x + 1;
requires that you know the old value to generate the new value. If x is global then you have to conceptually load x into a register, add 1 to that register, and then store that value back into x. On a RISC processor you actually have to do all 3 of those instructions, and being 3 instructions I'm sure you can see how another thread accessing the same variable at nearly the same time could end up storing a new value in x just after we have read our value -- making our value old, so our calculation and the value we store will be wrong.
If you know any x86 assembly then you probably know that it has instructions that can do math on values in RAM (both getting from and storing the result in the same location in RAM all in one instruction). You might think that this instruction could be used for this operation on x86 systems, and you would almost be right. The problem is that this instruction is still executed in the steps that the RISC instruction would be executed in, and there are several opportunities for another processor to change this variable at the same time as we are doing our math on it. To get around this on x86 there is a lock prefix that may be applied to some x86 instructions, and I believe that glibc header files include atomic macro functions to do this on architectures that can support it, but this can't be done on all architectures.
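For illustration only (this is not the portable mutex approach I use below): with a C11 compiler, <stdatomic.h> will emit that locked read-modify-write for you, and older gcc versions have similar __sync builtins; whether either is available depends on your toolchain.
#include <stdatomic.h>

static atomic_int num_threads;       /* instead of a plain (or volatile) int */

static int reserve_thread_slot(void)
{
    /* atomically returns the old value and increments it: no torn
       read-modify-write even with many threads on many cores */
    return atomic_fetch_add(&num_threads, 1);

    /* rough equivalent with the older gcc builtins:
       return __sync_fetch_and_add(&num_threads, 1); */
}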
To work right on all architectures you are going to need to:
int local_thread_count;
pthread_mutex_lock(&count_lock);
local_thread_count = num_threads;
if (local_thread_count < MAX_THR) {
num_threads = local_thread_count + 1;
pthread_mutex_unlock(&count_lock);
thread_data_array[local_thread_count].nr = local_thread_count;
/* moved this into the creator
* since getting it in the
* child will likely get the
* wrong value. */
pthread_create(&threads[local_thread_count], NULL, PrintHello,
&thread_data_array[local_thread_count]);
} else {
pthread_mutex_unlock(&count_lock);
}
Now, since you would have changed num_threads to volatile, and the test and increment happen under the mutex, every thread can safely test and increment the thread count. At the end of this, local_thread_count is usable as an index into the array of threads. Note that I created only 1 thread in this code, while yours was supposed to create several. I did this to make the example clearer, but it should not be too difficult to change it to add NEW_THR to num_threads, though if NEW_THR is 2 and MAX_THR - num_threads is 1 (somehow), then you have to handle that case correctly.
Now, all of that being said, there may be another way to accomplish similar things by using semaphores. Semaphores are like mutexes, but they have a count associated with them. You would not get a value to use as the index into the array of threads (the function to read a semaphore count won't really give you this), but I thought that it deserved to be mentioned since it is very similar.
man 3 semaphore.h
will tell you a little bit about it.
num_threads should at least be marked volatile, and preferably made atomic too (although I believe plain int is practically fine), so that there is at least a higher chance that the different threads see the same values. You might want to view the assembler output to see when the writes of num_threads to memory actually take place.
https://computing.llnl.gov/tutorials/pthreads/#PassingArguments
That seems to be the problem: you need to give each thread its own thread_data struct, e.g. malloc one per thread.
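A sketch of that change, using the struct from the question: allocate and fill one struct per thread before creating it, instead of letting each new thread fill in a shared slot.
/* in the creating thread, before pthread_create: */
struct thread_data *td = malloc(sizeof *td);   /* one private struct per thread */
td->nr   = num_threads;                        /* filled in BEFORE the thread starts */
td->time = rand() % 10 + 1;
pthread_create(&threads[num_threads], NULL, PrintHello, td);

/* in PrintHello, once my_data is no longer needed: free(my_data); */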