How to implement and profile pthreads continuously created on the heap?

Introduction
I have a program where child threads are created that I would like to profile with Valgrind memcheck. From the responses to a previous question I've asked, I will need to use joinable (rather than detached) threads in order to test and profile reliably with Valgrind memcheck.
Stack vs. Heap Allocation
My program is large enough that I don't think I can create the thread and join it in the same scope. For this reason I allocate space for the pthread_t on the heap.
Attempt #1 - Joining Immediately
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <stdint.h>
/* Thread entry point; pthread_create() requires the signature void *(*)(void *). */
void *my_thread(void *arg) {
    (void) arg;
    printf("I'm in a child thread!\n");
    pthread_exit(NULL);
}
/* Allocates a pthread_t on the heap and starts a thread that runs my_thread(). */
pthread_t *make_thread(void) {
    pthread_t *const thread = malloc(sizeof(pthread_t));
    pthread_create(thread, NULL, my_thread, NULL);
    return thread;
}
int main(void) {
    printf("Hello, world!\n");
    uint8_t i;
    for (i = 0; i < 255; ++i) {
        /* Create one thread, wait for it, then release its handle. */
        pthread_t *const thread_handle = make_thread();
        pthread_join(*thread_handle, NULL);
        free(thread_handle);
    }
    return 0;
}
This seems to make sense, but now I want to extend this example by not joining the thread immediately, and only joining on program exit (say, because these threads may become long-lived). In other words, the above example defeats the purpose of multithreading.
I want to create threads and only really ever force a join on program exit.
Attempt #2 - Joining at the end
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <stdint.h>
#include <glib.h>
#include <unistd.h>
/* Thread entry point; sleeps to simulate a longer-lived thread. */
void *my_thread(void *arg) {
    (void) arg;
    sleep(3);
    printf("I'm in a child thread!\n");
    pthread_exit(NULL);
}
/* Allocates a pthread_t on the heap and starts a thread that runs my_thread(). */
pthread_t *make_thread(void) {
    pthread_t *const thread = malloc(sizeof(pthread_t));
    pthread_create(thread, NULL, my_thread, NULL);
    return thread;
}
int main(void) {
    printf("Hello, world!\n");
    /* Growable array of heap-allocated thread handles. */
    GArray *const thread_handles = g_array_new(TRUE, TRUE, sizeof(pthread_t *));
    // Important loop
    uint8_t i;
    for (i = 0; i < 255; ++i) {
        pthread_t *const thread_handle = make_thread();
        g_array_append_val(thread_handles, thread_handle);
    }
    /* Join and free every thread only at the end. */
    for (i = 0; i < thread_handles->len; ++i) {
        pthread_t *const thread_handle =
            g_array_index(thread_handles, pthread_t *, i);
        pthread_join(*thread_handle, NULL);
        free(thread_handle);
    }
    g_array_free(thread_handles, TRUE);
    return 0;
}
This works, but what if the "Important loop" is actually endless? How can I prevent thread_handles from growing until it takes up all available memory?
In the actual program (these are just minimal examples), the program receives network messages and then kicks off threads for some special types of network messages.

So, what is your real issue?
For typical network servers, the usual solution is to combine both of your approaches.
The main thread runs two nested loops. The outer loop:
1. Waits for a connection/message.
2. Creates a thread.
3. Adds it to the list of active threads.
4. Runs the inner loop over all active threads (as below).
The inner loop, over the active thread list:
1. Looks for a thread that is marked "done".
2. Removes it from the list.
3. Joins it.
4. Frees the thread struct.
The above works pretty much the same if the thread control structs are allocated (via malloc) or come from a fixed, pre-defined array of structs [which can be function scoped to main or global/static scope].
Here's some C-like pseudo code to illustrate:
// task control
typedef struct tsk {
pthread_t tsk_tid; // thread id
int tsk_sock; // socket/message struct/whatever
int tsk_isdone; // 1=done
} tsk_t;
void *
my_thread(void *arg)
{
tsk_t *tsk = arg;
// do stuff ...
// tell main we're done
tsk->tsk_isdone = 1;
return (void *) 0;
}
void
main_loop(void)
{
while (1) {
// wait for connection, message, whatever ...
int sock = accept();
// create thread to handle request
tsk_t *tsk = make_thread(sock);
// enqueue it to list of active threads
list_enqueue(active_list,tsk);
// join all completed threads
while (1) {
int doneflg = 0;
// look for completed threads
for_all(tsk,active_list) {
if (! tsk->tsk_isdone)
continue;
// remove task from queue/list
list_remove(active_list,tsk);
// join the thread
pthread_join(tsk->tsk_tid,NULL);
// release storage
free(tsk);
// say we reaped/joined at least one thread
doneflg = 1;
}
// stop when we've joined as many threads as we can
if (! doneflg)
break;
}
}
}
Note that while creating a new thread for a new connection may be reasonable, doing so for messages from a given connection can be very slow.
It may be better to have a pool of worker threads. See my answer: Relative merits between one thread per client and queuing thread models for a threaded server?
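The reaping loop above can be sketched as compilable C. This is only a minimal sketch, not the answerer's code: the names task_t, spawn_task, and reap_finished and the singly linked list are hypothetical, and the done flag is an atomic_int (an assumption; the pseudo code uses a plain int) so that the main thread can read it safely while a worker writes it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical task record: one per live thread. */
typedef struct task {
    pthread_t tid;
    atomic_int done;        /* set to 1 by the worker just before it returns */
    struct task *next;
} task_t;

static task_t *active_list = NULL;   /* touched only by the main thread */

static void *worker(void *arg) {
    task_t *t = arg;
    /* ... handle the message here ... */
    atomic_store(&t->done, 1);       /* tell main this thread can be joined */
    return NULL;
}

/* Starts one worker and links it into the active list; returns 0 on success. */
int spawn_task(void) {
    task_t *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    atomic_init(&t->done, 0);
    t->next = active_list;
    if (pthread_create(&t->tid, NULL, worker, t) != 0) {
        free(t);
        return -1;
    }
    active_list = t;
    return 0;
}

/* Joins and frees every task whose worker has finished; returns how many. */
int reap_finished(void) {
    int reaped = 0;
    task_t **link = &active_list;
    while (*link != NULL) {
        task_t *t = *link;
        if (atomic_load(&t->done)) {
            *link = t->next;             /* unlink from the list */
            pthread_join(t->tid, NULL);  /* returns promptly: the worker is finishing */
            free(t);
            reaped++;
        } else {
            link = &t->next;             /* still running: leave it in the list */
        }
    }
    return reaped;
}
```

Calling reap_finished() once per iteration of the receive loop keeps the list no longer than the number of threads that are actually still running, and every thread is eventually joined, which is what Valgrind memcheck needs.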

Related

How do I notify a thread that new data is available using pthreads?

I have new data appearing over a bus. I want my main thread to "wake up" when the new data arrives. My original version of the code is this:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
int data = 0;
void* thread_func(void* args)
{
while(1)
{
sleep(2);
data = random() % 5;
}
return NULL;
}
int main()
{
pthread_t tid;
pthread_create(&tid, NULL, &thread_func, NULL);
while(1)
{
// Check data.
printf("New data arrived: %d.\n", data);
sleep(2);
}
return 0;
}
But clearly an infinite while loop in the main thread is overkill. So I thought how about this?
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
int data = 0;
pthread_mutex_t mtx;
void* thread_func(void* args)
{
while(1)
{
sleep(2);
// Data has appeared and can be read by main().
data = random() % 5;
pthread_mutex_unlock(&mtx);
}
return NULL;
}
int main()
{
pthread_t tid;
pthread_mutex_init(&mtx, NULL);
pthread_create(&tid, NULL, &thread_func, NULL);
while(1)
{
pthread_mutex_lock(&mtx);
printf("New data has arrived: %d.\n", data);
}
return 0;
}
This works, but is it the best way?
In actual fact, I don't just have a main thread, but several threads that I would like to be asleep until new data for them arrives. This would involve using one mutex lock for each thread. Is this the best way to do things?
I hope it's clear. Thanks.
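You can use pthread_cond_wait to wait for a change in the data shared between your threads. You must hold the mutex when you call it; the function atomically releases the mutex while the thread sleeps and re-acquires it before returning, so you still have to unlock it afterwards. To notify waiting threads that the data is ready, use pthread_cond_signal (or pthread_cond_broadcast to wake all of them).
But be careful: each thread must lock and unlock the mutex itself. Unlocking a default mutex from a thread that did not lock it, as your example does, is undefined behavior.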
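A minimal sketch of that pattern follows. It assumes a single shared value plus a sequence counter, seq, so that a waiter can tell new data from data it has already seen; the names publish, wait_for_data, and seq are hypothetical. It uses pthread_cond_broadcast rather than pthread_cond_signal because the question mentions several sleeping threads.

```c
#include <pthread.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
static int data = 0;
static unsigned long seq = 0;   /* bumped on every publish; guarded by mtx */

/* Producer side: store a value and wake every waiting thread. */
void publish(int value) {
    pthread_mutex_lock(&mtx);
    data = value;
    seq++;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mtx);
}

/* Consumer side: block until a value newer than *last_seen exists,
   then return it and record its sequence number. */
int wait_for_data(unsigned long *last_seen) {
    pthread_mutex_lock(&mtx);
    while (seq == *last_seen)           /* the loop guards against spurious wakeups */
        pthread_cond_wait(&cv, &mtx);   /* releases mtx while asleep, relocks on return */
    *last_seen = seq;
    int value = data;
    pthread_mutex_unlock(&mtx);
    return value;
}
```

Each consumer thread keeps its own last_seen variable, initialized to 0, so one mutex and one condition variable serve all of them. A consumer that is slow may skip intermediate values and only see the latest one.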

C/Linux: Alternating between threads

So I have this code:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <semaphore.h>
#define nr_threads 3
sem_t semaphores[nr_threads];
typedef struct {
int id;
char *word;
}th_struct;
void *thread_function(void *arg)
{
th_struct *th_data = (th_struct *) arg;
sem_wait(&semaphores[th_data->id]);
printf("[thread#%d] %s\n", th_data->id, th_data->word);
sem_post(&semaphores[th_data->id + 1]);
return NULL;
}
int main(int argc, char **argv)
{
pthread_t tid[nr_threads];
th_struct th_data[nr_threads];
for(int i = 0; i < nr_threads; i++){
if (sem_init(&semaphores[i], 0, 1) != 0){
perror("Could not init semaphore");
return -1;
}
}
sem_post(&semaphores[0]);
for(int i = 0; i < nr_threads; i++){
th_data[i].id = i;
th_data[i].word = argv[i + 1];
pthread_create(&tid[i], NULL, thread_function, &th_data[i]);
}
for(int i = 0; i < nr_threads; i++){
pthread_join(tid[i], NULL);
}
for(int i = 0; i < nr_threads; i++)
sem_destroy(&semaphores[i]);
return 0;
}
I pass three words on the command line, for example "one two three", and each thread prints one word, synchronized so that the order is always correct. I'm new to threads and semaphores, and I'm used to the pattern of calling sem_wait(sem) and later sem_post(sem) on the same semaphore. What I'm asking for is a complete explanation of why this code works and how it works. Why are the semaphores initialized with 0 permissions? Why is there a sem_post(first_semaphore)? I'm very confused.
First of all, there's a bug in that code...
After it has done its job, each thread unconditionally calls sem_post() on the semaphore of the next thread. Therefore, the third thread will try to access semaphores[3] which doesn't exist.
Now what's going on (assuming the bug wasn't there) is this:
3 semaphores are created and initialized so that they are locked immediately
3 threads are created, each calling sem_wait() and blocking (because the semaphores are initialized to 0)
After a thread has done its job, it calls sem_post() on the semaphore of the next one, which then returns from sem_wait()
This is the basic idea, but to get it running, someone needs to call sem_post() for the first semaphore. So that's why there is that sem_post(&semaphores[0]) in main().
Note: This is more of a long comment, not a complete answer.
I like to think of a semaphore as a blocking queue of informationless tokens. The semaphore's count is the number of tokens in the queue.
From that viewpoint, the main thread in your program creates a single token (from nothing, because the token is nothing), and it hands the token to the first worker thread by calling sem_post(&semaphores[0]);.
The first worker is able to do its job after taking the token from its input queue (i.e., when sem_wait(&semaphores[th_data->id]); returns). After it has finished its work, it hands the token to the next thread: sem_post(&semaphores[th_data->id + 1]);
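A corrected sketch of the token-passing chain is shown below. It rests on two assumptions that differ from the code in the question: every semaphore starts at 0, as the explanation above describes, and the last thread does not post to a nonexistent semaphores[NR_THREADS]. The order array and the run_chain function are hypothetical additions that make the resulting order observable.

```c
#include <pthread.h>
#include <semaphore.h>

#define NR_THREADS 3

static sem_t semaphores[NR_THREADS];
int order[NR_THREADS];      /* records which thread ran in each position */
static int next_slot = 0;   /* only touched by the thread holding the token */

static void *thread_function(void *arg) {
    int id = *(int *) arg;
    sem_wait(&semaphores[id]);           /* block until the token arrives */
    order[next_slot++] = id;             /* the "work" */
    if (id + 1 < NR_THREADS)             /* the last thread has nobody to hand to */
        sem_post(&semaphores[id + 1]);
    return NULL;
}

/* Runs the chain once; returns 0 on success, -1 on a setup error. */
int run_chain(void) {
    pthread_t tid[NR_THREADS];
    int ids[NR_THREADS];
    next_slot = 0;
    for (int i = 0; i < NR_THREADS; i++)
        if (sem_init(&semaphores[i], 0, 0) != 0)   /* 0 tokens: every thread blocks */
            return -1;
    /* Threads are started in reverse to show that creation order does not matter. */
    for (int i = NR_THREADS - 1; i >= 0; i--) {
        ids[i] = i;
        if (pthread_create(&tid[i], NULL, thread_function, &ids[i]) != 0)
            return -1;
    }
    sem_post(&semaphores[0]);            /* main creates the single token */
    for (int i = 0; i < NR_THREADS; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < NR_THREADS; i++)
        sem_destroy(&semaphores[i]);
    return 0;
}
```

Because only one token exists, at most one thread is past its sem_wait() at any time, so the unsynchronized next_slot counter is safe in this sketch.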

Pthread signalling in C Linux

I am working with multi-threading in Linux using Pthread.
Thread1 waits for an IRQ from Driver by polling a character device file (my driver has ISR to catch IRQ from HW).
IRQ -----> Thread1 |-----> Thread2
                   |-----> Thread3
                   |-----> Thread4
Whenever Thread1 gets an IRQ, I want to send a signal to Thread2, Thread3 and Thread4 to wake them up so that they can do their work.
Now, I am trying to use a pthread condition variable and a pthread mutex, but it seems that this is not a good approach.
What is an efficient way to synchronize in this case? Please help.
Thank you very much.
As I understand it, your problem is that your child threads (Threads 2 through 4) don't always wake up exactly once for every IRQ that Thread1 receives -- in particular, it might be that an IRQ is received while the child threads are already awake and working on an earlier IRQ, and that causes them not to be awoken for the new IRQ.
If that's correct, then I think a simple solution is to use a counting semaphore for each child-thread, rather than a condition variable. A semaphore is a simple data structure that maintains an integer counter, and supplies two operations, wait/P and signal/V. wait/P decrements the counter, and if the counter's new value is negative, it blocks until the counter has become non-negative again. signal/V increments the counter, and in the case where the counter was negative before the increment, awakens a waiting thread (if one was blocked inside wait/P).
The effect of this is that in the case where your main thread gets multiple IRQs in quick succession, the semaphore will "remember" the multiple signal/V calls (as a positive integer value of the counter), and allow the worker-thread to call wait/P that-many times in the future without blocking. That way no signals are ever "forgotten".
Linux supplies a semaphore API (via sem_init(), etc.); called with pshared set to 0, it creates a semaphore shared between the threads of a single process. It is also easy to implement your own semaphore using a pthreads mutex and condition variable, as shown below, which makes the mechanism explicit.
Note that in this toy example, the main() thread is playing the part of Thread1, and it will pretend to have received an IRQ every time you press return in the terminal window. The child threads are playing the part of Threads2-4, and they will pretend to do one second's worth of "work" every time Thread1 signals them. In particular note that if you press return multiple times in quick succession, the child threads will always do that many "work units", even though they can only perform one work-unit per second.
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
struct example_semaphore
{
pthread_cond_t cond;
pthread_mutex_t mutex;
int count; // access to this is serialized by locking (mutex)
};
// Initializes the example_semaphore (to be called at startup)
void Init_example_semaphore(struct example_semaphore * s)
{
s->count = 0;
pthread_mutex_init(&s->mutex, NULL);
pthread_cond_init(&s->cond, NULL);
}
// V: Increments the example_semaphore's count by 1. If the pre-increment
// value was negative, wakes a process that was waiting on the
// example_semaphore
void Signal_example_semaphore(struct example_semaphore * s)
{
pthread_mutex_lock(&s->mutex);
if (s->count++ < 0) pthread_cond_signal(&s->cond);
pthread_mutex_unlock(&s->mutex);
}
// P: Decrements the example_semaphore's count by 1. If the new value of the
// example_semaphore is negative, blocks the caller until another thread calls
// Signal_example_semaphore()
void Wait_example_semaphore(struct example_semaphore * s)
{
pthread_mutex_lock(&s->mutex);
while(--s->count < 0)
{
pthread_cond_wait(&s->cond, &s->mutex);
if (s->count >= 0) break;
}
pthread_mutex_unlock(&s->mutex);
}
// This is the function that the worker-threads run
void * WorkerThreadFunc(void * arg)
{
int workUnit = 0;
struct example_semaphore * my_semaphore = (struct example_semaphore *) arg;
while(1)
{
Wait_example_semaphore(my_semaphore); // wait here until it's time to work
printf("Thread %p: just woke up and is working on work-unit #%i...\n", my_semaphore, workUnit++);
sleep(1); // actual work would happen here in a real program
}
}
static const int NUM_THREADS = 3;
int main(int argc, char ** argv)
{
struct example_semaphore semaphores[NUM_THREADS];
pthread_t worker_threads[NUM_THREADS];
// Setup semaphores and spawn worker threads
int i = 0;
for (i=0; i<NUM_THREADS; i++)
{
Init_example_semaphore(&semaphores[i]);
pthread_create(&worker_threads[i], NULL, WorkerThreadFunc, &semaphores[i]);
}
// Now we'll pretend to be receiving IRQs. We'll pretend to
// get one IRQ each time you press return.
while(1)
{
char buf[128];
fgets(buf, sizeof(buf), stdin);
printf("Main thread got IRQ, signalling child threads now!\n");
for (i=0; i<NUM_THREADS; i++) Signal_example_semaphore(&semaphores[i]);
}
}
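For comparison, the same per-worker counting can be sketched with POSIX unnamed semaphores (sem_init() with pshared set to 0). This is a sketch under assumptions, not the answerer's code: the names start_workers, raise_irq, and stop_workers are hypothetical, and the shutdown convention (one extra token with no matching IRQ means "stop") is an addition so that the example can terminate.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

#define NUM_WORKERS 3

static sem_t wake[NUM_WORKERS];
static pthread_t threads[NUM_WORKERS];
static int slot[NUM_WORKERS];
static atomic_int raised;       /* IRQs raised so far */
static atomic_int units_done;   /* work units processed by all workers */

static void *worker(void *arg) {
    int id = *(int *) arg;
    int handled = 0;
    for (;;) {
        while (sem_wait(&wake[id]) != 0)
            ;                               /* retry if interrupted by a signal */
        if (handled == atomic_load(&raised))
            break;                          /* token without a matching IRQ: stop request */
        handled++;
        atomic_fetch_add(&units_done, 1);   /* real work would happen here */
    }
    return NULL;
}

/* Creates one semaphore and one thread per worker; returns 0 on success. */
int start_workers(void) {
    for (int i = 0; i < NUM_WORKERS; i++) {
        slot[i] = i;
        if (sem_init(&wake[i], 0, 0) != 0)
            return -1;
        if (pthread_create(&threads[i], NULL, worker, &slot[i]) != 0)
            return -1;
    }
    return 0;
}

/* Called by the dispatcher thread for every IRQ. */
void raise_irq(void) {
    atomic_fetch_add(&raised, 1);           /* count first, then wake */
    for (int i = 0; i < NUM_WORKERS; i++)
        sem_post(&wake[i]);                 /* each post is remembered by the semaphore */
}

/* Sends the stop token, joins the workers, returns the total units processed. */
int stop_workers(void) {
    for (int i = 0; i < NUM_WORKERS; i++)
        sem_post(&wake[i]);
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(threads[i], NULL);
        sem_destroy(&wake[i]);
    }
    return atomic_load(&units_done);
}
```

Three calls to raise_irq() in quick succession lead every worker to process three units before it stops, because each semaphore keeps the pending posts as its count.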
I like jeremy's answer, but it has a shortcoming: the interrupt dispatcher needs to know how many semaphores to increment on each interrupt.
Also, each increment is potentially a kernel call, so you have a lot of kernel calls for each interrupt.
An alternative is to use pthread_cond_broadcast(). I have put an example below:
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#ifndef NTHREAD
#define NTHREAD 5
#endif
pthread_mutex_t Lock;
pthread_cond_t CV;
int GlobalCount;
int Done;
#define X(y) do { if ((y) != 0) abort(); } while (0) /* pthread calls return 0 on success, an error number otherwise */
void *handler(void *x) {
unsigned icount;
X(pthread_mutex_lock(&Lock));
icount = 0;
while (!Done) {
if (icount < GlobalCount) {
X(pthread_mutex_unlock(&Lock));
icount++;
X(pthread_mutex_lock(&Lock));
} else {
X(pthread_cond_wait(&CV, &Lock));
}
}
X(pthread_mutex_unlock(&Lock));
return NULL;
}
int
main()
{
X(pthread_mutex_init(&Lock, NULL));
X(pthread_cond_init(&CV, NULL));
pthread_t id[NTHREAD];
int i;
for (i = 0; i < NTHREAD; i++) {
X(pthread_create(id+i, NULL, handler, NULL));
}
int c;
while ((c = getchar()) != EOF) {
X(pthread_mutex_lock(&Lock));
GlobalCount++;
X(pthread_mutex_unlock(&Lock));
X(pthread_cond_broadcast(&CV));
}
X(pthread_mutex_lock(&Lock));
Done = 1;
X(pthread_cond_broadcast(&CV));
X(pthread_mutex_unlock(&Lock));
for (i = 0; i < NTHREAD; i++) {
X(pthread_join(id[i], NULL));
}
return 0;
}

Initialize and Deinitialize only once with multiple threads without mutexes

I have two functions initialize() and deinitialize() and each function should run only once. The structure is something similar to:
int *x;
initialize()
{
x = malloc(sizeof(int) * 10);
}
deinitialize()
{
free(x);
}
How can I ensure that only the first thread calls initialize and only the last thread calls deinitialize?
Can this be achieved without the use of a mutex?
UPDATE:
Sorry for the lack of information. I am actually modifying a library that contains the functions initialize() and deinitialize(). I need to make these two functions thread-safe. The users might use multiple threads and might call these functions more than once. I cannot assume that the users will call the initialize and deinitialize functions exactly once. All I can assume is that if a thread calls initialize, it will call deinitialize at some point.
The users will be using pthread library to create their different threads.
I don't know exactly what you want to achieve, but the easiest approach is:
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
int *x;
int running = 1; /* cleared by main() to stop the threads */
void initialization(void)
{
x = malloc(sizeof(int) * 10);
puts("First thread init x");
}
void deinitialization(void)
{
free(x);
puts("Last thread reset x");
}
void *handler(void *data)
{
printf("Thread started\n");
while (running) {
do_work();
}
printf("Thread exit\n");
return NULL;
}
int main(void)
{
pthread_t threads[3];
initialization();
for (int i = 0; i < 3; i++)
pthread_create(&threads[i], NULL, &handler, NULL);
sleep(2);
running = 0;
for (int i = 0; i < 3; i++)
pthread_join(threads[i], NULL);
deinitialization();
return 0;
}
Here you can be sure that you have called init() and deinit() only once.
UPDATE
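Another variant is a little more complicated, but here you can also be sure that init() is called only once. The thread with id 0 may start after thread 1, so in that case the other threads must wait until initialization has finished (the init flag is set once x has been allocated).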
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#define MAX_THREADS 3
int *x;
int running = 1; /* cleared by main() to stop the threads */
int init;
struct thread_info
{
pthread_t thread;
int id;
int first_id;
int last_id;
};
void initialization(void)
{
x = malloc(sizeof(int) * 10);
puts("First thread init x");
init = 1;
}
void deinitialization(void)
{
free(x);
puts("Last thread reset x");
}
void *handler(void *data)
{
struct thread_info *tinfo = data;
printf("Thread started\n");
if (tinfo->id == 0)
initialization();
while (!init);
/* EMPTY BODY */
while (running) {
do_work();
}
printf("Thread exit\n");
return NULL;
}
int main(void)
{
struct thread_info threads[MAX_THREADS] =
{
[0 ... 2].id = -1,
[0 ... 2].first_id = 0,
[0 ... 2].last_id = (MAX_THREADS - 1)
};
for (int i = 0; i < 3; i++)
pthread_create(&threads[i].thread, NULL, &handler,
((threads[i].id = i), &(threads[i])));
sleep(2);
running = 0;
for (int i = 0; i < 3; i++)
pthread_join(threads[i].thread, NULL);
deinitialization();
return 0;
}
deinit() is a little bit tricky because the last thread to exit could free(x) while another thread is still running and may still use it. So I call it only after all threads have exited.
This is the point about concurrent programming. You can never make any assumptions about the order in which the threads execute.
The exact timing of when tasks in a concurrent system are executed depends on the scheduling, and tasks need not always be executed concurrently. For example, given two tasks, T1 and T2:
T1 may be executed and finished before T2 or vice versa (serial and sequential)
T1 and T2 may be executed alternately (serial and concurrent)
T1 and T2 may be executed simultaneously at the same instant of time (parallel and concurrent)
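For the library case described in the question's update, one common approach is a reference count: the first initialize call allocates, and the matching last deinitialize call frees. The sketch below guards the count with a statically initialized mutex. That does not satisfy the wish to avoid a mutex, but it needs no setup call and is easy to reason about. The names lib_initialize, lib_deinitialize, lib_users, and users are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int users = 0;       /* initialize calls not yet matched by deinitialize */
static int *x = NULL;

/* Returns 0 on success, -1 if the allocation failed. */
int lib_initialize(void) {
    int rc = 0;
    pthread_mutex_lock(&init_lock);
    if (users == 0) {                    /* first caller does the real work */
        x = malloc(sizeof(int) * 10);
        if (x == NULL)
            rc = -1;
    }
    if (rc == 0)
        users++;
    pthread_mutex_unlock(&init_lock);
    return rc;
}

void lib_deinitialize(void) {
    pthread_mutex_lock(&init_lock);
    if (users > 0 && --users == 0) {     /* last caller releases the resource */
        free(x);
        x = NULL;
    }
    pthread_mutex_unlock(&init_lock);
}

/* Returns the current number of users (mainly useful for testing). */
int lib_users(void) {
    pthread_mutex_lock(&init_lock);
    int n = users;
    pthread_mutex_unlock(&init_lock);
    return n;
}
```

Because every thread that calls lib_initialize() is assumed to call lib_deinitialize() later, x stays valid for as long as any caller still holds a reference, regardless of the order in which the threads run.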

C pthread allow only four threads to execute function

Here is the problem: I need to execute a function x times which does some task, but only four threads can be executing it at any given time. So threads A, B, C, D can start tasks 0, 1, 2, 3 respectively. However, task four can't start until one of the threads has completed; if thread A completes, then the next task can be executed by one of the free threads. This should repeat x times, where x is the number of times the function needs to be called.
So I've used semaphores and join the pthread after it completes to ensure it completes. However, sometimes the main function finishes executing before some of the threads complete, and valgrind is complaining that my pthread_create is leaking memory. I think the way I'm doing is incorrect or is a naive approach, so any guidance or example code to fix this will be most appreciated! Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <pthread.h>
#include <semaphore.h>
sem_t s;
typedef struct Data Data;
struct Data {
pthread_t* a;
int index;
int j;
};
void* someFunction(void* arg){
/* Only at most num_threads should be here at once; */
sem_wait(&s);
Data* d = arg;
printf("Successfully completed task %d with thread %d\n", d->index, d->j);
sleep(2);
pthread_t* z = d->a;
free(d);
pthread_join(*z, NULL);
sem_post(&s);
return 0;
}
int main(void){
int num_task = 15; // i need to call someFunction() 9000 times
int num_threads = 4;
int j = 0;
sem_init(&s, 0, num_threads);
pthread_t thread_ids[num_threads];
for (int i = 0; i < num_task; i ++){
/*NEED TO COMPLETE num_tasks using four threads;
4 threads can run someFunction() at the same time; so one all four are currently executing someFunction(), other threads can't enter until one has completed. */
if (j == num_threads){
j = 0; // j goes 0 1 2 3 0 1 2 3 ...
}
Data* a = malloc(sizeof(Data));
a->a = thread_ids + j;
a->index = i;
a->j = j;
sem_wait(&s);
pthread_create(thread_ids + j, NULL, someFunction, a);
sem_post(&s);
j ++;
}
return 0;
}
Thank you so much
Having threads wait for each other usually gets messy quickly, and you're likely to end up in situations where a thread tries to join itself, or is never joined.
The most reliable way to have at most four threads running is to only create four threads.
Instead of creating threads as needed, you let each thread (potentially) perform more than one task.
You can separate the "task" concept from the "thread" concept:
Make a queue of tasks for the threads to perform.
Create four threads.
Each thread takes a task from the queue and performs it, repeating until the queue is empty.
Wait for the threads to finish in main.
The only thing that needs synchronising is the removal of a task from the queue, which is very simple.
(If the tasks are not independent, you need more complex plumbing.)
Pseudocode (I have invented some names as I'm not overly familiar with pthreads):
typedef struct Task
{
/* whatever */
} Task;
/* Very simplistic queue structure. */
typedef struct Queue
{
mutex lock;
int head;
Task tasks[num_tasks];
} Queue;
/* Return front of queue; NULL if empty. */
Task* dequeue(Queue* q)
{
Task* t = NULL;
lock_mutex(q->lock);
if (q->head < num_tasks)
{
t = &q->tasks[q->head];
q->head++;
}
unlock_mutex(q->lock);
return t;
}
/* The thread function is completely unaware of any multithreading
and can be used in a single-threaded program while debugging. */
void* process(void* arg)
{
Queue* queue = (Queue*) arg;
for (;;)
{
Task* t = dequeue(queue);
if (!t)
{
/* Done. */
return NULL;
}
/* Perform task t */
}
}
/* main is very simple - set up tasks, launch threads, wait for threads.
No signalling, no memory allocation. */
int main(void)
{
pthread_t threads[num_threads];
Queue q;
q.head = 0;
/* Fill in q.tasks... */
/* Initialise q.lock... */
for (int ti = 0; ti < num_threads; ti++)
{
pthread_create(threads + ti, NULL, process, &q);
}
for (int ti = 0; ti < num_threads; ti++)
{
/* join the thread */
}
return 0;
}
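The pseudocode above translates fairly directly into real pthreads calls. The sketch below assumes that a task is just an integer to be squared and that NUM_TASKS and NUM_THREADS are compile-time constants; those choices, and the name run_pool, are illustrative and not part of the answer.

```c
#include <pthread.h>

#define NUM_TASKS   15
#define NUM_THREADS 4

typedef struct {
    int input;
    int result;
} Task;

typedef struct {
    pthread_mutex_t lock;
    int head;                 /* index of the next task to hand out */
    Task tasks[NUM_TASKS];
} Queue;

/* Returns the next task, or NULL when the queue is empty. */
static Task *dequeue(Queue *q) {
    Task *t = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->head < NUM_TASKS)
        t = &q->tasks[q->head++];
    pthread_mutex_unlock(&q->lock);
    return t;
}

/* Thread function: keep taking tasks until none are left. */
static void *process(void *arg) {
    Queue *q = arg;
    Task *t;
    while ((t = dequeue(q)) != NULL)
        t->result = t->input * t->input;   /* each task is owned by exactly one thread */
    return NULL;
}

/* Runs all tasks on up to NUM_THREADS threads.
   Returns the sum of the results, or -1 if no thread could be started. */
long run_pool(void) {
    Queue q = { .head = 0 };
    pthread_t threads[NUM_THREADS];
    int started = 0;
    long sum = 0;
    pthread_mutex_init(&q.lock, NULL);
    for (int i = 0; i < NUM_TASKS; i++)
        q.tasks[i].input = i;
    for (; started < NUM_THREADS; started++)
        if (pthread_create(&threads[started], NULL, process, &q) != 0)
            break;
    for (int i = 0; i < started; i++)      /* join every thread that was created */
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&q.lock);
    if (started == 0)
        return -1;
    for (int i = 0; i < NUM_TASKS; i++)
        sum += q.tasks[i].result;
    return sum;
}
```

Since main joins all four threads before returning, nothing is left running at exit, and the only heap use is whatever pthread_create allocates internally and releases on join.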
Your code is meant to start four threads at a time and wait until they are finished. However, your main loop only creates the threads; it does not wait for them to exit.
After you create a thread, your OS will schedule it whenever it wants.
That means you have to join the last four threads you created after your for loop, so that they have a chance to finish their work and free their memory.
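One way to keep a thread per task while still joining everything is sketched below. It assumes that main, not the worker, takes a slot from the semaphore before creating each thread, that the worker gives the slot back when it is done, and that main joins every thread after the loop. The names run_tasks, peak_active, and the counters are hypothetical and exist only to make the behavior checkable.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

#define NUM_TASKS  15
#define MAX_ACTIVE 4

static sem_t slots;
static atomic_int active;      /* tasks currently running */
static atomic_int peak;        /* highest value of active ever observed */
static atomic_int completed;   /* tasks that have finished */

static void *task(void *arg) {
    (void) arg;
    int now = atomic_fetch_add(&active, 1) + 1;
    int seen = atomic_load(&peak);
    while (now > seen && !atomic_compare_exchange_weak(&peak, &seen, now))
        ;                                  /* seen is refreshed on failure; retry */
    /* ... the real work would happen here ... */
    atomic_fetch_add(&completed, 1);
    atomic_fetch_sub(&active, 1);
    sem_post(&slots);                      /* give back the slot taken by main */
    return NULL;
}

/* Runs NUM_TASKS tasks with at most MAX_ACTIVE running at once.
   Returns the number of completed tasks, or -1 on a setup error. */
int run_tasks(void) {
    pthread_t tids[NUM_TASKS];
    int launched = 0;
    if (sem_init(&slots, 0, MAX_ACTIVE) != 0)
        return -1;
    while (launched < NUM_TASKS) {
        sem_wait(&slots);                  /* blocks while MAX_ACTIVE tasks are running */
        if (pthread_create(&tids[launched], NULL, task, NULL) != 0) {
            sem_post(&slots);
            break;
        }
        launched++;
    }
    for (int i = 0; i < launched; i++)     /* join every thread, including the last ones */
        pthread_join(tids[i], NULL);
    sem_destroy(&slots);
    return atomic_load(&completed);
}

int peak_active(void) {
    return atomic_load(&peak);
}
```

No thread ever joins another thread here; only main calls pthread_join, and it does so exactly once per created thread.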
Regards

Resources