Linux kernel thread created inside exception handler stalls

I am trying to create a kthread within the body of the __do_page_fault exception handler. The thread will do some work (print the PC of the user process that caused the exception, and trace it using ptrace).
Currently I am just printing the PC in a loop that does not return, but the system hangs after a while, as if the kthread had to return for the user process to continue execution.
This is the code of the kthread:
int th1(void * data)
{
    struct task_struct * tsk = (struct task_struct *) data;
    int pid = tsk->pid;
    int id = current->pid;

    printk("thread %d is tracking %d\n", id, pid);
    while (true) {
        mdelay(100000); /* mdelay busy-waits: this spins for 100 seconds, it does not sleep */
        printk("%d, %lu\n", id, task_pt_regs(tsk)->ARM_pc); /* %ud was a typo; ARM_pc is unsigned long */
    }
}
And this is where it is created:
if (likely(!(fault & (VM_FAULT_ERROR | VM_FAULT_BADMAP | VM_FAULT_BADACCESS)))) {
    printk("the page fault was made by the process id %d\n", tsk->pid);
    if (tsk->is_tracked == 0) {
        struct task_struct * child_debugger;
        void * data = (void *) tsk;

        tsk->is_tracked = 1;
        child_debugger = kthread_create(th1, data, "os2");
        if (!IS_ERR(child_debugger)) { /* kthread_create returns an ERR_PTR on failure, never NULL */
            wake_up_process(child_debugger);
        } else {
            printk("error\n");
        }
        printk("thread went out\n");
    }
    return 0;
}

Threads must not be created in interrupt context (which includes exception handlers).
To defer work from an exception handler, use a pre-created thread or a workqueue.
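For illustration, here is a minimal sketch of the workqueue approach (the names trace_fault_fn and trace_fault_work are illustrative, not from the original code): the fault handler only queues a work item, and the work function runs later in process context, where sleeping is allowed.

#include <linux/workqueue.h>

static void trace_fault_fn(struct work_struct *work);
static DECLARE_WORK(trace_fault_work, trace_fault_fn);

static void trace_fault_fn(struct work_struct *work)
{
    /* runs in process context on a kernel worker thread, so it may sleep */
    pr_info("deferred page-fault work running\n");
}

/* inside the fault handler (atomic context), just queue the item: */
schedule_work(&trace_fault_work);

To pass per-fault data such as the task_struct pointer, embed the work_struct in a structure allocated with kmalloc(..., GFP_ATOMIC) and recover it in the work function with container_of().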

Related

Error 11 when using pthread_create in TCP server

I have a C app that listens to a TCP port and creates a new thread each time it accepts a new connection. It works OK initially, but after a while I start getting error code 11 from pthread_create.
There are no thread-related function calls inside the body of the thread function, and the log shows that there is a matching 'out' for every 'in'.
When it fails, I call the thread function directly instead, and it works fine indefinitely on the main thread, so it seems unlikely that I am exhausting resources within the function. Any suggestions on what causes error 11, and how to fix it?
This is the thread function:
void * tcp_process_message (void * arg) {
    MESSAGE_BUFFER * bp = (MESSAGE_BUFFER *) arg;
    USER_LOG (UL_INFO, "tpm in %d", bp - buffers);
    ...
    USER_LOG (UL_INFO, "tpm out %d", bp - buffers);
}
This is the section that creates threads: there is no other code that interacts with the new thread once it is created.
while (!cancel) {
    connfd = accept(listenfd, (struct sockaddr *) &from_addr, &fromsize);
    if (!cancel) {
        MESSAGE_BUFFER * bp = allocate_message ();
        if (bp == NULL) {
            USER_LOG (UL_ERROR, "%s", "allocate_message failed");
            close(connfd);
        }
        else {
            bp->connfd = connfd;
            strcpy (bp->ip_addr, inet_ntoa(from_addr.sin_addr));
            int err = pthread_create (&tid, NULL, &tcp_process_message, (void *) bp);
            if (err) {
                USER_LOG (UL_ERROR, "thread create failed (%d)", err);
                tcp_process_message ((void *) bp);
            }
        }
    }
}
When you create a thread, its resources persist even after the thread function returns.
Some piece of code must either wait for the thread with pthread_join(the_thread, NULL); or detach it and let it be cleaned up when its function exits: pthread_detach(the_thread);.
If neither is done, the finished threads remain in the system, which soon runs out of resources and can no longer create new threads; pthread_create then fails with EAGAIN, which is error 11 on Linux.
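As a sketch under the question's setup (reusing tid, tcp_process_message and bp from the question), detaching could look like this:

int err = pthread_create(&tid, NULL, &tcp_process_message, (void *) bp);
if (err == 0)
    pthread_detach(tid); /* resources are reclaimed as soon as the thread function returns */

/* alternatively, create the thread detached from the start: */
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
err = pthread_create(&tid, &attr, &tcp_process_message, (void *) bp);
pthread_attr_destroy(&attr);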

How do I stop threads stalling on pthread_join?

I've got a project where I'm adding jobs to a queue and I have multiple threads taking jobs, and calculating their own independent results.
My program handles the SIGINT signal, and I'm attempting to join the threads to add up the results, print them to screen, and then exit. My problem is that the threads either seem to stop functioning when I send the signal, or they are getting blocked on the mutex lock. Here are the important parts of my program, to be concise.
main.c
//the thread pool has a queue of jobs inside
//called jobs (which is a struct)
struct thread_pool * pool;

void signal_handler(int signo) {
    pool->jobs->running = 0; //stop the thread pool
    pthread_cond_broadcast(pool->jobs->cond);
    for (i = 0; i < pool->thread_count; i++) {
        pthread_join(pool->threads[i], retval);
        //do stuff with retval
    }
    //print results then exit
    exit(EXIT_SUCCESS);
}
int main() {
    signal(SIGINT, signal_handler);
    //set up threadpool and jobpool
    //start threads (they all run the workerThread function)
    while (1) {
        //send jobs to the job pool
    }
    return 0;
}
thread_stuff.c
void add_job(struct jobs * j) {
    if (j->running) {
        pthread_mutex_lock(j->mutex);
        //add job to queue and update count and empty
        pthread_cond_signal(j->cond);
        pthread_mutex_unlock(j->mutex);
    }
}

struct job * get_job(struct jobs * j) {
    pthread_mutex_lock(j->mutex);
    while (j->running && j->empty)
        pthread_cond_wait(j->cond, j->mutex);
    if (!j->running || j->empty) {
        pthread_mutex_unlock(j->mutex); //don't return with the mutex held
        return NULL;
    }
    //get the next job from the queue
    //unlock mutex and send a signal to other threads
    //waiting on the condition
    pthread_cond_signal(j->cond);
    pthread_mutex_unlock(j->mutex);
    //return new job
}
void * workerThread(void * arg) {
    struct jobs * j = (struct jobs *) arg;
    int results = 0;
    while (j->running) {
        //get next job and process results
    }
    return results;
}
Thanks for your help, this is giving me a real headache!
You should not call pthread_cond_wait or pthread_join from a signal handler which handles asynchronously generated signals such as SIGINT. Instead, you should block SIGINT for all threads, spawn a dedicated thread, and call sigwait there. This means that you detect the arrival of the SIGINT signal outside of a signal handler context, so that you are not restricted to async-signal-safe functions. You also avoid the risk of self-deadlock in case the signal is delivered to one of the worker threads.
At this point, you just need to shut down your work queue/thread pool in an orderly manner. Depending on the details, your existing approach with the running flag might even work unchanged.
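A minimal sketch of that structure, reusing the pool variable from the question (the other names are illustrative): SIGINT is blocked before any threads are created, so every thread inherits the mask, and a dedicated thread receives the signal synchronously via sigwait.

#include <pthread.h>
#include <signal.h>

static void * signal_waiter(void * arg) {
    sigset_t set;
    int sig;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigwait(&set, &sig); //blocks until SIGINT is delivered
    //ordinary thread context: mutexes, condition waits and joins are all safe here
    pool->jobs->running = 0;
    pthread_cond_broadcast(pool->jobs->cond);
    //join the workers, add up the results, print and exit as before
    return NULL;
}

int main() {
    sigset_t set;
    pthread_t sig_tid;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    pthread_sigmask(SIG_BLOCK, &set, NULL); //every thread created after this inherits the mask
    pthread_create(&sig_tid, NULL, signal_waiter, NULL);
    //set up threadpool and jobpool, start workers, send jobs as before
}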

What's a good strategy for clean/reliable shutdown of threads that use pthread barriers for synchronization?

I've got a pthread-based multithreaded program that has four threads indefinitely executing this run-loop (pseudocode):
while (keepRunning)
{
    pthread_barrier_wait(&g_stage_one_barrier);
    UpdateThisThreadsStateVariables();
    pthread_barrier_wait(&g_stage_two_barrier);
    DoComputationsThatReadFromAllThreadsStateVariables();
}
This works pretty well, in that during stage one each thread updates its own state variables, and that's okay because no other thread is reading any other thread's state variables during stage one. Then during stage two it's a free-for-all as far as threads reading each others' state is concerned, but that's okay because during stage two no thread is modifying its local state variables, so they are effectively read-only.
My only remaining problem is, how do I cleanly and reliably shut down these threads when it's time for my application to quit? (By "cleanly and reliably", I mean without introducing potential deadlocks or race conditions, and ideally without having to send any UNIX-signals to force threads out of a pthread_barrier_wait() call)
My main() thread can of course set keepRunning to false for each thread, but then how does it get pthread_barrier_wait() to return for each thread? AFAICT the only way to get pthread_barrier_wait() to return is to have all four threads inside pthread_barrier_wait() simultaneously, and that is difficult to arrange when some threads may have exited already.
Calling pthread_barrier_destroy() seems like what I'd want to do, but it's undefined behavior to do that while any threads might be waiting on the barrier.
Is there a good known solution to this problem?
Having two flags and using something like the following should work:
for (;;)
{
    pthread_barrier_wait(&g_stage_one_barrier);     /* ---- Zone 1 begins ---- */

    UpdateThisThreadsStateVariables();

    pthread_mutex_lock(&shutdownMtx);
    pendingShutdown = !keepRunning;
    pthread_mutex_unlock(&shutdownMtx);

    pthread_barrier_wait(&g_stage_two_barrier);     /* ---- Zone 2 begins ---- */

    if (pendingShutdown)
        break;

    DoComputationsThatReadFromAllThreadsStateVariables();
}
shutdownMtx should protect the setting of keepRunning too, though it's not shown.
The logic is that by the time pendingShutdown gets set to true, all the threads must be within Zone 1. (This is true even if only some of the threads saw keepRunning being false, so races on keepRunning should be okay.) It follows that they will all reach pthread_barrier_wait(&g_stage_two_barrier), and then all break out when they enter Zone 2.
It would also be possible to check for PTHREAD_BARRIER_SERIAL_THREAD -- which is returned by pthread_barrier_wait() for exactly one of the threads -- and only do the locking and updating of pendingShutdown in that thread, which could improve performance.
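A sketch of that variant, building on the loop above (same variables assumed): only the thread that receives PTHREAD_BARRIER_SERIAL_THREAD from the first barrier touches the flag, and the second barrier publishes it to everyone.

for (;;)
{
    int rc = pthread_barrier_wait(&g_stage_one_barrier);

    UpdateThisThreadsStateVariables();

    if (rc == PTHREAD_BARRIER_SERIAL_THREAD)
    {
        /* exactly one thread per round takes the lock and updates the flag */
        pthread_mutex_lock(&shutdownMtx);
        pendingShutdown = !keepRunning;
        pthread_mutex_unlock(&shutdownMtx);
    }

    pthread_barrier_wait(&g_stage_two_barrier); /* publishes pendingShutdown to every thread */

    if (pendingShutdown)
        break;

    DoComputationsThatReadFromAllThreadsStateVariables();
}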
You could have an additional thread that synchronises on the same barriers, but only exists as a "shutdown master". Your worker threads would use the exact code that you have in your question, and the "shutdown master" thread would do:
while (keepRunning)
{
    pthread_barrier_wait(&g_stage_one_barrier);

    pthread_mutex_lock(&mkr_lock);
    if (!mainKeepRunning)
        keepRunning = 0;
    pthread_mutex_unlock(&mkr_lock);

    pthread_barrier_wait(&g_stage_two_barrier);
}
When the main thread wants the other threads to shut down, it would just do:
pthread_mutex_lock(&mkr_lock);
mainKeepRunning = 0;
pthread_mutex_unlock(&mkr_lock);
(ie. the keepRunning variable becomes part of the shared thread state that is read-only during stage 2, and is owned by the shutdown master thread during stage 1).
Of course, you can also just pick one of your other threads to be the "shutdown master thread" rather than using a dedicated thread for that purpose.
There is a conflict of requirements: barrier semantics require every thread to arrive before any can continue, while shutdown requires letting threads terminate even though they may be blocked at different barriers.
I suggest replacing the barrier with a custom implementation that supports an external cancel call.
Example (may not run, but the idea...):
struct _barrier_entry
{
    pthread_cond_t cond;
    volatile bool released;
    volatile struct _barrier_entry *next;
};

typedef struct
{
    volatile int capacity;
    volatile int count;
    volatile struct _barrier_entry *first;
    pthread_mutex_t lock;
} custom_barrier_t;
Initialization:
int custom_barrier_init(custom_barrier_t *barrier, int capacity)
{
    if (NULL == barrier || capacity <= 0)
    {
        errno = EINVAL;
        return -1;
    }
    barrier->capacity = capacity;
    barrier->count = 0;
    barrier->first = NULL;
    return pthread_mutex_init(&barrier->lock, NULL);
}
Helper:
static void _custom_barrier_flush(custom_barrier_t *barrier)
{
    struct _barrier_entry *ptr;
    for (ptr = barrier->first; NULL != ptr;)
    {
        struct _barrier_entry *next = ptr->next;
        ptr->released = true;
        pthread_cond_signal(&ptr->cond);
        ptr = next;
    }
    barrier->first = NULL;
    barrier->count = 0;
}
Blocking wait:
int custom_barrier_wait(custom_barrier_t *barrier)
{
    struct _barrier_entry entry;
    int result;

    pthread_cond_init(&entry.cond, NULL); /* entry is a local struct, not a barrier member */
    entry.next = NULL;
    entry.released = false;

    pthread_mutex_lock(&barrier->lock);
    barrier->count++;
    if (barrier->count == barrier->capacity)
    {
        _custom_barrier_flush(barrier);
        result = 0;
    }
    else
    {
        entry.next = barrier->first;
        barrier->first = &entry;
        while (true)
        {
            pthread_cond_wait(&entry.cond, &barrier->lock);
            if (entry.released)
            {
                result = 0;
                break;
            }
            if (barrier->capacity < 0)
            {
                errno = ECANCELED;
                result = -1;
                break;
            }
        }
    }
    pthread_mutex_unlock(&barrier->lock);
    pthread_cond_destroy(&entry.cond);
    return result;
}
Cancellation:
int custom_barrier_cancel(custom_barrier_t *barrier)
{
    pthread_mutex_lock(&barrier->lock);
    barrier->capacity = -1;
    _custom_barrier_flush(barrier);
    pthread_mutex_unlock(&barrier->lock);
    return 0;
}
So the thread code can run in its loop until custom_barrier_wait fails with ECANCELED after a custom_barrier_cancel call.
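As a usage sketch (assuming two such barriers, each initialized with the worker count, and the cancellation semantics the author intends), the run-loop from the question might become:

custom_barrier_t g_stage_one, g_stage_two; /* custom_barrier_init(..., numThreads) at startup */

while (custom_barrier_wait(&g_stage_one) == 0)
{
    UpdateThisThreadsStateVariables();
    if (custom_barrier_wait(&g_stage_two) != 0)
        break; /* cancelled: errno == ECANCELED */
    DoComputationsThatReadFromAllThreadsStateVariables();
}
/* to shut down, some thread calls custom_barrier_cancel() on both barriers */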
The threads that are waiting at the barriers are not the issue; it's the threads that are still running UpdateThis... or DoComputations... that will delay the shutdown. You can reduce the shutdown time by periodically checking for shutdown inside the UpdateThis... and DoComputations... functions.
Here's the outline of one possible solution:

main initializes a mutex g_shutdown_mutex
main locks the mutex
main launches the threads
the threads do their thing while periodically trying to lock the mutex, but since main has the mutex locked, the trylock function will always fail
when it's time to shut down, main unlocks the mutex
now the trylock will succeed and the worker functions will return early
before reaching the second barrier, any thread that successfully locks the mutex sets a global variable g_shutdown_requested
after passing the second barrier, all the threads will see the same value in g_shutdown_requested and make the same decision whether to exit or not
So the while loop looks like this:

while (1)
{
    pthread_barrier_wait(&g_stage_one_barrier);

    UpdateThisThreadsStateVariables();

    if (pthread_mutex_trylock(&g_shutdown_mutex) == 0)
    {
        g_shutdown_requested = true;
        pthread_mutex_unlock(&g_shutdown_mutex);
        break;
    }

    pthread_barrier_wait(&g_stage_two_barrier);

    if (g_shutdown_requested)
        break;

    DoComputationsThatReadFromAllThreadsStateVariables();
}
And the worker functions look like this:

void UpdateThisThreadsStateVariables(void)
{
    for (i = 0;; i++)
    {
        // check the mutex once every 4096 times through the loop (0xfff mask)
        if ((i & 0xfff) == 0 && pthread_mutex_trylock(&g_shutdown_mutex) == 0)
        {
            pthread_mutex_unlock(&g_shutdown_mutex); // abnormal termination
            return;
        }
        // do the important stuff here
        if (doneWithTheImportantStuff) // normal termination
            break;
    }
}

Waiting for worker threads to finish before closing a server after receiving SIGINT or SIGTERM (POSIX threads in C)

I'm developing a client/server program in C using pthreads.
It's a card game where the server acts as the dealer between two clients that play.
I have this situation:
server.c:
main:
int main (int argc, char * argv[])
used as a dispatcher that listens for incoming connections on accept
pthread_create(&sig_thread, NULL, &thread_signal_handler, &server_Socket );
a thread created inside main to catch signals (which are conveniently masked)
pthread_create(&tid[i++], NULL, &worker, (void*) arr_args)
each connected client is handled by its own thread (tid is the array of thread IDs)
pthread_join(sig_thread, NULL)
joins the signal-handling thread
pthread_join(tid[i], NULL)
joins the thread created for each connected client.
worker:
void * worker(void * args)
does its job and terminates
thread signal handler:
void * thread_signal_handler(void * arg) {
    sigset_t set;
    int sig;
    int * socket = (int *) arg;

    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    while (server_UP)
    {
        /* wait for a signal */
        if (sigwait(&set, &sig))
            perror("Sigwait");
        /* received SIGINT or SIGTERM - closing server */
        if (sig == SIGINT || sig == SIGTERM) {
            printf(TERM_SERVER"\n");
            server_UP = 0; /* global variable */
            /* close socket blocked on MAIN */
            shutdown(*socket, SHUT_RDWR);
        }
    }
    return (void *) EXIT_SUCCESS;
}
This way, the signal is caught correctly and the server terminates immediately.
What I need is: if there are matches in progress, wait until all the workers have finished their jobs properly. In other words, I want to tell the signal-handling thread "did you receive a SIGINT (or SIGTERM)? OK, then wait for all the workers to terminate."
Any suggestions? If you need more code, I'll edit this post.
You can save all of the threads in an array or a list as you create them, and at the end call pthread_join() on all of them:

while (pthread_list) {
    pthread_join(pthread_list->th_id, NULL);
    pthread_list = pthread_list->next;
}
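For completeness, the creation side could build that list like this (pthread_node and new_tid are illustrative names, assuming a singly linked list):

struct pthread_node {
    pthread_t th_id;
    struct pthread_node * next;
};
struct pthread_node * pthread_list = NULL;

/* after each successful pthread_create(&new_tid, ...) */
struct pthread_node * node = malloc(sizeof *node);
node->th_id = new_tid;
node->next = pthread_list;
pthread_list = node;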
If you don't want to keep track of every thread, you can instead keep a count of running threads: increment it before calling pthread_create, and have each thread decrement it as it finishes.
Then in the handler you sleep for a few seconds in a loop until the count reaches 0.
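A condition variable avoids the repeated sleeping; here is a minimal sketch of that counter approach (variable names are illustrative):

static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t count_zero = PTHREAD_COND_INITIALIZER;
static int running_threads = 0;

/* before each pthread_create */
pthread_mutex_lock(&count_lock);
running_threads++;
pthread_mutex_unlock(&count_lock);

/* at the end of each worker */
pthread_mutex_lock(&count_lock);
if (--running_threads == 0)
    pthread_cond_signal(&count_zero);
pthread_mutex_unlock(&count_lock);

/* in the signal-handling thread, instead of sleeping in a loop */
pthread_mutex_lock(&count_lock);
while (running_threads > 0)
    pthread_cond_wait(&count_zero, &count_lock);
pthread_mutex_unlock(&count_lock);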

pthread exit from thread in thread pool

I have a rather simple thread pool, and I have a question regarding thread finalization.
This is my worker snippet:
static void* threadpool_worker(void* pool_instance)
{
    int rc;
    struct threadpool* pool = (struct threadpool*) pool_instance;
    struct threadpool_task *task;

    for (;;)
    {
        pthread_mutex_lock( &(pool->task_queue_mutex) );
        while ( pool->headp->tqh_first == NULL )
        {
            rc = pthread_cond_wait( &(pool->task_queue_cond), &(pool->task_queue_mutex) );
        }
        task = pool->headp->tqh_first;
        TAILQ_REMOVE(pool->headp, pool->headp->tqh_first, entries);
        pthread_mutex_unlock( &(pool->task_queue_mutex) );

        task->routine_cb(task->data);
    }
}
So jobs are executed at the line task->routine_cb(task->data);, and in order to finalize the worker threads I call threadpool_enqueue_task in the following way:
for (i = 0; i < pool->num_of_workers; ++i)
{
    threadpool_enqueue_task(pool, pthread_exit, NULL);
}
expecting that pthread_exit would be called at task->routine_cb(task->data).
But it does not work this way: I don't see any explicit error, just a memory leak in valgrind.
However, when I change the worker code like this:

if (task->routine_cb == pthread_exit)
{
    pthread_exit(0);
}
task->routine_cb(task->data);

everything ends fine.
So my question is: is there a way to stop a worker just by making it execute pthread_exit somehow, without changing the worker code?
Edit:
The thread pool task is declared as follows:

struct threadpool_task
{
    void (*routine_cb)(void*);
    void *data;
    TAILQ_ENTRY(threadpool_task) entries; /* List. */
};

As per my understanding there should be no problem taking the address of pthread_exit for routine_cb, since it is declared:
extern void pthread_exit (void *__retval) __attribute__ ((__noreturn__));
I found the cause of the leak. It was my fault, of course. I rewrote the job invocation in the following way:

void (*routine)(void*) = task->routine_cb;
void* data = task->data;
free(task);
routine(data);

instead of:

task->routine_cb(task->data);
free(task);

Since pthread_exit() never returns, the free(task) that followed the call was never reached for the exit tasks, which is exactly what valgrind reported as a leak. With the task freed before it is invoked, there were no more leaks, and the threads stopped as I expected.
Thanks to everyone who tried to help.
