I have a program that creates and cancels threads through a thread pool I implemented.
The creation is as follows:
while (created<threadsNum){
pthread_t newThread;
pthread_struct *st; //Open the thread that handle the deleting of the sessions timeout.
st = (pthread_struct*)malloc(sizeof(pthread_struct));
st->id = created;
st->t = &newThread;
pthread_mutex_lock( &mutex_threadsPool );
readingThreadsPool[created] = st;
pthread_mutex_unlock( &mutex_threadsPool );
if((threadRes1 = pthread_create( &newThread, NULL, pcapReadingThread, (void*)created)))
{
syslog(LOG_CRIT, "Creating Pcap-Reading Thread %d failed.",created);
printf( "Creating Pcap-Reading Thread %d failed.\n",created);
exit(1);
}
syslog(LOG_INFO, "Created Pcap-Reading Thread %d Successfully.",created);
created++;
}
Later I try to cancel and restart them:
pthread_t* t;
pthread_struct* tstr;
int i;
pthread_mutex_unlock( &mutex_threadsPool );
//first go on array and kill all threads
for(i = 0; i<threadsNum ; i++ ){
tstr = readingThreadsPool[i];
if (tstr!=NULL){
t = tstr->t;
//Reaches here :-)
if (pthread_cancel(*t)!=0){
perror("ERROR : Could not kill thread");
}
else{
printf("Killed Thread %d \n",i);
}
//doesnt reach here
}
}
I compared the address of the pthread_t stored when each thread was created with the address used when cancelling it, and they match.
I have read that the thread manager stops working if something calls killall(), but I never call it.
Does anyone have an idea what is going wrong?
Thanks
while (created<threadsNum){
pthread_t newThread;
pthread_struct *st;
/* ... */
st->t = &newThread;
/* ... */
}
You've got st->t pointing to a local variable newThread. newThread is only in scope during the current loop iteration. After this iteration st->t will contain an invalid address.
newThread is on the stack, so after it goes out of scope that stack space will be used for other variables. That could be different pthread_ts on successive iterations, or once the loop is over then that stack space will be used for completely different types of values.
To fix this I'd probably change pthread_struct.t to be a pthread_t instead of a pthread_t *, and then change the pthread_create call to:
pthread_create(&st->t, /*...*/)
Also, you should be careful about adding st to the thread pool before you've called pthread_create. It should probably be added after. As it stands, there's a small window where st->t is on the thread pool but has not been initialized.
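Both suggestions can be sketched together as follows. This is only a minimal illustration, not the asker's real code: pool_entry, POOL_SIZE, reader_thread, pool_start and pool_stop are hypothetical names, and the worker just sleeps so that it always reaches a cancellation point.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define POOL_SIZE 4  /* hypothetical pool size for this sketch */

typedef struct {
    int id;
    pthread_t t;     /* stored by value, so it outlives the creation loop */
} pool_entry;

static pool_entry *pool[POOL_SIZE];
static pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in worker: sleeps forever; sleep() is a cancellation point. */
static void *reader_thread(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);
    return NULL;
}

/* Create all threads; publish each entry only after pthread_create succeeds. */
static int pool_start(void)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        pool_entry *e = malloc(sizeof *e);
        if (e == NULL)
            return -1;
        e->id = i;
        if (pthread_create(&e->t, NULL, reader_thread, NULL) != 0) {
            free(e);
            return -1;
        }
        pthread_mutex_lock(&pool_mutex);
        pool[i] = e;             /* e->t is valid from this point on */
        pthread_mutex_unlock(&pool_mutex);
    }
    return 0;
}

/* Cancel and join every thread in the pool; returns how many were stopped. */
static int pool_stop(void)
{
    int stopped = 0;
    pthread_mutex_lock(&pool_mutex);
    for (int i = 0; i < POOL_SIZE; i++) {
        pool_entry *e = pool[i];
        void *res = NULL;
        if (e == NULL)
            continue;
        if (pthread_cancel(e->t) == 0 && pthread_join(e->t, &res) == 0) {
            assert(res == PTHREAD_CANCELED);
            stopped++;
        }
        free(e);
        pool[i] = NULL;
    }
    pthread_mutex_unlock(&pool_mutex);
    return stopped;
}
```

Calling pool_start() and then pool_stop() from main should cancel and join all POOL_SIZE threads. Joining after the cancel is what actually releases each thread's resources.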
I sometimes have random failures when joining a pthread. I can say that the thread is not stuck in a deadlock on a mutex when the join fails. Most of the time the thread is idle (in a sleep syscall) when the join times out.
My need is basic: a way to start and stop a thread from the main thread, so I don't think I need a mutex around the thread state variables in the start/stop functions. Each thread runs an infinite loop most of the time. All my threads share the same skeleton: a start function, a stop function, and the thread function itself. A global variable g_event_ctx stores the current status of each thread: running tells me whether I need to cancel it, and is_joinable tells me whether I need to join it. Every thread function also makes sleep/read/write syscalls, which are cancellation points.
typedef struct pthread_context
{
pthread_t id; /*!< pthread_t to be able to stop the thread later */
int running; /*!< allow to know if the thread is currently running */
int is_joinable; /*!< allow to know if the thread is joinable */
} str_pthread_context;
The code of the skeleton :
int start_x_manager (void)
{
pthread_t t_x;
if (g_event_ctx.x_thread.is_joinable) return 0;
PRINT_INFO ("Start x manager");
// start push x thread
if (pthread_create (&t_x, NULL, x_loop_thread, NULL))
PRINT_ERR_GOTO ("error on pthread_create for x thread");
pthread_setname_np(t_x, "x");
g_event_ctx.x_thread.id = t_x;
g_event_ctx.x_thread.is_joinable = 1;
g_event_ctx.x_thread.running = 1;
return 0;
error:
g_event_ctx.x_thread.running = 0;
g_event_ctx.x_thread.is_joinable = 0;
return 1;
}
int stop_x_manager (void)
{
struct timespec ts;
if (!g_event_ctx.x_thread.is_joinable) return 0;
PRINT_INFO ("Stop x manager");
if (g_event_ctx.x_thread.running)
{
CHECK_ERR_GOTO (pthread_cancel(g_event_ctx.x_thread.id) != 0, "Cannot cancel x thread");
g_event_ctx.x_thread.running = 0;
}
CHECK_ERR_GOTO (clock_gettime(CLOCK_REALTIME, &ts) == -1, "Cannot get clock time");
ts.tv_sec += 5;
CHECK_ERR_GOTO (pthread_timedjoin_np (g_event_ctx.x_thread.id, NULL, &ts) != 0, "Cannot join x_thread");
g_event_ctx.x_thread.is_joinable = 0;
return 0;
error:
g_event_ctx.x_thread.running = 0;
g_event_ctx.x_thread.is_joinable = 0;
return 1;
}
The skeleton of the thread function :
void *x_loop_thread (void *arg __attribute__((__unused__)))
{
CHECK_ERR_GOTO (pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL) != 0, "Cannot set cancel state");
CHECK_ERR_GOTO (pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL) != 0, "Cannot set cancel type");
PRINT_INFO ("Start x manager loop thread ...");
pthread_cleanup_push(x_manager_cleanup, some_stuff);
while (1)
{
// Do some stuff here
}
g_event_ctx.x_thread.running = 0;
pthread_exit (NULL);
error:
g_event_ctx.x_thread.running = 0;
pthread_cleanup_pop(1);
pthread_exit (NULL);
}
CHECK_ERR_GOTO is a macro that checks a condition and jumps to the label error if it is true.
What could explain a timeout in pthread_timedjoin_np? Could another piece of code have corrupted my thread ID? Is there a design problem in my skeleton?
You can sidestep the problem by putting a variable in the context structure indicating you want the background thread to stop, setting that variable in your main thread before calling join, and checking that variable periodically in the background thread, exiting the while(1) loop if it's true.
If you have any blocking calls that sleep forever, you can either have them time out and loop them with while(!want_to_stop) or, for select loops, add a file descriptor you can activate from the main thread when you want to stop (an eventfd or pipe).
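A minimal sketch of the stop-flag approach, assuming C11 atomics are available. The names worker_ctx, want_stop, worker_loop, worker_start and worker_stop are made up for this illustration, and the 10 ms nap stands in for whatever bounded wait the real loop would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

typedef struct {
    pthread_t id;
    atomic_bool want_stop;  /* set by the main thread, polled by the worker */
    int started;            /* only touched by the main thread */
} worker_ctx;

static void *worker_loop(void *arg)
{
    worker_ctx *ctx = arg;
    struct timespec nap = { 0, 10 * 1000 * 1000 };  /* 10 ms */

    assert(ctx != NULL);
    while (!atomic_load(&ctx->want_stop)) {
        /* Do some work here, then wait for a bounded time only. */
        nanosleep(&nap, NULL);
    }
    return NULL;  /* ordinary return: cleanup can run as normal code */
}

static int worker_start(worker_ctx *ctx)
{
    atomic_store(&ctx->want_stop, false);
    if (pthread_create(&ctx->id, NULL, worker_loop, ctx) != 0)
        return -1;
    ctx->started = 1;
    return 0;
}

static int worker_stop(worker_ctx *ctx)
{
    if (!ctx->started)
        return 0;
    atomic_store(&ctx->want_stop, true);   /* ask the loop to exit */
    if (pthread_join(ctx->id, NULL) != 0)
        return -1;
    ctx->started = 0;
    return 0;
}
```

With this shape the join can only take as long as one loop iteration plus the nap, and no cancellation is involved, so the thread never dies halfway through a critical section.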
I am writing a concurrent C program where I want to wait for all threads to finish in the main().
Based on this solution, I wrote the following code in main():
// Create threads
pthread_t cid[num_mappers];
int t_iter;
for (t_iter = 0; t_iter < num_mappers; t_iter++){
pthread_create(&(cid[t_iter]), NULL, &map_consumer, NULL);
}
// Wait for all threads to finish
for (t_iter = 0; t_iter < num_mappers; t_iter++){
printf("Joining %d\n", t_iter);
int result = pthread_join(cid[t_iter], NULL);
}
printf("Done mapping.\n");
The function passed into threads is defined as:
// Consumer function for mapping phase
void *map_consumer(void *arg){
while (1){
pthread_mutex_lock(&g_lock);
if (g_cur >= g_numfull){
// No works to do, just quit
return NULL;
}
// Get the file name
char *filename = g_job_queue[g_cur];
g_cur++;
pthread_mutex_unlock(&g_lock);
// Do the mapping
printf("%s\n", filename);
g_map(filename);
}
}
The threads are all successfully created and executed, but the join loop will never finish if num_mappers >= 2.
You return without unlocking the mutex:
pthread_mutex_lock(&g_lock);
if (g_cur >= g_numfull){
// No works to do, just quit
return NULL; // <-- mutex is still locked here
}
// Get the file name
char *filename = g_job_queue[g_cur];
g_cur++;
pthread_mutex_unlock(&g_lock);
So only the first thread that finds the queue empty ever returns. Since it never unlocks the mutex, every other thread stays blocked in pthread_mutex_lock, and the join loop never finishes.
You need something more like:
pthread_mutex_lock(&g_lock);
if (g_cur >= g_numfull){
// No works to do, just quit
pthread_mutex_unlock(&g_lock);
return NULL;
}
// Get the file name
char *filename = g_job_queue[g_cur];
g_cur++;
pthread_mutex_unlock(&g_lock);
I have a server application that creates a new thread for every incoming request.
However, every once in a while, it will create a thread with thread ID = 0 (used pthread_equal to check this). I have a structure that contains the thread ID that I pass to the function specified in pthread_create, and am checking this there.
Why would a thread get created with ID = 0?
Is there anything I can do if this happens? I cannot use this thread and want to exit it immediately.
=====================================================================
typedef struct
{
pthread_t tid;
other_struct_t Other;
} data_ptr_t;
void * worker(void * arg)
{
data_ptr_t local_data;
data_ptr_t * incoming_data = (data_ptr_t *) arg;
if (NULL == incoming_data || NULL == incoming_data->Other)
{
printf("invalid input\n");
}
else if (pthread_equal(incoming_data->tid, 0))
{
printf("invalid thread id\n");
}
else
{
// add to global thread pool
// do other stuff here
// remove from global thread pool
}
}
int main()
{
// server socket stuff
while (1)
{
// if incoming connection is valid
data_ptr_t data;
int error = pthread_create(&(data.tid), NULL, (void * (*) (void *)) worker, (void *) &data);
if (0 != error)
{
printf("could not create thread (%d)\n", error);
}
else
{
pthread_detach(data.tid);
printf("thread dispatched\n");
}
}
}
Note: If the number of threads I'm creating is under 50 or so, it works fine. Upwards of 70, most threads go through just fine, the rest end up printing the "invalid thread id".
Note: This is on Linux.
You can't do this:
while (1)
{
// if incoming connection is valid
data_ptr_t data;
int error = pthread_create(&(data.tid),
NULL, (void * (*) (void *)) worker, (void *) &data);
Your data_ptr_t is a local variable on the stack. On the next iteration of the while loop that variable is no longer valid, and its storage is reused for the next connection.
The while loop might start another iteration long before the new worker thread starts running and reads the data you passed to it. Instead, dynamically allocate the data you pass to each worker thread so you can be sure it is still valid when the thread uses it.
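Here is a rough sketch of that idea. request_t, conn_id, dispatch and wait_for are hypothetical stand-ins for the real per-connection data and server loop, and the done counter exists only so the example can tell when the detached workers have finished. The worker calls pthread_self() rather than reading an ID out of the structure, which also avoids depending on when pthread_create stores the ID.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    int conn_id;   /* hypothetical stand-in for the per-request data */
} request_t;

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  done_cond = PTHREAD_COND_INITIALIZER;
static int done_count = 0;

static void *worker(void *arg)
{
    request_t *req = arg;              /* the worker owns this allocation */
    pthread_t self = pthread_self();   /* own ID, no need to pass it in */

    assert(req != NULL);
    (void)self;
    /* ... handle req->conn_id here ... */
    free(req);

    pthread_mutex_lock(&done_lock);
    done_count++;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Hand one request to a new detached thread; returns 0 on success. */
static int dispatch(int conn_id)
{
    pthread_t tid;
    int err;
    request_t *req = malloc(sizeof *req);  /* one allocation per thread */

    if (req == NULL)
        return -1;
    req->conn_id = conn_id;
    err = pthread_create(&tid, NULL, worker, req);
    if (err != 0) {
        free(req);       /* thread never started, so we still own req */
        return err;
    }
    pthread_detach(tid);
    return 0;
}

/* Block until at least n workers have finished. */
static void wait_for(int n)
{
    pthread_mutex_lock(&done_lock);
    while (done_count < n)
        pthread_cond_wait(&done_cond, &done_lock);
    pthread_mutex_unlock(&done_lock);
}
```

Each thread gets its own heap block, so no two workers ever look at the same storage, and the block is freed by whoever owns it last.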
I had this code:
int main(int argc, char** argv)
{
pthread_t thread[thr_num];
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
// just for debugging //
struct rlimit rlim;
getrlimit(RLIMIT_NPROC, &rlim);
printf ("soft = %d \n", rlim.rlim_cur);
printf ("hard = %d \n", rlim.rlim_max);
////
for ( i = 1 ; i <= thr_num ; i++) {
if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
}
pthread_attr_destroy(&attr);
for ( i = 1 ; i <= thr_num ; i++) {
if( pthread_join(thread[i], (void**)&status ) ) {
exit(1);
}
}
return 0;
}
void* loggerThread(void* data)
{
char** sthg = ((char**)data);
pthread_exit(NULL);
}
I don't understand why when I run this code with thr_num=291, I got an error:
pthread_create failure, i = 291, errno = 11 (EAGAIN)
With thr_num=290 it worked fine. I ran this code on Linux 2.6.27.54-0.2-default (SLES 11).
Both rlim.rlim_cur and rlim.rlim_max have the value 6906. 'ulimit -a' shows the same value for 'max user processes'.
Guided by the pthread_create man page, I also checked /proc/sys/kernel/threads-max (it was 13813).
I did not find any parameter with the value 290 in the 'sysctl -a' output either.
By chance I found the following in this link:
pthread_create and EAGAIN
that: "Even if pthread_exit or pthread_cancel is called, the parent process still need to call pthread_join to release the pthread ID, which will then become recyclable"
So, as an experiment, I modified my code to this:
for ( i = 1 ; i <= thr_num ; i++) {
if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
if( pthread_join(thread[i], (void**)&status ) ) {
printf("pthread_join failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
}
pthread_attr_destroy(&attr);
and then everything worked: I no longer got the error on the 291st iteration.
I would like to understand why my original code produced the error:
1. because I programmed the threads incorrectly
2. or because I hit some system limit that I couldn't identify
I would also like to know whether my correction is a good fix for this problem, or whether it introduces hidden pitfalls.
Thanks !
I would like to understand why my original code produced the error: 1. because I programmed
the threads incorrectly 2. or because I hit some system limit that I couldn't identify
You likely hit a system limit: you probably ran out of address space. By default, each thread gets 8-10 MB of stack space on Linux. If you create 290 threads, that uses nearly 3 GB of address space, which is the maximum for a 32-bit process.
You get EAGAIN in such a case because there aren't enough resources to create the thread right now (there isn't enough address space available at that moment).
When a thread exits, not all of its resources are released (on Linux, the entire stack of the thread is kept around).
If the thread is detached, i.e. you called pthread_detach() or specified the detached state as an attribute to pthread_create(), all resources are released when the thread exits, but you can't pthread_join() a detached thread.
If the thread is not detached, you need to call pthread_join() on it to release the resources.
Note that your modified code, which calls pthread_join() inside the loop, will:
1. Spawn a thread
2. Wait for that thread to finish
3. Go to 1
i.e. only one other thread is running at a time, which seems a bit pointless.
You can certainly spawn more than one thread and have them run concurrently, but there's a limit. On your machine, you seem to have found the limit to be around 290.
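If the per-thread stack really is what exhausts the address space, one option is to request a smaller stack through the thread attributes. The sketch below assumes 256 KiB is enough for the worker, which you would have to verify for real code. SMALL_STACK, tiny_worker and spawn_with_small_stacks are names invented for this example. Note also that pthread_create reports failure through its return value, not through errno.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SMALL_STACK (256 * 1024)   /* assumed adequate for a tiny worker */

static void *tiny_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Create n joinable threads with a reduced stack, then join them all.
   Returns the number of threads that were both created and joined. */
static int spawn_with_small_stacks(pthread_t *threads, int n)
{
    pthread_attr_t attr;
    int created = 0;
    int joined = 0;

    assert(threads != NULL && n >= 0);
    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_setstacksize(&attr, SMALL_STACK) != 0) {
        pthread_attr_destroy(&attr);
        return 0;
    }

    /* Indices run from 0 to n - 1. */
    for (created = 0; created < n; created++) {
        if (pthread_create(&threads[created], &attr, tiny_worker, NULL) != 0)
            break;  /* the return value is the error code, e.g. EAGAIN */
    }
    pthread_attr_destroy(&attr);

    /* Joining releases the stack and ID of each finished thread. */
    for (int i = 0; i < created; i++) {
        if (pthread_join(threads[i], NULL) == 0)
            joined++;
    }
    return joined;
}
```

With a smaller stack, far more threads fit in the same address space, though a thread pool of fixed size is usually a better answer than hundreds of threads.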
I initially wrote this as a comment, but just in case...
Your code:
for ( i = 1 ; i <= thr_num ; i++) {
if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
}
...
for ( i = 1 ; i <= thr_num ; i++) {
if( pthread_join(thread[i], (void**)&status ) ) {
exit(1);
}
}
Both for() loops iterate from 1 to thr_num. That runs past the end of the array thread[thr_num], because valid indices go from 0 to thr_num - 1, so thread[thr_num] is out of bounds. You should iterate from 0 to thr_num - 1 instead:
for ( i = 0 ; i < thr_num ; i++)
I'm actually surprised you didn't get a segmentation fault before hitting 291 as thr_num.
I want to monitor threads. I use condition variables to send and receive heartbeat and acknowledgement signals for that.
scnMonitor_t is the monitor structure. Each new thread registers with the monitor and is added to the scnThreadList_t list.
monitorHeartbeatCheck is the thread that starts with the program.
monitorHeartbeatProcess is an API call that is added to every thread function.
My problem is that the process index does not advance as expected:
it ends up waiting on the HB condition for the 3rd thread, and a deadlock occurs.
What could be the problem?
Thanks in advance.
typedef struct scnThreadList_{
osiThread_t thread;
struct scnThreadList_ *next;
} scnThreadList_t;
typedef struct scnMonitor_{
bool started;
osiThread_t heartbeatThread;
osiMutex_t heartbeatMutex;
osiMutex_t ackMutex;
osiCond_t heartbeatCond;
scnThreadList_t *threads;
} scnMonitor_t;
static scnMonitor_t *s_monitor = NULL;
// Main heartbeat check thread
void* monitorHeartbeatCheck( void *handle )
{
scnThreadList_t *pObj = NULL;
static int idx = 0;
static bool waitAck = false;
while ( 1 ) {
pObj = s_monitor->threads;
while ( pObj && ( pObj != s_monitor->heartbeatThread ) ) { //skip it-self from monitoring.
++idx;
printf("\"HB Check No.%d\"\n",idx);
// send heartbeat
usleep( 250 * 1000 );
pthread_mutex_lock( s_monitor->heartbeatMutex, 1 );
pthread_cond_signal( s_monitor->heartbeatCond );
printf("-->C %d HB sent\n",idx);
pthread_mutex_unlock( s_monitor->heartbeatMutex );
// wait for ACK
while( !waitAck ){
pthread_mutex_lock( s_monitor->ackMutex, 1 );
printf("|| C %d wait Ack\n",idx);
waitAck = true;
pthread_cond_wait( s_monitor->heartbeatCond, s_monitor->ackMutex );
waitAck = false;
printf("<--C %d received Ack\n",idx);
pthread_mutex_unlock( s_monitor->ackMutex );
LOG_INFO( SCN_MONITOR, "ACK from thread %p \n", pObj->thread );
}
pObj = pObj->next;
}
} // while, infinite
return NULL;
}
// Waits for heartbeat and acknowledges
// Call this API from every registered thread function
int monitorHeartbeatProcess( void )
{
static int id = 0;
static bool waitHb = false;
++ id;
printf("\"HB Process No.%d\"\n",id);
// wait for HB
while(!waitHb){
pthread_mutex_lock( s_monitor->heartbeatMutex, 1 );
printf("|| P %d wait for HB\n",id);
waitHb = true;
pthread_cond_wait( s_monitor->heartbeatCond, s_monitor->heartbeatMutex );
waitHb = false;
printf("<--P %d HB received \n",id);
pthread_mutex_unlock( s_monitor->heartbeatMutex );
}
// send ACK
usleep( 250 * 1000 );
pthread_mutex_lock( s_monitor->ackMutex, 1 );
pthread_cond_signal( s_monitor->heartbeatCond );
printf("-->P %d ACK sent\n",id);
pthread_mutex_unlock( s_monitor->ackMutex );
return 1;
}
You should always associate only one mutex with a condition at a time. Using two different mutexes with the same condition at the same time could lead to unpredictable serialization issues in your application.
http://publib.boulder.ibm.com/infocenter/iseries/v5r4/index.jsp?topic=%2Fapis%2Fusers_78.htm
You have 2 different mutexes with your condition heartbeatCond.
I think you are experiencing a deadlock here. The thread calling monitorHeartbeatProcess() locks heartbeatMutex and waits for a signal on the condition variable heartbeatCond, while the thread calling monitorHeartbeatCheck() locks ackMutex and waits for a signal on the same condition variable. Both threads can therefore end up waiting on heartbeatCond with nobody left to signal it, which is a deadlock. If you really want two mutexes, why not use two condition variables as well?
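One possible shape for a heartbeat/ACK exchange with separate condition variables is sketched below. It is a simplification, not a drop-in replacement for the code in the question: it uses plain pthread calls instead of the osi wrappers, a single mutex shared by both condition variables (so each condition is only ever paired with one mutex), and one channel per monitored thread. hb_channel, hb_pending, ack_pending and the function names are all invented here. The boolean predicates are what prevent a signal sent before the other side starts waiting from being lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define ROUNDS 5   /* number of heartbeats exchanged in the demo */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  hb_cond;    /* signalled when a heartbeat is pending */
    pthread_cond_t  ack_cond;   /* signalled when an ACK is pending */
    bool hb_pending;
    bool ack_pending;
} hb_channel;

static hb_channel chan = {
    PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_COND_INITIALIZER,
    PTHREAD_COND_INITIALIZER,
    false,
    false
};

/* Monitor side: post a heartbeat, then wait until it is acknowledged. */
static void send_heartbeat_and_wait_ack(hb_channel *c)
{
    pthread_mutex_lock(&c->lock);
    c->hb_pending = true;
    pthread_cond_signal(&c->hb_cond);
    while (!c->ack_pending)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->ack_cond, &c->lock);
    c->ack_pending = false;
    pthread_mutex_unlock(&c->lock);
}

/* Monitored side: wait for a heartbeat, then acknowledge it. */
static void wait_heartbeat_and_ack(hb_channel *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->hb_pending)
        pthread_cond_wait(&c->hb_cond, &c->lock);
    c->hb_pending = false;
    c->ack_pending = true;
    pthread_cond_signal(&c->ack_cond);
    pthread_mutex_unlock(&c->lock);
}

static void *monitored_thread(void *arg)
{
    for (int i = 0; i < ROUNDS; i++)
        wait_heartbeat_and_ack(arg);
    return NULL;
}

/* Runs ROUNDS heartbeat/ACK exchanges; returns the number of ACKs seen. */
static int run_heartbeat_demo(void)
{
    pthread_t t;
    int acks = 0;

    if (pthread_create(&t, NULL, monitored_thread, &chan) != 0)
        return -1;
    for (int i = 0; i < ROUNDS; i++) {
        send_heartbeat_and_wait_ack(&chan);
        acks++;
    }
    pthread_join(t, NULL);
    return acks;
}
```

A real monitor would probably use pthread_cond_timedwait on the ACK side, so that a thread that never answers is reported as dead instead of blocking the monitor forever.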