pthread_create() fails with EAGAIN at 291 cycle - c

I had this code:
int main(int argc, char** argv)
{
pthread_t thread[thr_num];
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
// just for debugging //
struct rlimit rlim;
getrlimit(RLIMIT_NPROC, &rlim);
printf ("soft = %d \n", rlim.rlim_cur);
printf ("hard = %d \n", rlim.rlim_max);
////
for ( i = 1 ; i <= thr_num ; i++) {
if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
}
pthread_attr_destroy(&attr);
for ( i = 1 ; i <= thr_num ; i++) {
if( pthread_join(thread[i], (void**)&status ) ) {
exit(1);
}
}
return 0;
}
void* loggerThread(void* data)
{
char** sthg = ((char**)data);
pthread_exit(NULL);
}
I don't understand why, when I run this code with thr_num=291, I get this error:
pthread_create failure, i = 291, errno = 11 (EAGAIN)
With thr_num=290 it works fine. I run this code on Linux 2.6.27.54-0.2-default (SLES 11).
Both rlim.rlim_cur and rlim.rlim_max have the value 6906, which matches what 'ulimit -a' reports for 'max user processes'.
Guided by the pthread_create man page, I also checked /proc/sys/kernel/threads-max (it was 13813).
I did not find any parameter with the value 290 in the 'sysctl -a' output either.
By chance I found out from this link:
pthread_create and EAGAIN
that: "Even if pthread_exit or pthread_cancel is called, the parent process still need to call pthread_join to release the pthread ID, which will then become recyclable"
So, as an experiment, I modified my code to this:
for ( i = 1 ; i <= thr_num ; i++) {
if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
if( pthread_join(thread[i], (void**)&status ) ) {
printf("pthread_join failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
}
pthread_attr_destroy(&attr);
and then everything worked: I no longer got the error at iteration 291.
I would like to understand why I got the error with my original code:
1. because of incorrect thread programming,
2. or because I hit some system limit that I couldn't identify.
I would also like to know whether my correction is a good fix for this problem, or what hidden pitfalls I may have introduced with it.
Thanks !

I would like to understand why with my original code I got the error: 1. because of a wrong programing with threads 2. or I hit some system limit what I couldn't identify
You likely hit a system limit: most likely you ran out of address space. By default, each thread gets 8-10 MB of stack space on Linux. If you create 290 threads, that uses nearly 3 GB of address space, the maximum for a 32-bit process.
You get EAGAIN in such a case because there aren't enough resources to create the thread right now (not enough address space is available at that moment).
When a thread exits, not all of its resources are released (on Linux, the entire stack of the thread is kept around).
If the thread is detached, e.g. you called pthread_detach() on it or specified the detached state in the attributes passed to pthread_create(), all resources are released when the thread exits, but you can't pthread_join() a detached thread.
If the thread is not detached, you need to call pthread_join() on it to release the resources.
Note that the modified code of yours where you call pthread_join() inside the loop will:
1. spawn a thread
2. wait for that thread to finish
3. go to 1
i.e. only one other thread is running at a time - which seems a bit pointless.
You can certainly spawn more than one thread and have them run concurrently, but there's a limit. On your machine, you seem to have found that limit to be around 290.

I initially wrote this as a comment, but just in case...
Your code:
for ( i = 1 ; i <= thr_num ; i++) {
if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
}
...
for ( i = 1 ; i <= thr_num ; i++) {
if( pthread_join(thread[i], (void**)&status ) ) {
exit(1);
}
}
Both for() loops iterate from 1 to thr_num. This means you go out of bounds in your array thread[thr_num], since array indexes start at 0. You should instead iterate from 0 to one less than thr_num:
for ( i = 0 ; i < thr_num ; i++)
I'm actually surprised you didn't get a segmentation fault before hitting 291 as thr_num.

Related

Random Numbers when Passing Address of Integers into pthread_create

I found a similar post, "pthread_create and passing an integer as the last argument"; however, when implementing it I still receive the wrong ID values (as shown in the output log at the bottom of the post).
I have this portion of code, which creates the threads. It's pretty standard. I have an array of threads, each of which is assigned a thread ID that is passed into the thread function.
int nthreads = 7;
pthread_t tid[nthreads];
fprintf(stdout, "Creating threads.\n");
for (int i =0; i < nthreads; i++){
tid[i] = i;
if(pthread_create(&tid[i], NULL, threadRequestFile, &tid[i]) != 0){
fprintf(stdout, "Error pthread_create().\n");
steque_destroy(workerQueue);
free(workerQueue);
workerQueue = NULL;
exit(EXIT_FAILURE);
}
}
fprintf(stdout, "Created %d threads.\n", nthreads);
for(int i = 0; i < nthreads; i++){
fprintf(stdout, "Awaiting Thread %d.\n", i);
pthread_join(tid[i], NULL);
}
Finally, here is my thread function, which receives the address of tid[i] and prints the ID stored there to the console:
void *threadRequestFile(void *nRequests){
int totalRequests = * ((int *)nRequests);
fprintf(stdout, "Total Request: %d\n", totalRequests);
}
Unfortunately closely following the code in the post above and from other sources I found online, my console is still printing weird numbers rather than 0-6. Can anyone help me out as to why this is occurring?
Creating threads.
Total Request: -1210059008 // Should be 0
Total Request: -1218451712 // ..
Total Request: -1226844416 // ..
Total Request: -1235237120 // ..
Total Request: -1243629824 // ..
Total Request: -1252022528 // ..
Total Request: -1260415232 // 6
Created 7 threads.
Awaiting Thread 0.
Awaiting Thread 1.
Awaiting Thread 2.
Awaiting Thread 3.
Awaiting Thread 4.
Awaiting Thread 5.
Awaiting Thread 6.

program intermittently stuck with main reporting a different thread id as opposed to the thread itself

I am trying to figure out how multi-threading works, this is my code :
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <pthread.h>
static pthread_cond_t threadDied = PTHREAD_COND_INITIALIZER ; // cond var initialization
static pthread_mutex_t threadMutex = PTHREAD_MUTEX_INITIALIZER ; // mutex initialization
// this mutex will protect all of the below global vars
static int totThreads = 0 ; // total number of threads created
static int numLive = 0 ; // Total no. of threads still alive .. or terminated but not joined
static int numUnjoined = 0 ; // no. of threads that have not yet been joined
enum tstate { // enumeration of thread states
TS_ALIVE, // thread is alive
TS_TERMINATED, // thread terminated, not yet joined
TS_JOINED // thread terminated and joined
};
static struct { // info about each thread
pthread_t tid ; // thread ID
enum tstate state; // Thread state as per the above enum
int sleepTime ; // no. of seconds to live before terminating
} *thread ; // name of the struct .. well a pointer
static void *threadFunc (void *arg) { // default start function for each thread
int idx = *(int *)arg ; // since arg is of type void , we typecast it to * of type int and deref it
int s ; // for ret val
sleep(thread[idx].sleepTime) ; // pretending as though thread is doing some work :/
s = pthread_mutex_lock(&threadMutex);
if (s!=0) {
printf("whoops, couldn't acquire mutex\n") ;
fflush(stdout);
exit (-1) ;
}
numUnjoined ++ ;
thread[idx].state = TS_TERMINATED ;
s = pthread_mutex_unlock(&threadMutex) ;
if ( s!=0 ) {
printf("whoops, couldn't release mutex\n") ;
fflush(stdout);
exit (-2) ;
}
s = pthread_cond_signal(&threadDied) ; // signalling any listening thread to wake up !!
if (s != 0) {
printf("whoops, couldn't signal the main thread to reap\n");
fflush(stdout);
exit (-3) ;
}
printf("Thread %d has worked hard and is now terminating\n", idx);
fflush(stdout);
return NULL ;
}
int main(int argc, char *argv[]) {
int s, idx ;
if (argc < 2 || strcmp(argv[1], "--help") == 0) {
printf("Usage : %s nsecs...\n", argv[0]);
fflush(stdout);
exit(-4) ;
}
thread = calloc(argc -1, sizeof(*thread) );
if (thread == NULL) {
printf("whoops, couldn't allocate memory of size %lu\n", (argc -1) * sizeof(*thread) );
fflush(stdout);
exit(-5);
}
// Let's create all the threads now !!
for (idx =0 ; idx < argc -1 ; idx++ ) {
thread[idx].sleepTime = atoi(argv[idx + 1 ]) ; // thread sleeps for the duration entered in the cmd line
thread[idx].state = TS_ALIVE ;
s = pthread_create(&thread[idx].tid, NULL, threadFunc, &idx);
printf("Main created thread %d with tid : %lu \n", ( * (int *)&idx ), (unsigned long)thread[idx].tid);
fflush(stdout);
if (s != 0 ){
printf("whoops couldn't create thread %lu\n",(unsigned long) (&thread[idx].tid) );
fflush(stdout);
exit(-6) ;
}
//sleep(1); // << -- if I don't add this sleep, then it just deadlocks
}
totThreads = argc -1 ;
numLive = totThreads ;
// Join terminated threads
while (numLive > 0 ) {
s = pthread_mutex_lock(&threadMutex) ;
if (s!=0){
printf("whoops, couldn't lock mutex for joining\n") ;
fflush(stdout);
exit(-7) ;
}
while (numUnjoined == 0) {
s = pthread_cond_wait(&threadDied, &threadMutex) ;
if (s!=0) {
printf("whoops, couldn't wait for thread join\n") ;
fflush(stdout);
exit(-8) ;
}
}
for (idx = 0 ; idx < totThreads ; idx++ ) {
if (thread[idx].state == TS_TERMINATED) {
s = pthread_join(thread[idx].tid, NULL) ;
if (s!=0) {
printf("Failed thread join\n");
fflush(stdout);
exit(-9) ;
}
thread[idx].state = TS_JOINED ;
numLive-- ;
numUnjoined-- ;
printf("Reaped thread %d (numLive=%d)\n", idx, numLive);
fflush(stdout);
}
}
s = pthread_mutex_unlock(&threadMutex) ;
if (s!=0){
printf("whopps, couldn't unlock mutex after joining\n");
fflush(stdout);
exit(-10) ;
}
}
exit(EXIT_SUCCESS);
}
For a thread count of 1, this code works sometimes, at other times it just hangs :(
WORKING :
#./thread_multijoin 1
Main created thread 0 with tid : 139835063281408
Thread 0 has worked hard and is now terminating
Reaped thread 0 (numLive=0)
HANG :
#./thread_multijoin 1
Main created thread 0 with tid : 140301613573888
Thread 1 has worked hard and is now terminating
^C
NOTICE here that Main says "Thread 0 was created" ; whereas the thread itself says "Thread 1" ... why is there a mismatch ??
It definitely gets stuck when I have multiple threads :
#./thread_multijoin 1 2 2 1
Main created thread 0 with tid : 140259455936256
Main created thread 1 with tid : 140259447543552
Main created thread 2 with tid : 140259439150848
Main created thread 3 with tid : 140259430758144
Thread 4 has worked hard and is now terminating
Thread 0 has worked hard and is now terminating
Reaped thread 0 (numLive=3)
Reaped thread 3 (numLive=2)
Thread 3 has worked hard and is now terminating
Reaped thread 2 (numLive=1)
Thread 2 has worked hard and is now terminating
^C
the only thing I am understanding from this is that the thread ID's reported by main and the thread itself are different, so I am guessing due to parallel scheduling there is something going on with the thread counter ... can you guys help me narrow this down please?
Thanks in advance.
========================================
Thanks @mevets and @user3386109 for the answer :)
I tried doing what @mevets suggested, i.e.
pthread_create(&thread[idx].tid, NULL, threadFunc, (void *)idx);
and
int idx = (int)arg ;
but got this error when compiling :
thread_multijoin.c: In function ‘threadFunc’:
thread_multijoin.c:32:15: error: cast from pointer to integer of different
size [-Werror=pointer-to-int-cast]
int idx = (int)arg ; // since arg is of type void , we typecast it to * of type int and deref it
thread_multijoin.c: In function ‘main’:
thread_multijoin.c:90:64: error: cast to pointer from integer of different
size [-Werror=int-to-pointer-cast]
s = pthread_create(&thread[idx].tid, NULL, threadFunc, (void *)idx );
Upon researching further, found this thread :
cast to pointer from integer of different size, pthread code
which suggested the use of intptr_t :
s = pthread_create(&thread[idx].tid, NULL, threadFunc, (void *)(intptr_t)idx );
and
int idx = (intptr_t)arg
That worked perfectly fine without errors . Thanks once again for your time, really appreciate it :)
PS: in my build, intptr_t only became visible after defining _GNU_SOURCE (the portable way is to include <stdint.h>, which declares it):
#define _GNU_SOURCE
[ the thread id ]:
You pass the address of idx into each thread, then dereference it to index the table. So each thread gets the same pointer argument.
You probably wanted to:
s = pthread_create(&thread[idx].tid, NULL, threadFunc, (void *)idx);
and
int idx = (int)arg ; // arg carries the index value itself, so there is nothing to dereference
ie; not deref it, just pass it in a “void *” container.

Thread doesn't recognize change in a flag

I work with a couple of threads, all running as long as an exit_flag is set to false.
I have a specific thread that doesn't recognize the change in the flag, and therefore doesn't end and free its resources, and I'm trying to understand why.
UPDATE: After debugging a bit with gdb, I can see that given 'enough time' the problematic thread does detect the flag change.
My conclusion is that, in a normal run, not enough time passes for the thread to detect the change.
How can I 'delay' my main thread long enough for all threads to detect the flag change, without having to join them? (The intention behind exit_flag was NOT to join the threads, as I don't want to manage all the thread IDs for that; I'm just detaching each of them, except the thread that handles input.)
I've tried using sleep(5) in the close_server() method, after changing the flag, with no luck.
Notes:
Other threads that loop on the same flag do terminate successfully.
exit_flag declaration is: static volatile bool exit_flag
All threads read the flag; its value is changed only in my close_server() method (which does only that).
A data race that may occur when a thread reads the flag just before it's changed doesn't matter to me, as long as the next iteration of the while loop reads the correct value.
No error occurs in the thread itself, according to stderr and stdout, which are 'clean' of error messages (for the errors I handle in the thread).
The situation also occurs even when commenting out the entire while((!exit_flag) && (remain_data > 0)) code block, so this is not a sendfile hanging issue.
station_info_t struct:
typedef struct station_info {
int socket_fd;
int station_num;
} station_info_t;
Problematic thread code:
void * station_handler(void * arg_p)
{
status_type_t rs = SUCCESS;
station_info_t * info = (station_info_t *)arg_p;
int remain_data = 0;
int sent_bytes = 0;
int song_fd = 0;
off_t offset = 0;
FILE * fp = NULL;
struct stat file_stat;
/* validate station number for this handler */
if(info->station_num < 0) {
fprintf(stderr, "station_handler() station_num = %d, something's very wrong! exiting\n", info->station_num);
exit(EXIT_FAILURE);
}
/* Open the file to send, and get his stats */
fp = fopen(srv_params.songs_names[info->station_num], "r");
if(NULL == fp) {
close(info->socket_fd);
free(info);
error_and_exit("fopen() failed! errno = ", errno);
}
song_fd = fileno(fp);
if( fstat(song_fd, &file_stat) ) {
close(info->socket_fd);
fclose(fp);
free(info);
error_and_exit("fstat() failed! errno = ", errno);
}
/** Run as long as no exit procedure was initiated */
while( !exit_flag ) {
offset = 0;
remain_data = file_stat.st_size;
while( (!exit_flag) && (remain_data > 0) ) {
sent_bytes = sendfile(info->socket_fd, song_fd, &offset, SEND_BUF);
if(sent_bytes < 0 ) {
error_and_exit("sendfile() failed! errno = ", errno);
}
remain_data = remain_data - sent_bytes;
usleep(USLEEP_TIME);
}
}
printf("Station %d handle exited\n", info->station_num);
/* Free \ close all resources */
close(info->socket_fd);
fclose(fp);
free(info);
return NULL;
}
I'll be glad to get some help.
Thanks guys
Well, as stated by user362924, the main issue is that I don't join the threads in my main thread, and therefore don't allow them enough time to exit.
A workaround, if for some reason one doesn't want to join all threads and dynamically manage thread IDs, is to call sleep for a couple of seconds at the end of the main thread.
Of course, this workaround is not good practice and is not recommended (to anyone who gets here via Google).

pthread_join seems to modify my loop index

My code (see below) produces an odd behaviour. The output is:
Testing whether there are problems with concurrency ...rc is 0. i is 0
.rc is 0. i is 0
.rc is 3. i is 1
.rc is 0. i is 0
.rc is 3. i is 1
.rc is 3. i is 2
.rc is 0. i is 0
.rc is 3. i is 1
.rc is 3. i is 2
.rc is 3. i is 3
.rc is 0. i is 0
.rc is 3. i is 1
.rc is 3. i is 2
.rc is 3. i is 3
.rc is 3. i is 4
.rc is 0. i is 0
Segmentation fault (core dumped)
I tried to debug it, but only found out that i is reset to 0 right after pthread_join. This leads me to the conclusion that the modification must happen somewhere there, but I can't find it. I feel kind of stupid, since this isn't really a hard piece of code. What did I not notice?
Operating system is Ubuntu 14.04. N_THREADS is currently set to 10, N_RUNS is 10000.
Main thread:
pthread_t threads[N_THREADS];
pthread_attr_t attr;
int i;
int rc;
int status;
printf("Testing whether there are problems with concurrency ...");
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
for (i = 0; i < N_THREADS; i++){
if (i) {
rc = pthread_create(&(threads[i]), &attr, addRemove, 0);
} else {
rc = pthread_create(&(threads[i]), &attr, readStuff, 0);
}
if (rc) return rc;
}
for(i = 0; i < N_THREADS; i++) {
rc = pthread_join(threads[i], (void*) &status);
// if(rc == 3)
printf("rc is %d. i is %d\n", rc, i);
// if (rc) return rc;
if (status) return status;
printf(".");
}
pthread_attr_destroy(&attr);
return 0;
Worker threads:
void* readStuff(void* a)
{
int i;
for (i = 0; i< N_RUNS; i++){
;
}
pthread_exit((void*)0);
}
void* addRemove(void* a)
{
int i;
for (i = 0; i< N_RUNS; i++){
;
}
pthread_exit((void*)0);
}
There are no other threads except the main thread and the ones created in the code above.
Compileable example
I think your problem is with the pthread_join. From the man page:
int pthread_join(pthread_t thread, void **retval);
...
If retval is not NULL, then pthread_join() copies the exit status of the target
thread (i.e., the value that the target thread supplied to pthread_exit(3))
into the location pointed to by *retval. If the target thread was canceled,
then PTHREAD_CANCELED is placed in *retval.
Note that it takes a void **, which means it overwrites the thing pointed to by retval with a void * (size 8 on 64 bit). You are passing an int * (i.e. &status), which is a pointer to an object of size 4 on most platforms.
So, pthread_join will be overwriting memory. Instead, declare status as a void * as per the function prototype.
You are also testing status; I don't know what you are trying to achieve here.
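In general, compiling with -Wall will show you this kind of error, but only if the argument is not hidden behind a cast: the (void*) cast in your call silences the pointer-type mismatch.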

pthread_cancel always crashes

I have a program that creates and cancels threads through a hand-written thread pool.
The creation is as follows:
while (created<threadsNum){
pthread_t newThread;
pthread_struct *st; //Open the thread that handle the deleting of the sessions timeout.
st = (pthread_struct*)malloc(sizeof(pthread_struct));
st->id = created;
st->t = &newThread;
pthread_mutex_lock( &mutex_threadsPool );
readingThreadsPool[created] = st;
pthread_mutex_unlock( &mutex_threadsPool );
if((threadRes1 = pthread_create( &newThread, NULL, pcapReadingThread, (void*)created)))
{
syslog(LOG_CRIT, "Creating Pcap-Reading Thread %d failed.",created);
printf( "Creating Pcap-Reading Thread %d failed.\n",created);
exit(1);
}
syslog(LOG_INFO, "Created Pcap-Reading Thread %d Successfully.",created);
created++;
}
Later I try to cancel them and restart them :
pthread_t* t;
pthread_struct* tstr;
int i;
pthread_mutex_unlock( &mutex_threadsPool );
//first go on array and kill all threads
for(i = 0; i<threadsNum ; i++ ){
tstr = readingThreadsPool[i];
if (tstr!=NULL){
t = tstr->t;
//Reaches here :-)
if (pthread_cancel(*t)!=0){
perror("ERROR : Could not kill thread");
}
else{
printf("Killed Thread %d \n",i);
}
//doesnt reach here
}
}
I compared the address of the thread created in the first part with the address of the thread about to be cancelled in the second part, and they match.
I read that the thread manager can't work if one calls killall(), but I don't call it.
Does anyone have any idea?
Thanks
while (created<threadsNum){
pthread_t newThread;
pthread_struct *st;
/* ... */
st->t = &newThread;
/* ... */
}
You've got st->t pointing to a local variable newThread. newThread is only in scope during the current loop iteration. After this iteration st->t will contain an invalid address.
newThread is on the stack, so after it goes out of scope that stack space will be used for other variables. That could be different pthread_ts on successive iterations, or once the loop is over then that stack space will be used for completely different types of values.
To fix this I'd probably change pthread_struct.t to be a pthread_t instead of a pthread_t *, and then change the pthread_create call to:
pthread_create(&st->t, /*...*/)
Also, you should be careful about adding st to the thread pool before you've called pthread_create. It should probably be added afterwards. As it stands, there's a small window in which st is already in the thread pool but st->t has not been initialized.

Resources