I am writing a C library which needs to fork() during initialization. Therefore, I want to assert() that the application code (which is outside of my control) calls my library initialization code from a single threaded context (to avoid the well known "threads and fork don't mix" problem). Once my library has been initialized, it is thread safe (and expected that the application level code may create threads). I am only concerned with supporting pthreads.
It seems impossible to count the number of threads in the current process space using pthreads. Indeed, even googletest only implements GetThreadCount() on Mac OS and QNX.
Given that I can't count the threads, is it somehow possible that I can instead assert a single threaded context?
Clarification: If possible, I would like to avoid using "/proc" (non-portable), an additional library dependency (like libproc) and LD_PRELOAD-style pthread_create wrappers.
Clarification #2: In my case using multiple processes is necessary as the workers in my library are relatively heavy weight (using webkit) and might crash. However, I want the original process to survive worker crashes.
You could mark your library initialization function to be run prior to the application main(). For example, using GCC,
static void my_lib_init(void) __attribute__((constructor));
static void my_lib_init(void)
{
/* ... */
}
Another option is to use posix_spawn() to fork and execute the worker processes as separate, slave binaries.
EDITED TO ADD:
It seems to me that if you wish to determine if the process has already created (actual, kernel-based) threads, you will have to rely on OS-specific code.
In the Linux case, the determination is simple, and safe to run on other OSes too. If it cannot determine the number of threads used by the current process, the function will return -1:
#include <unistd.h>
#include <sys/types.h>
#include <dirent.h>
#include <errno.h>
int count_threads_linux(void)
{
DIR *dir;
struct dirent *ent;
int count = 0;
dir = opendir("/proc/self/task/");
if (!dir)
return -1;
while (1) {
errno = 0;
ent = readdir(dir);
if (!ent)
break;
if (ent->d_name[0] != '.')
count++;
}
if (errno) {
const int saved_errno = errno;
closedir(dir);
errno = saved_errno;
return -1;
}
if (closedir(dir))
return -1;
return count;
}
There are certain cases (like chroot without /proc/) when that check will fail even in Linux, so the -1 return value should always be treated as unknown rather than error (although errno will indicate the actual reason for the failure).
Looking at the FreeBSD man pages, I wonder if the corresponding information is available at all.
Finally:
Rather than try detecting the problematic case, I seriously recommend you fork() and exec() (or posix_spawn()) the slave processes, using only async-signal-safe functions (see man 7 signal) in the child process (prior to exec()), thus avoiding the fork()-thread complications. You can still create any shared memory segments, socket pairs, et cetera before forking(). The only drawback I can see is that you have to use separate binaries for the slave workers. Which, given your description of them, does not sound like a drawback to me.
If you send a SIGINFO signal to the process' controlling tty, the process should describe the status of threads. From the description it should be possible to deduce whether any threads have been created.
You may have to develop a small utility that is invoked via popen to read the output back into your library.
Added sample code Fri Dec 21 14:45
Run a simple program that creates five threads. Threads basically sleep. Before the program exits, send a SIGINFO signal to get the status of the threads.
openbsd> cat a.c
#include <unistd.h>
#include <pthread.h>
#define THREADS 5
void foo(void);
int
main()
{
pthread_t thr[THREADS];
int j;
for (j = 0; j < THREADS; j++) {
pthread_create(&thr[j], NULL, (void *)foo, NULL);
}
sleep(200);
return(0);
}
void
foo()
{
sleep(100);
}
openbsd> gcc a.c -pthread
openbsd> a.out &
[1] 1234
openbsd> kill -SIGINFO 1234
0x8bb0e000 sleep_wait 15 -c---W---f 0000
0x8bb0e800 sleep_wait 15 -c---W---f 0000
0x8bb0e400 sleep_wait 15 -c---W---f 0000
0x7cd3d800 sleep_wait 15 -c---W---f 0000
0x7cd3d400 sleep_wait 15 -c---W---f 0000
0x7cd3d000 sleep_wait 15 -c---W---f 0000 main
You could use use pthread_once() to ensure no other thread is doing the same thing: this way you don't have to care that multiple threads are calling your initialisation function, only one will be really executed.
Make your public initialisation function run a private initialisation through pthread_once().
static pthread_once_t my_initialisation_once = PTHREAD_ONCE_INIT;
static void my_initialisation(void)
{
/* do something */
}
int lib_initialisation(void)
{
pthread_once(&my_initialisation_conce, my_initialisation);
return 0;
}
Another example can be found here.
Links
How to protect init function of a C based library?
POSIX threads - do something only once
Related
I was trying to implement a multi-threaded program using binary semaphore. Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <semaphore.h>
int g = 0;
sem_t *semaphore;
void *myThreadFun(void *vargp)
{
int myid = (int)vargp;
static int s = 0;
sem_wait(semaphore);
++s; ++g;
printf("Thread ID: %d, Static: %d, Global: %d\n", myid, s, g);
fflush(stdout);
sem_post(semaphore);
pthread_exit(0);
}
int main()
{
int i;
pthread_t tid;
if ((semaphore = sem_open("/semaphore", O_CREAT, 0644, 3))==SEM_FAILED) {
printf("semaphore initialization failed\n");
}
for (i = 0; i < 3; i++) {
pthread_create(&tid, NULL, myThreadFun, (void *)i);
}
pthread_exit(NULL);
return 0;
}
Now, when I opened the sempahore, I made the count 3. I was expecting that this wouldnt work, and I would get race condition, because each thread is now capable of decrementing the count.
Is there something wrong with the implementation? Also, if I make the count 0 during sem_open, wouldnt that initiate a deadlock condition, because all the threads should be blocked on sem_wait.
Now, when I opened the sempahore, I made the count 3. I was expecting that this wouldnt work, and I would get race condition, because each thread is now capable of decrementing the count.
And how do you judge that there isn't any race? Observing output consistent with what you could rely upon in the absence of a data race in no way proves that there isn't a data race. It merely fails provide any evidence of one.
However, you seem to be suggesting that there would be a data race inherent in more than one thread concurrently performing a sem_wait() on a semaphore whose value is initially greater than 1 (otherwise which counter are you talking about?). But that's utter nonsense. You're talking about a semaphore. It's a synchronization object. Such objects and the functions that manipulate them are the basis for thread synchronization. They themselves are either completely thread safe or terminally buggy.
Now, you are correct that you open the semaphore with an initial count sufficient to avoid any of your threads blocking in sem_wait(), and that therefore they can all run concurrently in the whole body of myThreadFun(). You have not established, however, that they in fact do run concurrently. There are several reasons why they might not do. If they do run concurrently, then the incrementing of shared variables s and g is indeed of concern, but again, even if you see no signs of a data race, that doesn't mean there isn't one.
Everything else aside, the fact that your threads all call sem_wait(), sem_post(), and printf() induces some synchronization in the form of memory barriers, which would reduce the likelihood of observing anomalous effects on s and g. sem_wait() and sem_post() must contain memory barriers in order to function correctly, regardless of the semaphore's current count. printf() calls are required to use locking to protect the state of the stream from corruption in multi-threaded programs, and it is reasonable to suppose that this will require a memory barrier.
Is there something wrong with the implementation?
Yes. It is not properly synchronized. Initialize the semaphore with count 1 so that the modifications of s and g occur only while exactly one thread has the semaphore locked.
Also, if I make the count 0 during sem_open, wouldnt that initiate a deadlock condition, because all the threads should be blocked on sem_wait.
If the semaphore has count 0 before any of the additional threads are started, then yes. It is therefore inappropriate to open the semaphore with count 0, unless you also subsequently post to it before starting the threads. But you are using a named semaphore. These persist until removed, and you never remove it. The count you specify to sem_open() has no effect unless a new semaphore needs to be created; when an existing semaphore is opened, its count is unchanged.
Also, do have the main thread join all the others before it terminates. It's not inherently wrong not to do so, but in most cases it's required for the semantics you want.
I'll get to the code in a bit that proves you had a race condition. I'll add a couple of different ways to trigger it so you can see how this works. I'm doing this on Linux and passing -std=gnu99 as a param to gcc ie
gcc -Wall -pedantic -lpthread -std=gnu99 semaphore.c -o semtex
Note. In your original example (assuming Linux) one fatal mistake you had was not removing the semaphore. If you run the following command you might see some of these sitting around on your machine
ls -la /dev/shm/sem.*
You need to make sure that there are no old semaphore sitting on the filesystem before running your program or you end up picking up the last settings from the old semaphore. You need to use sem_unlink to clean up.
To run it please use the following.
rm /dev/shm/sem.semtex;./semtex
I'm deliberately making sure that the semaphore is not there before running because if you have a DEADLOCK it can be left around and that causes all sorts of problems when testing.
Now, when I opened the sempahore, I made the count 3. I was expecting
that this wouldn't work, and I would get race condition, because each
thread is now capable of decrementing the count.
You had a race condition but sometimes C is so damned fast things can appear to work because your program can get everything it needs done in the time the OS has allocated to the thread ie the OS didn't preempt it at an important point.
This is one of those cases, the race condition is there you just need to squint a little bit to see it. In the following code you can tweak some parameters to see the a deadlock, correct usage and Undefined Behavior.
#define INITIAL_SEMAPHORE_VALUE CORRECT
The value INITIAL_SEMAPHORE_VALUE can take three values...
#define DEADLOCK 0
#define CORRECT 1
#define INCORRECT 2
I hope they're self explanatory. You can also use two methods to cause the race condition to blow up the program.
#define METHOD sleep
Set the METHOD to spin and you can play with the SPIN_COUNT and find out how many times the loop can run before you actually see a problem, this is C, it can get a lot done before it gets preempted. The code has most of the rest of the information you need in it.
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>
#define DEADLOCK 0 // DEADLOCK
#define CORRECT 1 // CORRECT
#define INCORRECT 2 // INCORRECT
/*
* Change the following values to observe what happen.
*/
#define INITIAL_SEMAPHORE_VALUE CORRECT
//The next value provides to two different ways to trigger the problem, one
//using a tight loop and the other using a system call.
#define METHOD sleep
#if (METHOD == spin)
/* You need to increase the SPIN_COUNT to a value that's big enough that the
* kernel preempts the thread to see it fail. The value set here worked for me
* in a VM but might not work for you, tweak it. */
#define SPIN_COUNT 1000000
#else
/* The reason we can use such a small time for USLEEP is because we're making
* the kernel preempt the thread by using a system call.*/
#define USLEEP_TIME 1
#endif
#define TOT_THREADS 10
static int g = 0;
static int ret = 1729;
sem_t *semaphore;
void *myThreadFun(void *vargp) {
int myid = (int)vargp;
int w = 0;
static int s = 0;
if((w = sem_wait(semaphore)) != 0) {
fprintf(stderr, "Error: %s\n", strerror(errno));
abort();
};
/* This is the interesting part... Between updating `s` and `g` we add
* a delay using one of two methods. */
s++;
#if ( METHOD == spin )
int spin = 0;
while(spin < SPIN_COUNT) {
spin++;
}
#else
usleep(USLEEP_TIME);
#endif
g++;
if(s != g) {
fprintf(stderr, "Fatal Error: s != g in thread: %d, s: %d, g: %d\n", myid, s, g);
abort();
}
printf("Thread ID: %d, Static: %d, Global: %d\n", myid, s, g);
// It's a false sense of security if you think the assert will fail on a race
// condition when you get the params to sem_open wrong It might not be
// detected.
assert(s == g);
if((w = sem_post(semaphore)) != 0) {
fprintf(stderr, "Error: %s\n", strerror(errno));
abort();
};
return &ret;
}
int main(void){
int i;
void *status;
const char *semaphore_name = "semtex";
pthread_t tids[TOT_THREADS];
if((semaphore = sem_open(semaphore_name, O_CREAT, 0644, INITIAL_SEMAPHORE_VALUE)) == SEM_FAILED) {
fprintf(stderr, "Fatal Error: %s\n", strerror(errno));
abort();
}
for (i = 0; i < TOT_THREADS; i++) {
pthread_create(&tids[i], NULL, myThreadFun, (void *) (intptr_t) i);
}
for (i = 0; i < TOT_THREADS; i++) {
pthread_join(tids[i], &status);
assert(*(int*)status == 1729);
}
/*The following line was missing from your original code*/
sem_unlink(semaphore_name);
pthread_exit(0);
}
Calling tzset() after forking appears to be very slow. I only see the slowness if I first call tzset() in the parent process before forking. My TZ environment variable is not set. I dtruss'd my test program and it revealed the child process reads /etc/localtime for every tzset() invocation, while the parent process only reads it once. This file access seems to be the source of the slowness, but I wasn't able to determine why it's accessing it every time in the child process.
Here is my test program foo.c:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>
void check(char *msg);
int main(int argc, char **argv) {
check("before");
pid_t c = fork();
if (c == 0) {
check("fork");
exit(0);
}
wait(NULL);
check("after");
}
void check(char *msg) {
struct timeval tv;
gettimeofday(&tv, NULL);
time_t start = tv.tv_sec;
suseconds_t mstart = tv.tv_usec;
for (int i = 0; i < 10000; i++) {
tzset();
}
gettimeofday(&tv, NULL);
double delta = (double)(tv.tv_sec - start);
delta += (double)(tv.tv_usec - mstart)/1000000.0;
printf("%s took: %fs\n", msg, delta);
}
I compiled and executed foo.c like this:
[muir#muir-work-mb scratch]$ clang -o foo foo.c
[muir#muir-work-mb scratch]$ env -i ./foo
before took: 0.002135s
fork took: 1.122254s
after took: 0.001120s
I'm running Mac OS X 10.10.1 (also reproduced on 10.9.5).
I originally noticed the slowness via ruby (Time#localtime slow in child process).
Ken Thomases's response may be correct, but I was curious about a more specific answer because I still find the slowness unexpected behavior for a single-threaded program performing such a simple/common operation after forking. After examining http://opensource.apple.com/source/Libc/Libc-997.1.1/stdtime/FreeBSD/localtime.c (not 100% sure this is the correct source), I think I have an answer.
The code uses passive notifications to determine if the time zone has changed (as opposed to stating /etc/localtime every time). It appears that the registered notification token becomes invalid in the child process after forking. Furthermore, the code treats the error from using an invalid token as a positive notification that the timezone has changed, and proceeds to read /etc/localtime every time. I guess this is the kind of undefined behavior you can get after forking? It would be nice if the library noticed the error and re-registered for the notification, though.
Here is the snippet of code from localtime.c that mixes the error value with the status value:
nstat = notify_check(p->token, &ncheck);
if (nstat || ncheck) {
I demonstrated that the registration token becomes invalid after fork using this program:
#include <notify.h>
#include <stdio.h>
#include <stdlib.h>
void bail(char *msg) {
printf("Error: %s\n", msg);
exit(1);
}
int main(int argc, char **argv) {
int token, something_changed, ret;
notify_register_check("com.apple.system.timezone", &token);
ret = notify_check(token, &something_changed);
if (ret)
bail("notify_check #1 failed");
if (!something_changed)
bail("expected change on first call");
ret = notify_check(token, &something_changed);
if (ret)
bail("notify_check #2 failed");
if (something_changed)
bail("expected no change");
pid_t c = fork();
if (c == 0) {
ret = notify_check(token, &something_changed);
if (ret) {
if (ret == NOTIFY_STATUS_INVALID_TOKEN)
printf("ret is invalid token\n");
if (!notify_is_valid_token(token))
printf("token is not valid\n");
bail("notify_check in fork failed");
}
if (something_changed)
bail("expected not changed");
exit(0);
}
wait(NULL);
}
And ran it like this:
muir-mb:projects muir$ clang -o notify_test notify_test.c
muir-mb:projects muir$ ./notify_test
ret is invalid token
token is not valid
Error: notify_check in fork failed
You're lucky you didn't experience nasal demons!
POSIX states that only async-signal-safe functions are legal to call in the child process after the fork() and before a call to an exec*() function. From the standard (emphasis added):
… the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
…
There are two reasons why POSIX programmers call fork(). One reason is
to create a new thread of control within the same program (which was
originally only possible in POSIX by creating a new process); the
other is to create a new process running a different program. In the
latter case, the call to fork() is soon followed by a call to one of
the exec functions.
The general problem with making fork() work in a multi-threaded world
is what to do with all of the threads. There are two alternatives. One
is to copy all of the threads into the new process. This causes the
programmer or implementation to deal with threads that are suspended
on system calls or that might be about to execute system calls that
should not be executed in the new process. The other alternative is to
copy only the thread that calls fork(). This creates the difficulty
that the state of process-local resources is usually held in process
memory. If a thread that is not calling fork() holds a resource, that
resource is never released in the child process because the thread
whose job it is to release the resource does not exist in the child
process.
When a programmer is writing a multi-threaded program, the first
described use of fork(), creating new threads in the same program, is
provided by the pthread_create() function. The fork() function is thus
used only to run new programs, and the effects of calling functions
that require certain resources between the call to fork() and the call
to an exec function are undefined.
There are lists of async-signal-safe functions here and here. For any other function, if it's not specifically documented that the implementations on the platforms to which you're deploying add a non-standard safety guarantee, then you must consider it unsafe and its behavior on the child side of a fork() to be undefined.
I'm running the following program. It simply creates threads that die straight away.
What I have found is that after 93 to 98 (it varies slightly) successful calls, every next call to pthread_create() fails with error 11: Resource temporarily unavailable. I think I'm closing the thread correctly so it should give up on any resources it has but some resources become unavailable.
The first parameter of the program allows me to set the interval between calls to pthread_create() but testing with different values, I've learned that the interval does not matter (well, I'll get the error earlier): the number of successful calls will be the same.
The second parameter of the program allows me to set a sleep interval after a failed call but the length of the interval does not seem to make any difference.
Which ceiling am I hitting here?
EDIT: found the error in doSomething(): change lock to unlock and the program runs fine. The question remains: what resource is depleted with the error uncorrected?
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <pthread.h>
#include <errno.h>
pthread_mutex_t doSomethingLock;
void milliSleep(unsigned int milliSeconds)
{
struct timespec ts;
ts.tv_sec = floorf(((float)milliSeconds / 1000));
ts.tv_nsec = ((((float)milliSeconds / 1000) - ts.tv_sec)) * 1000000000;
nanosleep(&ts, NULL);
}
void *doSomething(void *args)
{
pthread_detach(pthread_self());
pthread_mutex_lock(&doSomethingLock);
pthread_exit(NULL);
}
int main(int argc, char **argv)
{
pthread_t doSomethingThread;
pthread_mutexattr_t attr;
int threadsCreated = 0;
if (argc != 3)
{
fprintf(stderr, "usage: demo <interval between pthread_create() in ms> <time to wait after fail in ms>\n");
exit(1);
}
pthread_mutexattr_init(&attr);
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE_NP);
pthread_mutex_init(&doSomethingLock, &attr);
while (1)
{
pthread_mutex_lock(&doSomethingLock);
if (pthread_create(&doSomethingThread, NULL, doSomething, NULL) != 0)
{
fprintf(stderr, "%d pthread_create(): error %d, %m\n", threadsCreated, errno);
milliSleep(atoi(argv[2]));
}
else threadsCreated++;
milliSleep(atoi(argv[1]));
}
}
If you are on a 32 bit distro, you are probably hitting address space limits. The last I checked, glibc will allocate about 13MB for stack space in every thread created (this is just the size of the mapping, not allocated memory). With 98 threads, you will be pushing past a gigabyte of address space of the 3G available.
You can test this by freezing your process after the error (e.g. sleep(1000000) or whatever) and looking at its address space with pmap.
If that is the problem, then try setting a smaller stack size with pthread_attr_setstack() on the pthread_attr_t you pass to pthread_create. You will have to be the judge of your stack requirements obviously, but often even complicated code can run successfully in only a few kilobytes of stack.
Your program does not "create threads that simply die away". It does not do what you think it does.
First, pthread_mutex_unlock() only unlocks a pthread_mutex_t that has been locked by the same thread. This is how mutexes work: they can only be unlocked by the same thread that locked them. If you want the behaviour of a semaphore semaphore, use a semaphore.
Your example code creates a recursive mutex, which the doSomething() function tries to lock. Because it is held by the original thread, it blocks (waits for the mutex to become free in the pthread_mutex_lock() call). Because the original thread never releases the lock, you just pile up new threads on top of the doSomethingLock mutex.
Recursivity with respect to mutexes just means a thread can lock it more than once; it must unlock it the same number of times for the mutex to be actually released.
If you change the pthread_mutex_lock() in doSomething() to pthread_mutex_unlock(), then you're trying to unlock a mutex not held by that thread. The call fails, and then the threads do die immediately.
Assuming you fix your program, you'll next find that you cannot create more than a hundred or so threads (depending on your system and available RAM).
The reason is well explained by Andy Ross: the fixed size stacks (getrlimit(RLIMIT_STACK, (struct rlimit *)&info) tells you how much, unless you set it via thread attributes) eat up your available address space.
The original stack given to the process is resized automatically, but for all other threads, the stack size is fixed. By default, it is very large; on my system, 8388608 bytes (8 megabytes).
I personally create threads with very small stacks, usually 65536 bytes, which is more than enough unless your functions use local arrays or large structures, or do insanely deep recursion:
#ifndef THREAD_STACK_SIZE
#define THREAD_STACK_SIZE 65536
#endif
pthread_attr_t attrs;
pthread_t thread[N];
int i, result;
/* Create a thread attribute for the desired stack size. */
pthread_attr_init(&attrs);
pthread_attr_setstacksize(&attrs, THREAD_STACK_SIZE);
/* Create any number of threads.
* The attributes are only a guide to pthread_create(),
* they are not "consumed" by the call. */
for (i = 0; i < N; i++) {
result = pthread_create(&thread[i], &attrs, some_func, (void *)i);
if (result) {
/* strerror(result) describes the error */
break;
}
}
/* You should destroy the attributes when you know
* you won't be creating any further threads anymore. */
pthread_attr_destroy(&attrs);
The minimum stack size should be available as PTHREAD_STACK_MIN, and should be a multiple of sysconf(_SC_PAGESIZE). Currently PTHREAD_STACK_MIN == 16384, but I recommend using a larger power of two. (Page sizes are always powers of two on any binary architecture.)
It is only the minimum, and the pthread library is free to use any larger value it sees fit, but in practice the stack size seems to be what you set it to, plus a fixed value depending on the architecture, kernel, and the pthread library version. Using a compile-time constant works well for almost all cases, but if your application is complex enough to have a configuration file, it might be a good idea to let the user override the compile-time constant if they want to, in the config file.
I've found that on Linux, by making my own call to the rt_sigqueue syscall, I can put whatever I like in the si_uid and si_pid fields and the call succeeds and happily delivers the incorrect values. Naturally the uid restrictions on sending signals provide some protection against this kind of spoofing, but I'm worried it may be dangerous to rely on this information. Is there any good documentation on the topic I could read? Why does Linux allow the obviously-incorrect behavior of letting the caller specify the siginfo parameters rather than generating them in kernelspace? It seems nonsensical, especially since extra sys
calls (and thus performance cost) may be required in order to get the uid/gid in userspace.
Edit: Based on my reading of POSIX (emphasis added by me):
If si_code is SI_USER or SI_QUEUE, [XSI] or any value less than or equal to 0, then the signal was generated by a process and si_pid and si_uid shall be set to the process ID and the real user ID of the sender, respectively.
I believe this behavior by Linux is non-conformant and a serious bug.
That section of the POSIX page you quote also lists what si-code means, and here's the meaning:
SI_QUEUE
The signal was sent by the sigqueue() function.
That section goes on to say:
If the signal was not generated by one
of the functions or events listed
above, si_code shall be set either
to one of the signal-specific values
described in XBD , or to an
implementation-defined value that is
not equal to any of the values defined
above.
Nothing is violated if only the sigqueue() function uses SI_QUEUE. Your scenario involves code other than the sigqueue() function using SI_QUEUE The question is whether POSIX envisions an operating system enforcing that only a specified library function (as opposed to some function which is not a POSIX-defined library function) be permitted to make a system call with certain characteristics. I believe the answer is "no".
EDIT as of 2011-03-26, 14:00 PST:
This edit is in response to R..'s comment from eight hours ago, since the page wouldn't let me leave an adequately voluminous comment:
I think you're basically right. But either a system is POSIX compliant or it is not. If a non-library function does a syscall which results in a non-compliant combination of uid, pid, and 'si_code', then the second statement I quoted makes it clear that the call itself is not compliant. One can interpret this in two ways. One ways is: "If a user breaks this rule, then he makes the system non-compliant." But you're right, I think that's silly. What good is a system when any nonprivileged user can make it noncompliant? The fix, as I see it, is somehow to have the system know that it's not the library 'sigqueue()' making the system call, then the kernel itself should set 'si_code' to something other than 'SI_QUEUE', and leave the uid and pid as you set them. In my opinion, you should raise this with the kernel folks. They may have difficulty, however; I don't know of any secure way for them to detect whether a syscall is made by a particular library function, seeing as how the library functions. almost by definition, are merely convenience wrappers around the syscalls. And that may be the position they take, which I know will be a disappointment.
(voluminous) EDIT as of 2011-03-26, 18:00 PST:
Again because of limitations on comment length.
This is in response to R..'s comment of about an hour ago.
I'm a little new to the syscall subject, so please bear with me.
By "the kernel sysqueue syscall", do you mean the `__NR_rt_sigqueueinfo' call? That's the only one that I found when I did this:
grep -Ri 'NR.*queue' /usr/include
If that's the case, I think I'm not understanding your original point. The kernel will let (non-root) me use SI-QUEUE with a faked pid and uid without error. If I have the sending side coded thus:
#include <sys/syscall.h>
#include <sys/types.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc,
char **argv
)
{
long john_silver;
siginfo_t my_siginfo;
if(argc!=2)
{
fprintf(stderr,"missing pid argument\n");
exit(1);
}
john_silver=strtol(argv[1],NULL,0);
if(kill(john_silver,SIGUSR1))
{
fprintf(stderr,"kill() fail\n");
exit(1);
}
sleep(1);
my_siginfo.si_signo=SIGUSR1;
my_siginfo.si_code=SI_QUEUE;
my_siginfo.si_pid=getpid();
my_siginfo.si_uid=getuid();
my_siginfo.si_value.sival_int=41;
if(syscall(__NR_rt_sigqueueinfo,john_silver,SIGUSR1,&my_siginfo))
{
perror("syscall()");
exit(1);
}
sleep(1);
my_siginfo.si_signo=SIGUSR2;
my_siginfo.si_code=SI_QUEUE;
my_siginfo.si_pid=getpid()+1;
my_siginfo.si_uid=getuid()+1;
my_siginfo.si_value.sival_int=42;
if(syscall(__NR_rt_sigqueueinfo,john_silver,SIGUSR2,&my_siginfo))
{
perror("syscall()");
exit(1);
}
return 0;
} /* main() */
and the receiving side coded thus:
#include <sys/types.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int signaled_flag=0;
siginfo_t received_information;
void
my_handler(int signal_number,
siginfo_t *signal_information,
void *we_ignore_this
)
{
memmove(&received_information,
signal_information,
sizeof(received_information)
);
signaled_flag=1;
} /* my_handler() */
/*--------------------------------------------------------------------------*/
int
main(void)
{
pid_t myself;
struct sigaction the_action;
myself=getpid();
printf("signal receiver is process %d\n",myself);
the_action.sa_sigaction=my_handler;
sigemptyset(&the_action.sa_mask);
the_action.sa_flags=SA_SIGINFO;
if(sigaction(SIGUSR1,&the_action,NULL))
{
fprintf(stderr,"sigaction(SIGUSR1) fail\n");
exit(1);
}
if(sigaction(SIGUSR2,&the_action,NULL))
{
fprintf(stderr,"sigaction(SIGUSR2) fail\n");
exit(1);
}
for(;;)
{
while(!signaled_flag)
{
sleep(1);
}
printf("si_signo: %d\n",received_information.si_signo);
printf("si_pid : %d\n",received_information.si_pid );
printf("si_uid : %d\n",received_information.si_uid );
if(received_information.si_signo==SIGUSR2)
{
break;
}
signaled_flag=0;
}
return 0;
} /* main() */
I can then run (non-root) the receiving side thus:
wally:~/tmp/20110326$ receive
signal receiver is process 9023
si_signo: 10
si_pid : 9055
si_uid : 4000
si_signo: 10
si_pid : 9055
si_uid : 4000
si_signo: 12
si_pid : 9056
si_uid : 4001
wally:~/tmp/20110326$
And see this (non-root) on the send end:
wally:~/tmp/20110326$ send 9023
wally:~/tmp/20110326$
As you can see, the third event has spoofed pid and uid. Isn't that what you originally objected to? There's no EINVAL or EPERM in sight. I guess I'm confused.
I agree that si_uid and si_pid should be trustworthy, and if they are not it is a bug. However, this is only required if the signal is SIGCHLD generated by a state change of a child process, or if si_code is SI_USER or SI_QUEUE, or if the system supports the XSI option and si_code <= 0. Linux/glibc also pass si_uid and si_pid values in other cases; these are often not trustworthy but that is not a POSIX conformance issue.
Of course, for kill() the signal may not be queued in which case the siginfo_t does not provide any additional information.
The reason that rt_sigqueueinfo allows more than just SI_QUEUE is probably to allow implementing POSIX asynchronous I/O, message queues and per-process timers with minimal kernel support. Implementing these in userland requires the ability to send a signal with SI_ASYNCIO, SI_MESGQ and SI_TIMER respectively. I do not know how glibc allocates the resources to queue the signal beforehand; to me it looks like it does not and simply hopes rt_sigqueueinfo does not fail. POSIX clearly forbids discarding a timer expiration (async I/O completion, message arrival on a message queue) notification because too many signals are queued at the time of the expiration; the implementation should have rejected the creation or registration if there were insufficient resources. The objects have been defined carefully such that each I/O request, message queue or timer can have at most one signal in flight at a time.
For some reason I thought that calling pthread_exit(NULL) at the end of a main function would guarantee that all running threads (at least created in the main function) would finish running before main could exit. However when I run this code below without calling the two pthread_join functions (at the end of main) explicitly I get a segmentation fault, which seems to happen because the main function has been exited before the two threads finish their job, and therefore the char buffer is not available anymore. However when I include these two pthread_join function calls at the end of main it runs as it should. To guarantee that main will not exit before all running threads have finished, is it necessary to call pthread_join explicitly for all threads initialized directly in main?
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <assert.h>
#include <semaphore.h>
#define NUM_CHAR 1024
#define BUFFER_SIZE 8
typedef struct {
pthread_mutex_t mutex;
sem_t full;
sem_t empty;
char* buffer;
} Context;
void *Reader(void* arg) {
Context* context = (Context*) arg;
for (int i = 0; i < NUM_CHAR; ++i) {
sem_wait(&context->full);
pthread_mutex_lock(&(context->mutex));
char c = context->buffer[i % BUFFER_SIZE];
pthread_mutex_unlock(&(context->mutex));
sem_post(&context->empty);
printf("%c", c);
}
printf("\n");
return NULL;
}
void *Writer(void* arg) {
Context* context = (Context*) arg;
for (int i = 0; i < NUM_CHAR; ++i) {
sem_wait(&context->empty);
pthread_mutex_lock(&(context->mutex));
context->buffer[i % BUFFER_SIZE] = 'a' + (rand() % 26);
float ranFloat = (float) rand() / RAND_MAX;
if (ranFloat < 0.5) sleep(0.2);
pthread_mutex_unlock(&(context->mutex));
sem_post(&context->full);
}
return NULL;
}
int main() {
char buffer[BUFFER_SIZE];
pthread_t reader, writer;
Context context;
srand(time(NULL));
int status = 0;
status = pthread_mutex_init(&context.mutex, NULL);
status = sem_init(&context.full,0,0);
status = sem_init(&context.empty,0, BUFFER_SIZE);
context.buffer = buffer;
status = pthread_create(&reader, NULL, Reader, &context);
status = pthread_create(&writer, NULL, Writer, &context);
pthread_join(reader,NULL); // This line seems to be necessary
pthread_join(writer,NULL); // This line seems to be necessary
pthread_exit(NULL);
return 0;
}
If that is the case, how could I handle the case where plenty of identical threads (like in the code below) would be created using the same thread identifier? In that case, how can I make sure that all the threads will have finished before main exits? Do I really have to keep an array of NUM_STUDENTS pthread_t identifiers to be able to do this? I guess I could do this by letting the Student threads signal a semaphore and then let the main function wait on that semaphore, but is there really no easier way to do this?
int main()
{
pthread_t thread;
for (int i = 0; i < NUM_STUDENTS; i++)
pthread_create(&thread,NULL,Student,NULL); // Threads
// Make sure that all student threads have finished
exit(0);
}
pthread_exit() is a function called by a thread to terminate its own execution. For the situation you've given it is not to be called from your main program thread.
As you have figured out, pthread_join() is the correct means to wait for the completion of a joinable thread from main().
Also as you've figured out, you need to maintain the value returned from pthread_create() to pass to pthread_join().
What this means is that you cannot use the same pthread_t variable for all the threads you create if you intend to use pthread_join().
Rather, build an array of pthread_t so that you have a copy of each thread's ID.
Quite aside from whether the program should or should not terminate when the main thread calls pthread_exit, pthread_exit says
The pthread_exit() function terminates
the calling thread
And also:
After a thread has terminated, the
result of access to local (auto)
variables of the thread is undefined.
Since the context is an automatic variable of main(), your code can fall over before it even gets to the point of testing what you want it to test...
A mini saga
You don't mention the environment in which you are running the original code. I modified your code to use nanosleep() (since, as I mentioned in a comment to the question, sleep() takes an integer and therefore sleep(0.2) is equivalent to sleep(0)), and compiled the program on MacOS X 10.6.4.
Without error checking
It works fine; it took about 100 seconds to run with the 0.5 probability factor (as you'd expect; I changed that to 0.05 to reduce the runtime to about 10 seconds), and generated a random string - some of the time.
Sometimes I got nothing, sometimes I got more and sometimes I got less data. But I didn't see a core dump (not even with 'ulimit -c unlimited' to allow arbitrarily large core dumps).
Eventually, I applied some tools and got to see that I always got 1025 characters (1024 generated plus a newline), but quite often, I got 1024 ASCII NUL characters. Sometimes they'd appear in the middle, sometimes at the beginning, etc:
$ ./pth | tpipe -s "vis | ww -w64" "wc -c"
1025
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\000ocriexffwgdvdvyfitjtvlzcoffhusjo
zyacniffpsfswesgrkuxycsubufamxxzkrkqnwvsxcbmktodessyohixsmuhdovt
hhertqjjinzoptcuqzertybicrzaeyqlyublbfgutcdvftwkuvxhouiuduoqrftw
xjkgqutpryelzuaerpsbotwyskaflwofseibfqntecyseufqxvzikcyeeikjzsye
qxhjwrjmunntjwhohqovpwcktolcwrvmfvdfsmkvkrptjvslivbfjqpwgvroafzn
fkjumqxjbarelbrdijfrjbtiwnajeqgnobjbksulvcobjkzwwifpvpmpwyzpwiyi
cdpwalenxmocmtdluzouqemmjdktjtvfqwbityzmronwvulfizpizkiuzapftxay
obwsfajcicvcrrjehjeyzsngrwusbejiovaaatyzouktetcerqxjsdpswixjpege
blxscdebfsptxwvwsllvydipovzmnrvoiopmqotydqaujwdykidmwzitdsropguv
vudyfiaaaqueyllnwudfpplcfbsngqqeyucdawqxqzczuwsnaquofreilzvdwbjq
ksrouwltvaktpdrvjnqahpdqdshmmvntspglexggshqbjrvxceaqlfnukedxzlms
cnapdtgtcoyhnglojbjnplowericrzbfulvrobfn
$
(The 'tpipe' program is like 'tee' but it writes to pipes instead of files (and to standard output unless you specify the '-s' option); 'vis' comes from 'The UNIX Programming Environment' by Kernighan & Pike; 'ww' is a 'word wrapper' but there aren't any words here so it brute force wraps at width 64.)
The behaviour I was seeing was highly indeterminate - I'd get different results on each run. I even replaced the random characters with the alphabet in sequence ('a' + i % 26), and was still getting odd behaviour.
I added some debug printing code (and a counter to the contex), and it was clear that the semaphore context->full was not working properly for the reader - it was being allowed to go into the mutual exclusion before the writer had written anything.
With error checking
When I added error checking to the mutex and semaphore operations, I found that:
sem_init(&context.full) failed (-1)
errno = 78 (Function not implemented)
So, the weird outputs are because MacOS X does not implement sem_init(). It's odd; the sem_wait() function failed with errno = 9 (EBADF 'Bad file descriptor'); I added the checks there first. Then I checked the initialization...
Using sem_open() instead of sem_init()
The sem_open() calls succeed, which looks good (names "/full.sem" and "/empty.sem", flags O_CREAT, mode values of 0444, 0600, 0700 at different times, and initial values 0 and BUFFER_SIZE, as with sem_init()). Unfortunately, the first sem_wait() or sem_post() operation fails with errno = 9 (EBADF 'Bad file descriptor') again.
Morals
It is important to check error conditions from system calls.
The output I see is non-deterministic because the semaphores don't work.
That doesn't alter the 'it does not crash without the pthread_join() calls' behaviour.
MacOS X does not have a working POSIX semaphore implementation.
There is no need for calling pthread_join(reader,NULL); at all if Context and buffer are declared with static storage duration (as already pointed out by Steve Jessop, caf and David Schwartz).
Declaring Context and buffer static also makes it necessary to change Context *context to Context *contextr or Context *contextw respectively.
In addition, the following rewrite called pthread_exit.c replaces sem_init() with sem_open() and uses nanosleep() (as suggested by Jonathan Leffler).
pthread_exit was tested on Mac OS X 10.6.8 and did not output any ASCII NUL characters.
/*
cat pthread_exit.c (sample code to test pthread_exit() in main())
source:
"pthreads in C - pthread_exit",
http://stackoverflow.com/questions/3330048/pthreads-in-c-pthread-exit
compiled on Mac OS X 10.6.8 with:
gcc -ansi -pedantic -std=gnu99 -Os -Wall -Wextra -Wshadow -Wpointer-arith -Wcast-qual -Wstrict-prototypes \
-Wmissing-prototypes -Wformat=2 -l pthread -o pthread_exit pthread_exit.c
test with:
time -p bash -c './pthread_exit | tee >(od -c 1>&2) | wc -c'
*/
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <assert.h>
#include <semaphore.h>
#include <time.h>
void *Reader(void* arg);
void *Writer(void* arg);
// #define NUM_CHAR 1024
#define NUM_CHAR 100
#define BUFFER_SIZE 8
typedef struct {
pthread_mutex_t mutex;
sem_t *full;
sem_t *empty;
const char *semname1;
const char *semname2;
char* buffer;
} Context;
static char buffer[BUFFER_SIZE];
static Context context;
void *Reader(void* arg) {
Context *contextr = (Context*) arg;
for (int i = 0; i < NUM_CHAR; ++i) {
sem_wait(contextr->full);
pthread_mutex_lock(&(contextr->mutex));
char c = contextr->buffer[i % BUFFER_SIZE];
pthread_mutex_unlock(&(contextr->mutex));
sem_post(contextr->empty);
printf("%c", c);
}
printf("\n");
return NULL;
}
void *Writer(void* arg) {
Context *contextw = (Context*) arg;
for (int i = 0; i < NUM_CHAR; ++i) {
sem_wait(contextw->empty);
pthread_mutex_lock(&(contextw->mutex));
contextw->buffer[i % BUFFER_SIZE] = 'a' + (rand() % 26);
float ranFloat = (float) rand() / RAND_MAX;
//if (ranFloat < 0.5) sleep(0.2);
if (ranFloat < 0.5)
nanosleep((struct timespec[]){{0, 200000000L}}, NULL);
pthread_mutex_unlock(&(contextw->mutex));
sem_post(contextw->full);
}
return NULL;
}
int main(void) {
pthread_t reader, writer;
srand(time(NULL));
int status = 0;
status = pthread_mutex_init(&context.mutex, NULL);
context.semname1 = "Semaphore1";
context.semname2 = "Semaphore2";
context.full = sem_open(context.semname1, O_CREAT, 0777, 0);
if (context.full == SEM_FAILED)
{
fprintf(stderr, "%s\n", "ERROR creating semaphore semname1");
exit(EXIT_FAILURE);
}
context.empty = sem_open(context.semname2, O_CREAT, 0777, BUFFER_SIZE);
if (context.empty == SEM_FAILED)
{
fprintf(stderr, "%s\n", "ERROR creating semaphore semname2");
exit(EXIT_FAILURE);
}
context.buffer = buffer;
status = pthread_create(&reader, NULL, Reader, &context);
status = pthread_create(&writer, NULL, Writer, &context);
// pthread_join(reader,NULL); // This line seems to be necessary
// pthread_join(writer,NULL); // This line seems to be necessary
sem_unlink(context.semname1);
sem_unlink(context.semname2);
pthread_exit(NULL);
return 0;
}
pthread_join() is the standard way to wait for the other thread to complete, I would stick to that.
Alternatively, you can create a thread counter and have all child threads increment it by 1 at start, then decrement it by 1 when they finish (with proper locking of course), then have your main() wait for this counter to hit 0. (pthread_cond_wait() would be my choice).
Per normal pthread semantics, as taught e.g. here, your original idea does seem to be confirmed:
If main() finishes before the threads
it has created, and exits with
pthread_exit(), the other threads will
continue to execute. Otherwise, they
will be automatically terminated when
main() finishes.
However I'm not sure whether that's part of the POSIX threads standard or just a common but not universal "nice to have" add-on tidbit (I do know that some implementations don't respect this constraint -- I just don't know whether those implementations are nevertheless to be considered standard compliant!-). So I'll have to join the prudent chorus recommending the joining of every thread you need to terminate, just to be on the safe side -- or, as Jon Postel put it in the context of TCP/IP implementations:
Be conservative in what you send; be liberal in what you accept.
a "principle of robustness" that should be used way more broadly than just in TCP/IP;-).
pthread_exit(3) exits the thread that calls it (but not the whole process if other threads are still running). In your example other threads use variables on main's stack, thus when main's thread exits and its stack is destroyed they access unmapped memory, thus the segfault.
Use proper pthread_join(3) technique as suggested by others, or move shared variables into static storage.
When you pass a thread a pointer to a variable, you need to ensure that the lifetime of that variable is at least as long as the thread will attempt to access that variable. You pass the threads pointers to buffer and context, which are allocated on the stack inside main. As soon as main exits, those variables cease to exist. So you cannot exit from main until you confirm that those threads no longer need access to those pointers.
95% of the time, the fix for this problem is to follow this simple pattern:
1) Allocate an object to hold the parameters.
2) Fill in the object with the parameters.
3) Pass a pointer to the object to the new thread.
4) Allow the new thread to deallocate the object.
Sadly, this doesn't work well for objects shared by two or more threads. In that case, you can put a use count and a mutex inside the parameter object. Each thread can decrement the use count under protection of the mutex when it's done. The thread that drops the use count to zero frees the object.
You would need to do this for both buffer and context. Set the use count to 2 and then pass a pointer to this object to both threads.
pthread_join does the following :
The pthread_join() function suspends execution of the calling thread until the target thread terminates, unless the target thread has already terminated. On return from a successful pthread_join() call with a non-NULL value_ptr argument, the value passed to pthread_exit() by the terminating thread is made available in the location referenced by value_ptr. When a pthread_join() returns successfully, the target thread has been terminated. The results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined. If the thread calling pthread_join() is canceled, then the target thread will not be detached.
However you can achieve the same by using a light weight loop which will prevent the exe from exiting. In Glib this is achieved by creating a GMainLoop, in Gtk+ you can use the gtk_main.
After completion of threads you have to quit the main loop or call gtk_exit.
Alternatively you can create you own wait functionality using a combination of sockets,pipes and select system call but this is not required and can be considered as an exercise for practice.