I'm running the following program. It simply creates threads that die straight away.
What I have found is that after 93 to 98 (it varies slightly) successful calls, every subsequent call to pthread_create() fails with error 11: Resource temporarily unavailable. I think I'm terminating the thread correctly, so it should give up any resources it holds, yet some resource becomes unavailable.
The first parameter of the program allows me to set the interval between calls to pthread_create(), but testing with different values, I've learned that the interval does not matter (well, I'll just hit the error sooner): the number of successful calls stays the same.
The second parameter of the program allows me to set a sleep interval after a failed call, but the length of that interval does not seem to make any difference either.
Which ceiling am I hitting here?
EDIT: found the error in doSomething(): change lock to unlock and the program runs fine. The question remains: what resource is depleted with the error uncorrected?
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <pthread.h>
#include <errno.h>

pthread_mutex_t doSomethingLock;

void milliSleep(unsigned int milliSeconds)
{
    struct timespec ts;
    ts.tv_sec = floorf(((float)milliSeconds / 1000));
    ts.tv_nsec = ((((float)milliSeconds / 1000) - ts.tv_sec)) * 1000000000;
    nanosleep(&ts, NULL);
}

void *doSomething(void *args)
{
    pthread_detach(pthread_self());
    pthread_mutex_lock(&doSomethingLock);
    pthread_exit(NULL);
}

int main(int argc, char **argv)
{
    pthread_t doSomethingThread;
    pthread_mutexattr_t attr;
    int threadsCreated = 0;

    if (argc != 3)
    {
        fprintf(stderr, "usage: demo <interval between pthread_create() in ms> <time to wait after fail in ms>\n");
        exit(1);
    }
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE_NP);
    pthread_mutex_init(&doSomethingLock, &attr);
    while (1)
    {
        pthread_mutex_lock(&doSomethingLock);
        if (pthread_create(&doSomethingThread, NULL, doSomething, NULL) != 0)
        {
            fprintf(stderr, "%d pthread_create(): error %d, %m\n", threadsCreated, errno);
            milliSleep(atoi(argv[2]));
        }
        else threadsCreated++;
        milliSleep(atoi(argv[1]));
    }
}
If you are on a 32-bit distro, you are probably hitting address space limits. The last I checked, glibc will allocate about 13 MB for stack space in every thread created (this is just the size of the mapping, not allocated memory). With 98 threads, you will be pushing past a gigabyte of the 3 GB of address space available.
You can test this by freezing your process after the error (e.g. sleep(1000000) or whatever) and looking at its address space with pmap.
If that is the problem, then try setting a smaller stack size with pthread_attr_setstack() on the pthread_attr_t you pass to pthread_create. You will have to be the judge of your stack requirements obviously, but often even complicated code can run successfully in only a few kilobytes of stack.
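For instance, here is a minimal sketch (mine, not part of the original answer) using pthread_attr_setstacksize(), the simpler sibling of pthread_attr_setstack() that lets the library place the stack for you; the 64 KiB figure is an assumption, size it to your own needs:
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static void *worker(void *arg)
{
    return arg; /* needs almost no stack */
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err;

    pthread_attr_init(&attr);
    /* Ask for a 64 KiB mapping instead of the multi-megabyte default. */
    pthread_attr_setstacksize(&attr, 64 * 1024);
    err = pthread_create(&tid, &attr, worker, NULL);
    if (err != 0)
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
    else
        pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}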
Your program does not "create threads that die straight away". It does not do what you think it does.
First, pthread_mutex_unlock() only unlocks a pthread_mutex_t that has been locked by the same thread. This is how mutexes work: they can only be unlocked by the same thread that locked them. If you want the behaviour of a semaphore, use a semaphore.
Your example code creates a recursive mutex, which the doSomething() function tries to lock. Because it is held by the original thread, it blocks (waits for the mutex to become free in the pthread_mutex_lock() call). Because the original thread never releases the lock, you just pile up new threads on top of the doSomethingLock mutex.
Recursivity with respect to mutexes just means a thread can lock it more than once; it must unlock it the same number of times for the mutex to be actually released.
If you change the pthread_mutex_lock() in doSomething() to pthread_mutex_unlock(), then you're trying to unlock a mutex not held by that thread. The call fails, and then the threads do die immediately.
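For illustration, a sketch of the corrected doSomething() with the unlock's return value checked (my addition; it assumes the question's globals plus <string.h>, and with a recursive or error-checking mutex, unlocking from a non-owner thread fails with EPERM):
/* Sketch of doSomething() from the question with lock changed to unlock. */
void *doSomething(void *args)
{
    int err;

    pthread_detach(pthread_self());
    /* This thread never locked doSomethingLock, so for a recursive
     * mutex the unlock fails with EPERM and the thread simply exits. */
    err = pthread_mutex_unlock(&doSomethingLock);
    if (err != 0)
        fprintf(stderr, "pthread_mutex_unlock: %s\n", strerror(err));
    pthread_exit(NULL);
}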
Assuming you fix your program, you'll next find that you cannot create more than a hundred or so threads (depending on your system and available RAM).
The reason is well explained by Andy Ross: the fixed-size stacks (getrlimit(RLIMIT_STACK, (struct rlimit *)&info) tells you how much, unless you set it via thread attributes) eat up your available address space.
The original stack given to the process is resized automatically, but for all other threads, the stack size is fixed. By default, it is very large; on my system, 8388608 bytes (8 megabytes).
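A quick way to check the default on your own system (a small sketch of my own, not from the answer):
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit info;

    /* The soft RLIMIT_STACK limit is what the pthread library uses
     * as the default stack size for new threads on Linux, unless a
     * thread attribute overrides it. */
    if (getrlimit(RLIMIT_STACK, &info) == 0)
        printf("default thread stack size: %llu bytes\n",
               (unsigned long long)info.rlim_cur);
    return 0;
}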
I personally create threads with very small stacks, usually 65536 bytes, which is more than enough unless your functions use local arrays or large structures, or do insanely deep recursion:
#ifndef THREAD_STACK_SIZE
#define THREAD_STACK_SIZE 65536
#endif

pthread_attr_t attrs;
pthread_t thread[N];
int i, result;

/* Create a thread attribute for the desired stack size. */
pthread_attr_init(&attrs);
pthread_attr_setstacksize(&attrs, THREAD_STACK_SIZE);

/* Create any number of threads.
 * The attributes are only a guide to pthread_create();
 * they are not "consumed" by the call. */
for (i = 0; i < N; i++) {
    result = pthread_create(&thread[i], &attrs, some_func, (void *)(intptr_t)i);
    if (result) {
        /* strerror(result) describes the error */
        break;
    }
}

/* You should destroy the attributes when you know
 * you won't be creating any further threads anymore. */
pthread_attr_destroy(&attrs);
The minimum stack size should be available as PTHREAD_STACK_MIN, and should be a multiple of sysconf(_SC_PAGESIZE). Currently PTHREAD_STACK_MIN == 16384, but I recommend using a larger power of two. (Page sizes are always powers of two on any binary architecture.)
It is only the minimum, and the pthread library is free to use any larger value it sees fit, but in practice the stack size seems to be what you set it to, plus a fixed value depending on the architecture, kernel, and the pthread library version. Using a compile-time constant works well for almost all cases, but if your application is complex enough to have a configuration file, it might be a good idea to let the user override the compile-time constant if they want to, in the config file.
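If you derive the size at run time, something like this sketch (my own illustration) keeps the value a page multiple and never below PTHREAD_STACK_MIN:
#include <limits.h>   /* PTHREAD_STACK_MIN */
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>   /* sysconf */

/* Round 'wanted' up to a multiple of the page size,
 * never going below PTHREAD_STACK_MIN. */
size_t thread_stack_size(size_t wanted)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    if (wanted < PTHREAD_STACK_MIN)
        wanted = PTHREAD_STACK_MIN;
    return (wanted + page - 1) / page * page;
}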
This might be a dumb question, I'm very sorry if that's the case. But I'm struggling to take advantage of the multiple cores in my computer to perform multiple computations at the same time on my quad-core MacBook. This is not for any particular project, just a general question, since I want to learn for when I eventually do need to do this kind of thing.
I am aware of threads, but they seem to run on the same core, so I don't seem to gain any performance using them for compute-bound operations (they are very useful for socket-based stuff though!).
I'm also aware of processes that can be created with fork, but I'm not sure they are guaranteed to use more CPUs, or if they, like threads, just help with IO-bound operations.
Finally, I'm aware of CUDA, allowing parallelism on the GPU (and I think OpenCL and compute shaders also allow my code to run on the CPU in parallel), but I'm currently looking for something that will let me take advantage of the multiple CPU cores my computer has.
In Python, I'm aware of the multiprocessing module, which seems to provide an API very similar to threads, but there I do seem to gain an edge by running multiple functions performing computations in parallel. I'm looking into how I could get this same advantage in C, but I don't seem to be able to.
Any help pointing me in the right direction would be very much appreciated.
Note: I'm trying to achieve true parallelism, not concurrency.
Note 2: I'm only aware of threads and of using multiple processes in C; with threads I don't seem to be able to get the performance boost I want. And I'm not very familiar with processes, but I'm still not sure running multiple processes is guaranteed to give me the advantage I'm looking for.
A simple program to heat up your CPU (100% utilization of all available cores).
Hint: the thread start function does not return; exit the program via Ctrl+C.
#include <pthread.h>

void *func(void *arg)
{
    while (1);
}

int main()
{
#define NUM_THREADS 4 // use the number of cores (if known)
    pthread_t threads[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, func, NULL);
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);
    return 0;
}
Compilation:
gcc -pthread -o thread_test thread_test.c
If I start ./thread_test, all cores are at 100%.
A word about fork and pthread_create:
fork creates a new process (the current process image is copied and executed in parallel), while pthread_create creates a new thread, sometimes called a lightweight process.
Both processes and threads run in 'parallel' to the parent process.
When to use a child process rather than a thread depends on the situation; e.g., a child is able to replace its process image (via the exec family) and has its own address space, while threads share the address space of the current parent process.
There are of course a lot more differences; for those I recommend studying the following pages:
man fork
man pthreads
I am aware of threads, but they seem to run on the same core, so I don't seem to gain any performance using them for compute-bound operations (they are very useful for socket-based stuff though!).
No, they don't. Threads only appear to share one core if they block, and these threads don't block, so you'll see all of them running. Just try this (beware that it consumes all your CPU time): it starts 16 threads, each counting in a busy loop for 60 s. You will see all of them running and bringing your cores to their knees (it runs only a minute this way, then everything ends):
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define N 16 /* had 16 cores, so I used this. Put here as many
              * threads as cores you have. */

struct thread_data {
    pthread_t thread_id;       /* the thread id */
    struct timespec end_time;  /* time to get out of the tunnel */
    int id;                    /* the array position of the thread */
    unsigned long result;      /* number of times looped */
};

void *thread_body(void *data)
{
    struct thread_data *p = data;
    struct timespec now;

    p->result = 0UL;
    clock_gettime(CLOCK_REALTIME, &p->end_time);
    p->end_time.tv_sec += 60; /* 60 s. */
    do {
        /* just get the time */
        clock_gettime(CLOCK_REALTIME, &now);
        p->result++;
        /* if you call printf() you will see them slowing, as there's a
         * common buffer that forces all threads to serialize their outputs
         */
        /* check if we are over; compare tv_nsec only when the seconds
         * are equal, otherwise a stale tv_nsec keeps us looping */
    } while (now.tv_sec < p->end_time.tv_sec
             || (now.tv_sec == p->end_time.tv_sec
                 && now.tv_nsec < p->end_time.tv_nsec));
    return p;
} /* thread_body */

int main()
{
    struct thread_data thrd_info[N];

    for (int i = 0; i < N; i++) {
        struct thread_data *d = &thrd_info[i];

        d->id = i;
        d->result = 0;
        printf("Starting thread %d\n", d->id);
        /* pthread_create() returns an error code, not -1/errno */
        int res = pthread_create(&d->thread_id,
                                 NULL, thread_body, d);
        if (res != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(res));
            exit(EXIT_FAILURE);
        }
        printf("Thread %d started\n", d->id);
    }
    printf("All threads created, waiting for all to finish\n");
    for (int i = 0; i < N; i++) {
        struct thread_data *joined;
        int res = pthread_join(thrd_info[i].thread_id,
                               (void **)&joined);
        if (res != 0) {
            fprintf(stderr, "pthread_join: %s\n", strerror(res));
            exit(EXIT_FAILURE);
        }
        printf("PTHREAD %d ended, with value %lu\n",
               joined->id, joined->result);
    }
} /* main */
Linux and all multithreaded systems work the same: they create a new execution unit, and the available processors are given to each thread as necessary (if two units don't share a virtual address space, they are both processes --not exactly so, but this explains the main difference between a process and a thread--). Threads are normally encapsulated inside processes: they share the process id (not in Linux, if that has not changed recently) and virtual memory. Processes each run in a separate virtual address space, so they can only share things through system resources (files, shared memory, communication sockets/pipes, etc.).
The problem with your test case (you don't show it, so I have to guess) is that you probably put all the threads in a loop in which each tries to print something. If you do that, most of the time each thread is probably blocked trying to do I/O (to printf() something).
Stdio FILEs have the problem that they share a buffer between all the threads that want to print to the same FILE, and the kernel serializes all write(2) system calls to the same file descriptor. So if most of the time in the loop is spent blocked in a write, the kernel (and stdio) end up serializing all the calls to print, making it appear that only one thread is working at a time (all the threads become blocked behind the one doing the I/O). The busy loop above avoids this: it makes all the threads truly run in parallel and shows you how the CPUs get brought to their knees.
Parallelism in C can also be achieved by using the fork() function. fork() does not create a thread: it creates a whole new process that is a copy of the caller, and the two processes then run simultaneously, each with its own address space, so no data conflicts arise by default.
Because the processes do not share memory, any result has to be passed back through some IPC mechanism: a pipe, shared memory, or, for small values, the child's exit status. Call wait() in the parent before reading results; it blocks until a child process has terminated.
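A minimal sketch of that pattern (my own illustration; the slice "work", NPROC, and squeezing the result into the exit status are all assumptions):
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical compute-bound work: churn through one slice of numbers. */
static int work(int slice)
{
    long sum = 0;
    for (long i = 0; i < 100000000L; i++)
        sum += i % (slice + 2);
    return (int)(sum % 100); /* small result, fits in an exit status */
}

int main(void)
{
    enum { NPROC = 4 };

    for (int i = 0; i < NPROC; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            exit(EXIT_FAILURE);
        }
        if (pid == 0)              /* child: do one slice and report */
            _exit(work(i));
    }
    /* parent: wait() blocks until a child terminates */
    int status;
    pid_t pid;
    while ((pid = wait(&status)) > 0)
        printf("child %d finished with result %d\n",
               (int)pid, WEXITSTATUS(status));
    return 0;
}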
I have a program in which multiple threads are in a loop where they acquire a binary semaphore and then increase a global counter. However, by printing out the thread IDs, I notice that only one thread ever acquires the semaphore. Here's my MRE:
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <semaphore.h>

#define NUM_THREADS 10
#define MAX_COUNTER 100

struct threadCtx {
    sem_t sem;
    unsigned int counter;
};

static void *
threadFunc(void *args)
{
    struct threadCtx *ctx = args;
    pthread_t self;
    bool done = false;

    self = pthread_self();
    while (!done) {
        sem_wait(&ctx->sem);
        if (ctx->counter == MAX_COUNTER) {
            done = true;
        }
        else {
            sleep(1);
            printf("Thread %u increasing the counter to %u\n", (unsigned int)self, ++ctx->counter);
        }
        sem_post(&ctx->sem);
    }
    return NULL;
}

int main() {
    pthread_t threads[NUM_THREADS];
    struct threadCtx ctx = {.counter = 0};

    sem_init(&ctx.sem, 0, 1);
    for (int k = 0; k < NUM_THREADS; k++) {
        pthread_create(threads + k, NULL, threadFunc, &ctx);
    }
    for (int k = 0; k < NUM_THREADS; k++) {
        pthread_join(threads[k], NULL);
    }
    sem_destroy(&ctx.sem);
    return 0;
}
The output is
Thread 1004766976 increasing the counter to 1
Thread 1004766976 increasing the counter to 2
Thread 1004766976 increasing the counter to 3
...
If I remove the call to sleep, the behavior is closer to what I would expect (i.e., the threads being woken up in a seemingly indeterminate manner). Why would this be?
David Schwartz's answer explains what is happening at a low level. That is to say, he's looking at it from the perspective of an OS developer or a hardware designer. Nothing wrong with that, but let's look at your program from the perspective of a Software Architect:
You've got multiple threads all executing the same loop. The loop locks the mutex,* does some "work," and then releases the mutex. OK, but what does it do next? Almost the very next thing the loop does after releasing the mutex is lock the mutex again. Your loop spends practically 100% of its time doing "work" with the mutex locked.
So, what's the point of running that same loop in multiple threads when there's never any opportunity for two or more threads to work at the same time?
If you want to use threads to do a parallel computation, you need to find/invent safe ways for the threads to do most of their work with the mutex unlocked. They should only lock a mutex for just long enough to post a result or to take another assignment, as in the sketch below.
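A sketch of that shape (illustrative names, not the poster's code): each thread does its expensive work with the lock released and takes the mutex only to publish the result.
#include <pthread.h>

/* Illustrative pattern: do the expensive part unlocked,
 * hold the mutex only to publish the result. */
static pthread_mutex_t result_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long total;   /* shared result */

static void *worker(void *arg)
{
    unsigned long local = 0;
    (void)arg;

    for (int i = 0; i < 1000000; i++)   /* "work" with mutex unlocked */
        local += i;

    pthread_mutex_lock(&result_lock);   /* brief critical section */
    total += local;
    pthread_mutex_unlock(&result_lock);
    return NULL;
}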
Sometimes that means writing code that is less efficient than single threaded code would be. But suppose that program (A) has a single thread that makes almost 100% use of a CPU, while program (B) uses eight CPUs but only uses them with 50% efficiency. Which program is going to win?
* I know, your example uses a sem_t (semaphore) object. But "semaphore" is merely the thing you are using; "mutex" is the role in which you are using it.
Why would this be?
Context switches are expensive, and your implementation is, wisely, minimizing them. Your threads are all fighting over the same resource, and trying to schedule them in tight alternation would make performance much worse, probably for the entire system.
Since the thread that keeps getting the semaphore never uses up its timeslice, it will keep getting the resource. It is your responsibility to write code to do the work that you want done. It's the implementation's responsibility to execute your code as efficiently as it can, and that's what it's doing.
Most likely, what's going under the hood is this:
The thread that keeps getting the semaphore can always make forward progress except when it is sleeping. But when it is sleeping, no other thread that needs the semaphore can make forward progress.
The thread that keeps getting the semaphore never exhausts its timeslice because it sleeps before that happens.
So there is no reason for the implementation to ever block this thread other than when it is sleeping, meaning that no other thread can get the semaphore. If you don't want this thread to keep sleeping with the semaphore and blocking other threads, then write different code.
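For example, a hedged rework of the question's loop (one possible fix, not the only one): sleeping after the post, with the semaphore released, gives the other threads a window in which the sleeper cannot immediately re-acquire it.
/* Variant of the question's threadFunc loop: sleep with the
 * semaphore released instead of while holding it. */
while (!done) {
    sem_wait(&ctx->sem);
    if (ctx->counter == MAX_COUNTER)
        done = true;
    else
        printf("Thread %u increasing the counter to %u\n",
               (unsigned int)self, ++ctx->counter);
    sem_post(&ctx->sem);
    if (!done)
        sleep(1);   /* now other threads can grab the semaphore */
}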
I was trying to implement a multi-threaded program using binary semaphore. Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <pthread.h>
#include <unistd.h>
#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>

int g = 0;
sem_t *semaphore;

void *myThreadFun(void *vargp)
{
    int myid = (int)(intptr_t)vargp;
    static int s = 0;

    sem_wait(semaphore);
    ++s; ++g;
    printf("Thread ID: %d, Static: %d, Global: %d\n", myid, s, g);
    fflush(stdout);
    sem_post(semaphore);
    pthread_exit(0);
}

int main()
{
    int i;
    pthread_t tid;

    if ((semaphore = sem_open("/semaphore", O_CREAT, 0644, 3)) == SEM_FAILED) {
        printf("semaphore initialization failed\n");
    }
    for (i = 0; i < 3; i++) {
        pthread_create(&tid, NULL, myThreadFun, (void *)(intptr_t)i);
    }
    pthread_exit(NULL);
    return 0;
}
Now, when I opened the semaphore, I made the count 3. I was expecting that this wouldn't work, and I would get a race condition, because each thread is now capable of decrementing the count.
Is there something wrong with the implementation? Also, if I make the count 0 during sem_open, wouldn't that cause a deadlock, because all the threads would be blocked on sem_wait?
Now, when I opened the semaphore, I made the count 3. I was expecting that this wouldn't work, and I would get a race condition, because each thread is now capable of decrementing the count.
And how do you judge that there isn't any race? Observing output consistent with what you could rely upon in the absence of a data race in no way proves that there isn't a data race. It merely fails to provide any evidence of one.
However, you seem to be suggesting that there would be a data race inherent in more than one thread concurrently performing a sem_wait() on a semaphore whose value is initially greater than 1 (otherwise which counter are you talking about?). But that's utter nonsense. You're talking about a semaphore. It's a synchronization object. Such objects and the functions that manipulate them are the basis for thread synchronization. They themselves are either completely thread safe or terminally buggy.
Now, you are correct that you open the semaphore with an initial count sufficient to avoid any of your threads blocking in sem_wait(), and that therefore they can all run concurrently in the whole body of myThreadFun(). You have not established, however, that they in fact do run concurrently. There are several reasons why they might not. If they do run concurrently, then the incrementing of the shared variables s and g is indeed of concern, but again, even if you see no signs of a data race, that doesn't mean there isn't one.
Everything else aside, the fact that your threads all call sem_wait(), sem_post(), and printf() induces some synchronization in the form of memory barriers, which would reduce the likelihood of observing anomalous effects on s and g. sem_wait() and sem_post() must contain memory barriers in order to function correctly, regardless of the semaphore's current count. printf() calls are required to use locking to protect the state of the stream from corruption in multi-threaded programs, and it is reasonable to suppose that this will require a memory barrier.
Is there something wrong with the implementation?
Yes. It is not properly synchronized. Initialize the semaphore with count 1 so that the modifications of s and g occur only while exactly one thread has the semaphore locked.
Also, if I make the count 0 during sem_open, wouldn't that cause a deadlock, because all the threads would be blocked on sem_wait?
If the semaphore has count 0 before any of the additional threads are started, then yes. It is therefore inappropriate to open the semaphore with count 0, unless you also subsequently post to it before starting the threads. But you are using a named semaphore. These persist until removed, and you never remove it. The count you specify to sem_open() has no effect unless a new semaphore needs to be created; when an existing semaphore is opened, its count is unchanged.
Also, do have the main thread join all the others before it terminates. It's not inherently wrong not to do so, but in most cases it's required for the semantics you want.
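A sketch of the question's main() doing that (illustrative; it assumes the thread function from the question):
/* Store each thread id and join them all before exiting. */
pthread_t tids[3];

for (i = 0; i < 3; i++) {
    pthread_create(&tids[i], NULL, myThreadFun, (void *)(intptr_t)i);
}
for (i = 0; i < 3; i++) {
    pthread_join(tids[i], NULL);
}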
I'll get to the code that proves you had a race condition in a bit, and I'll add a couple of different ways to trigger it so you can see how this works. I'm doing this on Linux, passing -std=gnu99 as a param to gcc, i.e.
gcc -Wall -pedantic -lpthread -std=gnu99 semaphore.c -o semtex
Note: in your original example (assuming Linux), one fatal mistake you made was not removing the semaphore. If you run the following command you might see some of these sitting around on your machine:
ls -la /dev/shm/sem.*
You need to make sure that there are no old semaphores sitting on the filesystem before running your program, or you end up picking up the settings from the old semaphore. You need to use sem_unlink to clean up.
To run it please use the following.
rm /dev/shm/sem.semtex;./semtex
I'm deliberately making sure that the semaphore is not there before running because if you have a DEADLOCK it can be left around and that causes all sorts of problems when testing.
Now, when I opened the semaphore, I made the count 3. I was expecting that this wouldn't work, and I would get a race condition, because each thread is now capable of decrementing the count.
You had a race condition, but sometimes C is so damned fast that things can appear to work because your program gets everything it needs done in the time the OS has allocated to the thread, i.e. the OS didn't preempt it at an important point.
This is one of those cases: the race condition is there, you just need to squint a little bit to see it. In the following code you can tweak some parameters to see a deadlock, correct usage, and undefined behavior.
#define INITIAL_SEMAPHORE_VALUE CORRECT
The value INITIAL_SEMAPHORE_VALUE can take three values...
#define DEADLOCK 0
#define CORRECT 1
#define INCORRECT 2
I hope they're self-explanatory. You can also use two methods to cause the race condition to blow up the program.
#define METHOD SLEEP
Set METHOD to SPIN and you can play with SPIN_COUNT to find out how many times the loop can run before you actually see a problem; this is C, it can get a lot done before it gets preempted. The code has most of the rest of the information you need in it.
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define DEADLOCK  0 // DEADLOCK
#define CORRECT   1 // CORRECT
#define INCORRECT 2 // INCORRECT

/*
 * Change the following values to observe what happens.
 */
#define INITIAL_SEMAPHORE_VALUE CORRECT

// The next value provides two different ways to trigger the problem, one
// using a tight loop and the other using a system call. The methods are
// numeric constants so the #if below can actually distinguish them.
#define SPIN  1
#define SLEEP 2
#define METHOD SLEEP

#if (METHOD == SPIN)
/* You need to increase the SPIN_COUNT to a value that's big enough that the
 * kernel preempts the thread to see it fail. The value set here worked for me
 * in a VM but might not work for you, tweak it. */
#define SPIN_COUNT 1000000
#else
/* The reason we can use such a small time for USLEEP is because we're making
 * the kernel preempt the thread by using a system call. */
#define USLEEP_TIME 1
#endif

#define TOT_THREADS 10

static int g = 0;
static int ret = 1729;
sem_t *semaphore;

void *myThreadFun(void *vargp) {
    int myid = (int)(intptr_t)vargp;
    int w = 0;
    static int s = 0;

    if ((w = sem_wait(semaphore)) != 0) {
        fprintf(stderr, "Error: %s\n", strerror(errno));
        abort();
    }
    /* This is the interesting part... Between updating `s` and `g` we add
     * a delay using one of two methods. */
    s++;
#if (METHOD == SPIN)
    int spin = 0;
    while (spin < SPIN_COUNT) {
        spin++;
    }
#else
    usleep(USLEEP_TIME);
#endif
    g++;
    if (s != g) {
        fprintf(stderr, "Fatal Error: s != g in thread: %d, s: %d, g: %d\n", myid, s, g);
        abort();
    }
    printf("Thread ID: %d, Static: %d, Global: %d\n", myid, s, g);
    // It's a false sense of security if you think the assert will fail on a
    // race condition when you get the params to sem_open wrong. It might not
    // be detected.
    assert(s == g);
    if ((w = sem_post(semaphore)) != 0) {
        fprintf(stderr, "Error: %s\n", strerror(errno));
        abort();
    }
    return &ret;
}

int main(void) {
    int i;
    void *status;
    const char *semaphore_name = "semtex";
    pthread_t tids[TOT_THREADS];

    if ((semaphore = sem_open(semaphore_name, O_CREAT, 0644, INITIAL_SEMAPHORE_VALUE)) == SEM_FAILED) {
        fprintf(stderr, "Fatal Error: %s\n", strerror(errno));
        abort();
    }
    for (i = 0; i < TOT_THREADS; i++) {
        pthread_create(&tids[i], NULL, myThreadFun, (void *)(intptr_t)i);
    }
    for (i = 0; i < TOT_THREADS; i++) {
        pthread_join(tids[i], &status);
        assert(*(int *)status == 1729);
    }
    /* The following line was missing from your original code */
    sem_unlink(semaphore_name);
    pthread_exit(0);
}
This question requires that two semaphores be used, one as a mutex, and one as a counting semaphore, and that the pair be used to simulate the interaction between students and a teacher's assistant.
I have been able to utilize the binary semaphore easily enough; however, I cannot seem to find many examples that show the use of a counting semaphore, so I am quite sure I have it wrong, which is causing my code not to execute properly.
My code is below
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <semaphore.h>
#include <time.h>
#include <sys/types.h>

void *taThread(void *arg);
void *student(void *arg);

sem_t taMutex;
sem_t semaphore;

int main()
{
    pthread_t tid1;

    srand(time(NULL));
    sem_init(&taMutex, 0, 1);
    sem_init(&semaphore, 1, 3);
    pthread_create(&tid1, NULL, &taThread, NULL);
    pthread_join(tid1, NULL);
    return 0;
}

void *taThread(void *arg)
{
    pthread_t tid2[10];
    int it = 0;

    printf("Teacher's Assistant taking a nap.\n");
    for (it = 0; it < 10; it++)
    {
        pthread_create(&tid2[it], NULL, &student, NULL);
    }
    for (it = 0; it < 10; it++)
    {
        pthread_join(tid2[it], NULL);
    }
    return NULL;
}

void *student(void *arg)
{
    int xTime;

    xTime = rand() % 10 + 1;
    if (sem_wait(&taMutex) == 0)
    {
        printf("Student has awakened TA and is getting help. This will take %d minutes.\n", xTime);
        sleep(xTime);
        sem_post(&taMutex);
    }
    else if (sem_wait(&semaphore) > 2)
    {
        printf("Student will return at another time.\n");
    }
    else
    {
        sem_wait(&semaphore);
        printf("Student is working on their assignment until TA becomes available.\n");
        sem_wait(&taMutex);
        sem_post(&semaphore);
        printf("Student is entering the TA's office. This will take %d minutes", xTime);
        sleep(xTime);
        sem_post(&taMutex);
    }
    return NULL;
}
My main question is: how can I get the threads to poll the counting semaphore concurrently?
I am trying to get a backup, with some students being forced to leave (i.e., exit the thread) unhelped, while others wait on the semaphore. Any assistance is appreciated, and any clarifications will be offered.
I'm not sure if your class / teacher wants to make special distinctions here, but fundamentally, a binary semaphore is mostly equivalent to a counting semaphore initialized to 1,¹ so that when you count it down ("P") to zero it becomes "busy" (locked like a mutex), and when you release it ("V") it counts up to its maximum of 1 and it's now "un-busy" (unlocked). A counting semaphore typically starts with a higher initial value, typically for counting some resource (like 3 available chairs in a room), so that when you count it down there may still be some left. When you're done using the counted resource (e.g., when the "student" leaves the "TA's office") you count it back up ("V").
With POSIX semaphores, the call:
sem_init(&semaphore,1,3);
says that this is a process-shared semaphore (2nd argument nonzero), rather than a thread-shared semaphore; you don't seem to need that, and I'm not sure if some systems might give you an error—a failing sem_init call, that is—if &semaphore is not in a process-shared region. You should be able to just use 0, 3. Otherwise, this is fine: it's saying, in effect, that there are three "unoccupied chairs" in the "office".
Other than that, you'll need to either use sem_trywait (as @pilcrow suggested), sem_timedwait, or a signal to interrupt the sem_wait call (e.g., SIGALRM), in order to have a student who tries to get a "seat" in the "office" discover that he can't get one within some time period. Just calling sem_wait means "wait until there's an unoccupied chair, even if that takes arbitrarily long". Only two things stop this potentially-infinite wait: either a chair becomes available, or a signal interrupts the call.
(The return value from the various sem_* functions tells you whether you "got" the "chair" you were waiting for. sem_wait waits "forever", sem_trywait waits not-at-all, and sem_timedwait waits until you "get the chair" or the clock runs out, whichever occurs first.)
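A hedged sketch of how the question's student() might be restructured around sem_trywait() so a student leaves when no chair is free (names taken from the question; the flow and error handling are my own illustration):
#include <errno.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

extern sem_t taMutex;    /* binary semaphore from the question */
extern sem_t semaphore;  /* counting semaphore: 3 "chairs" */

void *student(void *arg)
{
    int xTime = rand() % 10 + 1;

    if (sem_trywait(&semaphore) != 0) {      /* no chair free */
        if (errno == EAGAIN)
            printf("Student will return at another time.\n");
        return NULL;
    }
    printf("Student is waiting in a chair.\n");
    sem_wait(&taMutex);                      /* wait for the TA */
    sem_post(&semaphore);                    /* free the chair */
    printf("Student is getting help. This will take %d minutes.\n", xTime);
    sleep(xTime);
    sem_post(&taMutex);                      /* let in the next student */
    return NULL;
}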
¹ The real difference between a true binary semaphore and a counting semaphore is that a binary semaphore doesn't offer you the ability to count. It's either acquired (and an acquire will block), or not-acquired (and an acquire will succeed and block other acquires). Various implementations may consider releasing an already-released binary semaphore a no-op or an error (e.g., runtime panic). POSIX just doesn't offer binary semaphores at all: sem_init initializes a counting semaphore and it's your responsibility to set it to 1, and not over-increment it by releasing when it's already released. See also comments below.
I'm debugging a multi-threaded problem with C, pthread and Linux. On my MacOS 10.5.8, C2D, it runs fine; on my Linux computers (2-4 cores) it produces undesired outputs.
I'm not experienced, therefore I attached my code. It's rather simple: each new thread creates two more threads until a maximum is reached. So no big deal... as I thought until a couple of days ago.
Can I force single-core execution to prevent my bugs from occurring?
I profiled the program execution, instrumenting with Valgrind:
valgrind --tool=drd --read-var-info=yes --trace-mutex=no ./threads
I get a couple of conflicts in the BSS segment, which are caused by my global structs and thread counter variables. However, I could mitigate these conflicts with forced single-core execution, because I think the concurrent scheduling on my 2-4 core test systems is responsible for my errors.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_THR 12
#define NEW_THR 2

int wait_time = 0;           // log global wait time
int num_threads = 0;         // how many threads there are
pthread_t threads[MAX_THR];  // global array to collect threads
pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER; // sync

struct thread_data
{
    int nr;   // nr of thread, serves as id
    int time; // wait time from rand()
};

struct thread_data thread_data_array[MAX_THR+1];

void
*PrintHello(void *threadarg)
{
    if (num_threads < MAX_THR) {
        // using the argument
        pthread_mutex_lock(&mut);
        struct thread_data *my_data;
        my_data = (struct thread_data *) threadarg;
        // updates
        my_data->nr = num_threads;
        my_data->time = rand() % 10 + 1;
        printf("Hello World! It's me, thread #%d and sleep time is %d!\n",
               my_data->nr,
               my_data->time);
        pthread_mutex_unlock(&mut);
        // counter
        long t = 0;
        for (t = 0; t < NEW_THR; t++) {
            pthread_mutex_lock(&mut);
            num_threads++;
            wait_time += my_data->time;
            pthread_mutex_unlock(&mut);
            pthread_create(&threads[num_threads], NULL, PrintHello, &thread_data_array[num_threads]);
            sleep(1);
        }
        printf("Bye from %d thread\n", my_data->nr);
        pthread_exit(NULL);
    }
    return 0;
}

int
main (int argc, char *argv[])
{
    long t = 0;
    // srand(time(NULL));

    if (num_threads < MAX_THR) {
        for (t = 0; t < NEW_THR; t++) {
            // -> 2 threads entry point
            pthread_mutex_lock(&mut);
            // rand time
            thread_data_array[num_threads].time = rand() % 10 + 1;
            // update global wait time variable
            wait_time += thread_data_array[num_threads].time;
            num_threads++;
            pthread_mutex_unlock(&mut);
            pthread_create(&threads[num_threads], NULL, PrintHello, &thread_data_array[num_threads]);
            pthread_mutex_lock(&mut);
            printf("In main: creating initial thread #%ld\n", t);
            pthread_mutex_unlock(&mut);
        }
    }
    for (t = 0; t < MAX_THR; t++) {
        pthread_join(threads[t], NULL);
    }
    printf("Bye from program, wait was %d\n", wait_time);
    pthread_exit(NULL);
}
I hope that code isn't too bad. I didn't do too much C for a rather long time. :) The problem is:
printf("Bye from %d thread\n", my_data->nr);
my_data->nr sometimes resolves to large integer values:
In main: creating initial thread #0
Hello World! It's me, thread #2 and sleep time is 8!
In main: creating initial thread #1
[...]
Hello World! It's me, thread #11 and sleep time is 8!
Bye from 9 thread
Bye from 5 thread
Bye from -1376900240 thread
[...]
I don't know more ways to profile and debug this.
If I debug this, it works - sometimes. Sometimes it doesn't :(
Thanks for reading this long question. :) I hope I didn't share too much of my currently unresolvable confusion.
Since this program seems to be just an exercise in using threads, with no actual goal, it is difficult to suggest how to treat your problem rather than treat the symptom. I believe you can actually pin a process or thread to a processor in Linux (a sketch of one way is shown below), but doing so for all threads removes most of the benefit of using threads, and I don't actually remember how to do it. Instead I'm going to talk about some things wrong with your program.
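For reference, the pinning sketch mentioned above (my addition, not the answer's; it uses the GNU extension pthread_setaffinity_np(), which I believe is the API being alluded to):
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to CPU 0. Threads created afterwards inherit
 * the affinity mask, so calling this first thing in main() effectively
 * forces single-core execution. Returns 0 on success. */
int pin_to_cpu0(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}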
C compilers often make a lot of assumptions when they are doing optimizations. One of the assumptions is that unless the current code being examined looks like it might change some variable that variable does not change (this is a very rough approximation to this, and a more accurate explanation would take a very long time).
In this program you have variables which are shared and changed by different threads. If a variable is only read by threads (either const, or effectively const after the threads that look at it are created), then you don't have much to worry about (and in "read by threads" I'm including the main original thread): since the variable doesn't change, whether the compiler generates code to read it once (remembering it in a local temporary variable) or to read it over and over, the value is always the same, so calculations based on it always come out the same.
To force the compiler not do this you can use the volatile keyword. It is affixed to variable declarations just like the const keyword, and tells the compiler that the value of that variable can change at any instant, so reread it every time its value is needed, and rewrite it every time a new value for it is assigned.
NOTE that for pthread_mutex_t (and similar) variables you do not need volatile. If it were needed on the type(s) that make up pthread_mutex_t on your system, volatile would have been used within the definition of pthread_mutex_t. Additionally, the functions that access this type take its address and are specially written to do the right thing.
I'm sure now you are thinking that you know how to fix your program, but it is not that simple. You are doing math on a shared variable. Doing math on a variable using code like:
x = x + 1;
requires that you know the old value to generate the new value. If x is global then you have to conceptually load x into a register, add 1 to that register, and then store that value back into x. On a RISC processor you actually have to do all 3 of those instructions, and being 3 instructions I'm sure you can see how another thread accessing the same variable at nearly the same time could end up storing a new value in x just after we have read our value -- making our value old, so our calculation and the value we store will be wrong.
If you know any x86 assembly then you probably know that it has instructions that can do math on values in RAM (both getting from and storing the result in the same location in RAM all in one instruction). You might think that this instruction could be used for this operation on x86 systems, and you would almost be right. The problem is that this instruction is still executed in the steps that the RISC instruction would be executed in, and there are several opportunities for another processor to change this variable at the same time as we are doing our math on it. To get around this on x86 there is a lock prefix that may be applied to some x86 instructions, and I believe that glibc header files include atomic macro functions to do this on architectures that can support it, but this can't be done on all architectures.
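These days the portable spelling of such an atomic read-modify-write is C11 <stdatomic.h> rather than glibc's internal macros; a minimal sketch (my addition, not the answer's code):
#include <stdatomic.h>

/* One atomic read-modify-write replaces the load/add/store
 * sequence, so no other thread can slip in between. */
static _Atomic int num_threads = 0;

int claim_thread_slot(void)
{
    /* returns the previous value, i.e. our unique slot index */
    return atomic_fetch_add(&num_threads, 1);
}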
To work right on all architectures you are going to need to:
int local_thread_count;
int create_a_thread;

pthread_mutex_lock(&count_lock);
local_thread_count = num_threads;
if (local_thread_count < MAX_THR) {
    num_threads = local_thread_count + 1;
    pthread_mutex_unlock(&count_lock);
    thread_data_array[local_thread_count].nr = local_thread_count;
    /* moved this into the creator
     * since getting it in the
     * child will likely get the
     * wrong value. */
    pthread_create(&threads[local_thread_count], NULL, PrintHello,
                   &thread_data_array[local_thread_count]);
} else {
    pthread_mutex_unlock(&count_lock);
}
Now, since num_threads is volatile and the test and increment happen under the mutex, the thread count can be updated safely from all threads. At the end of this, local_thread_count is usable as an index into the array of threads. Note that this code creates only one thread, while yours was supposed to create several; I did this to make the example clearer, but it should not be too difficult to change it to add NEW_THR to num_threads. If NEW_THR is 2 and MAX_THR - num_threads is 1 (somehow), then you have to handle that correctly somehow.
Now, all of that being said, there may be another way to accomplish similar things by using semaphores. Semaphores are like mutexes, but they have a count associated with them. You would not get a value to use as the index into the array of threads (the function to read a semaphore's count won't really give you this), but I thought it deserved to be mentioned since it is very similar.
man 3 semaphore.h
will tell you a little bit about it.
num_threads should at least be marked volatile, and preferably made atomic as well (although in practice a plain int may appear to work), so that there is at least a higher chance that the different threads see the same values. You might want to view the assembler output to see when the writes of num_threads to memory actually take place.
https://computing.llnl.gov/tutorials/pthreads/#PassingArguments
That seems to be the problem: you need to malloc the thread_data struct.