I'm having trouble getting my worker threads and facilitator thread to synchronize properly. The problem I'm trying to solve is to find the largest prime number across 10 files using up to 10 threads: with 1 thread the program runs single-threaded, and anything greater than that is multi-threaded.
The problem lies where the worker signals the facilitator that it has found a new prime. The facilitator can ignore it if the number is insignificant, or, if it is important, signal all threads to update their my_latest_lgprime. I keep getting stuck, both in my head and in code.
The task must be completed using a facilitator and synchronization.
Here is what I have so far:
Worker:
void* worker(void* args){
    w_pack* package = (w_pack*) args;
    int i, num;
    char text_num[30];
    *(package->fac_prime) = 0;
    for(i = 0; i < package->file_count; i++){
        int count = 1000000; //integers per file
        FILE* f = package->assigned_files[i];
        while(count != 0){
            fscanf(f, "%s", text_num);
            num = atoi(text_num);
            pthread_mutex_lock(&lock2);
            while(update_ready != 0){
                pthread_cond_wait(&waiter, &lock2);
                package->my_latest_lgprime = largest_prime; //largest_prime is global
                update_ready = 0;
            }
            pthread_mutex_unlock(&lock2);
            if(num > (package->my_latest_lgprime + 100)){
                if(isPrime(num) == 1){
                    *(package->fac_prime) = num;
                    package->my_latest_lgprime = num;
                    pthread_mutex_lock(&lock);
                    update_check = 1;
                    pthread_mutex_unlock(&lock);
                    pthread_cond_signal(&updater);
                }
            }
            count--;
        }
    }
    done++;
    return (void*)package;
}
Facilitator:
void* facilitator(void* args){
    int i, temp_large;
    f_pack* package = (f_pack*) args;
    while(done != package->threads){
        pthread_mutex_lock(&lock);
        while(update_check == 0)
            pthread_cond_wait(&updater, &lock);
        temp_large = isLargest(package->threads_largest, package->threads);
        if(temp_large > largest_prime){
            pthread_mutex_lock(&lock2);
            update_ready = 1;
            largest_prime = temp_large;
            pthread_mutex_unlock(&lock2);
            pthread_cond_broadcast(&waiter);
            printf("New large prime: %d\n", largest_prime);
        }
        update_check = 0;
        pthread_mutex_unlock(&lock);
    }
}
Here is the worker package:
typedef struct worker_package{
    int my_latest_lgprime;
    int file_count;
    int* fac_prime;
    FILE* assigned_files[5];
} w_pack;
Is there an easier way to do this using semaphores?
I can't really spot a problem with certainty, but just from briefly reading the code, it seems the done variable is shared across threads yet accessed and modified without synchronization.
In any case, I can suggest a couple of ideas to improve on your solution.
You assign the list of files to each thread at start-up. This isn't the most efficient approach, since each file may take more or less time to process. A better approach would be to keep a single list of files and have each thread pick up the next file from the list.
Do you really need a facilitator thread for this? Each thread can keep track of its own largest prime, and every time it finds a new maximum it can check the global maximum and update it if necessary. You could also keep a single maximum (without a per-thread maximum), but that would require locking every time you need to compare.
Here is pseudo-code of how I would write the worker threads:
while (true) {
    lock(file_list_mutex)
    if file list is empty {
        unlock(file_list_mutex)
        break // we are done!
    }
    file = get_next_file_in_list
    unlock(file_list_mutex)
    max = 0
    foreach number in file {
        if number is prime and number > max {
            lock(max_number_mutex)
            if (number > max_global_number) {
                max_global_number = number
            }
            max = max_global_number
            unlock(max_number_mutex)
        }
    }
}
Before you start the worker threads you need to initialize max_global_number = 0.
The above solution has the benefit that it doesn't hold locks nearly as often or as long as yours does, so thread contention is minimized.
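If it helps, here is a minimal C sketch of that pseudo-code. The names below (file_list, next_file, max_mutex, and so on) and the file-list setup are my own illustrative assumptions, not your actual code; isPrime is assumed to exist as in the question:

#include <pthread.h>
#include <stdio.h>

/* Hypothetical shared state, filled in main() before starting the threads. */
static pthread_mutex_t file_list_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t max_mutex = PTHREAD_MUTEX_INITIALIZER;
static FILE* file_list[10];
static int next_file = 0, file_total = 0;
static int max_global_number = 0;

extern int isPrime(int n);   /* assumed to exist, as in the question */

void* prime_worker(void* arg) {
    int local_max = 0;
    for (;;) {
        /* Pick up the next unprocessed file, or stop if none are left. */
        pthread_mutex_lock(&file_list_mutex);
        if (next_file >= file_total) {
            pthread_mutex_unlock(&file_list_mutex);
            break;
        }
        FILE* f = file_list[next_file++];
        pthread_mutex_unlock(&file_list_mutex);

        int num;
        while (fscanf(f, "%d", &num) == 1) {
            /* Only touch the global maximum when the local one improves. */
            if (num > local_max && isPrime(num)) {
                pthread_mutex_lock(&max_mutex);
                if (num > max_global_number)
                    max_global_number = num;
                local_max = max_global_number;
                pthread_mutex_unlock(&max_mutex);
            }
        }
    }
    return NULL;
}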
Related
I'm working on a college assignment where we are to implement a parallelized A* search for the 15 puzzle. For this part, we are to use only one priority queue (I suppose to see how contention on it by multiple threads limits speedup). A problem I am facing is properly synchronizing popping the next "candidate" from the priority queue.
I tried the following:
while(1) {
    // The board I'm trying to pop.
    Board current_board;
    pthread_mutex_lock(&priority_queue_lock);
    // If the heap is empty, wait till another thread adds new candidates.
    if (pq->heap_size == 0)
    {
        printf("Waiting...\n");
        pthread_mutex_unlock(&priority_queue_lock);
        continue;
    }
    current_board = top(pq);
    pthread_mutex_unlock(&priority_queue_lock);
    // Generate the new boards from the current one and add to the heap...
}
I've tried different variants of the same idea, but for some reason there are occasions where the threads get stuck on "Waiting". The code works fine serially (or with two threads), so that leads me to believe this is the offending part of the code. I can post the entire thing if necessary. I feel like it's an issue with my understanding of mutex locks, though. Thanks in advance for the help.
Edit:
I've added the full code for the parallel thread below:
// h and p are global pointers initialized in main()
void* parallelThread(void* arg)
{
    int thread_id = (int)(long long)(arg);
    while(1)
    {
        Board current_board;
        pthread_mutex_lock(&priority_queue_lock);
        current_board = top(p);
        pthread_mutex_unlock(&priority_queue_lock);
        // Move blank up.
        if (current_board.blank_x > 0)
        {
            int newpos = current_board.blank_x - 1;
            Board new_board = current_board;
            new_board.board[current_board.blank_x][current_board.blank_y] = new_board.board[newpos][current_board.blank_y];
            new_board.board[newpos][current_board.blank_y] = BLANK;
            new_board.blank_x = newpos;
            new_board.goodness = get_goodness(new_board.board);
            new_board.turncount++;
            if (check_solved(new_board))
            {
                printf("Solved in %d turns",new_board.turncount);
                exit(0);
            }
            if (!exists(h,new_board))
            {
                insert(h,new_board);
                push(p,new_board);
            }
        }
        // Move blank down.
        if (current_board.blank_x < 3)
        {
            int newpos = current_board.blank_x + 1;
            Board new_board = current_board;
            new_board.board[current_board.blank_x][current_board.blank_y] = new_board.board[newpos][current_board.blank_y];
            new_board.board[newpos][current_board.blank_y] = BLANK;
            new_board.blank_x = newpos;
            new_board.goodness = get_goodness(new_board.board);
            new_board.turncount++;
            if (check_solved(new_board))
            {
                printf("Solved in %d turns",new_board.turncount);
                exit(0);
            }
            if (!exists(h,new_board))
            {
                insert(h,new_board);
                push(p,new_board);
            }
        }
        // Move blank right.
        if (current_board.blank_y < 3)
        {
            int newpos = current_board.blank_y + 1;
            Board new_board = current_board;
            new_board.board[current_board.blank_x][current_board.blank_y] = new_board.board[current_board.blank_x][newpos];
            new_board.board[current_board.blank_x][newpos] = BLANK;
            new_board.blank_y = newpos;
            new_board.goodness = get_goodness(new_board.board);
            new_board.turncount++;
            if (check_solved(new_board))
            {
                printf("Solved in %d turns",new_board.turncount);
                exit(0);
            }
            if (!exists(h,new_board))
            {
                insert(h,new_board);
                push(p,new_board);
            }
        }
        // Move blank left.
        if (current_board.blank_y > 0)
        {
            int newpos = current_board.blank_y - 1;
            Board new_board = current_board;
            new_board.board[current_board.blank_x][current_board.blank_y] = new_board.board[current_board.blank_x][newpos];
            new_board.board[current_board.blank_x][newpos] = BLANK;
            new_board.blank_y = newpos;
            new_board.goodness = get_goodness(new_board.board);
            new_board.turncount++;
            if (check_solved(new_board))
            {
                printf("Solved in %d turns",new_board.turncount);
                exit(0);
            }
            if (!exists(h,new_board))
            {
                insert(h,new_board);
                push(p,new_board);
            }
        }
    }
    return NULL;
}
"I tried the following:"
I don't see anything wrong with the code that follows, assuming that top also removes the board from the queue. It's wasteful (if the queue is empty, it will spin, locking and unlocking the mutex), but not wrong.
"I've added the full code"
This is useless without the code for exists, insert and push.
One general observation:
pthread_mutex_lock(&priority_queue_lock);
current_board = top(p);
pthread_mutex_unlock(&priority_queue_lock);
In the code above, your locking is "outside" of the top function. But here:
if (!exists(h,new_board))
{
    insert(h,new_board);
    push(p,new_board);
}
you either do no locking at all (in which case that's a bug), or you do locking "inside" exists, insert and push.
You should not mix "inside" and "outside" locking. Pick one or the other and stick with it.
If you in fact do not lock the queue inside exists, insert, etc., then you have a data race and are thinking of mutexes incorrectly: they protect invariants. You can't check whether the queue is empty in parallel with another thread executing "remove top element"; those operations require serialization, and thus must both be done under a lock.
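One way to avoid the empty-queue spin from the first snippet is a condition variable. Here is a minimal hedged sketch; pop() stands in for a top() that removes and returns the element, and Board/PriorityQueue are the asker's types, so the names are assumptions:

#include <pthread.h>

pthread_mutex_t priority_queue_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t queue_not_empty = PTHREAD_COND_INITIALIZER;

Board pop_blocking(PriorityQueue* pq) {
    pthread_mutex_lock(&priority_queue_lock);
    while (pq->heap_size == 0)                 /* re-check after every wakeup */
        pthread_cond_wait(&queue_not_empty, &priority_queue_lock);
    Board b = pop(pq);                         /* removes the top element */
    pthread_mutex_unlock(&priority_queue_lock);
    return b;
}

void push_signaling(PriorityQueue* pq, Board b) {
    pthread_mutex_lock(&priority_queue_lock);
    push(pq, b);
    pthread_cond_signal(&queue_not_empty);     /* wake one waiting worker */
    pthread_mutex_unlock(&priority_queue_lock);
}

Note that termination detection (every thread waiting on an empty queue means the search is exhausted) still needs separate handling on top of this.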
I only just started writing multithreaded code in C and don't have a full understanding of how to implement it. I'm writing a program that reads an input file into a buffer struct array. When the buffer has no more available space, request_t blocks, waiting for space to become available; it is run by the thread Lift_R. The other threads (lift_1 to lift_3) run lift(), which writes what's in the buffer to the output file, taking sec seconds per request; sec and size are given on the command line. Consuming a request frees up space so request can continue reading the input.
Can someone please help me with how to implement these functions properly? I know there are other questions related to this, but I want my code to meet specific conditions.
(NOTE: lift operates in FIFO and threads use mutual exclusion)
This is what I have so far. I haven't implemented any waiting conditions or FIFO yet; I'm currently focusing on writing to the file and debugging, and will get to wait and signal soon.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include "list.h"

pthread_cond_t cond1 = PTHREAD_COND_INITIALIZER; //declare thread conditions
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; //declare mutex
int sec;  //time required by each lift to serve a request
int size; //buffer size
buffer_t A[];
write_t write;

void *lift(void *vargp)
{
    pthread_mutex_lock(&lock);
    FILE* out;
    out = fopen("sim_output.txt", "w");
    //gather information to print
    if (write.p == NULL) //only for when system begins
    {
        write.p = A[1].from;
    }
    write.rf = A[1].from;
    write.rt = A[1].to;
    write.m = (write.p - A[1].from) + (A[1].to - A[1].from);
    if (write.total_r == NULL) //for when the system first begins
    {
        write.total_r = 0;
    }
    else
    {
        write.total_r++;
    }
    if (write.total_m == NULL)
    {
        write.total_m = write.m;
    }
    else
    {
        write.total_m = write.total_m + write.m;
    }
    write.c = A[1].to;
    //Now write the information
    fprintf(out, "Previous position: Floor %d\n", write.p);
    fprintf(out, "Request: Floor %d to Floor %d\n", write.rf, write.rt);
    fprintf(out, "Detail operations:\n");
    fprintf(out, " Go from Floor %d to Floor %d\n", write.p, write.rf);
    fprintf(out, " Go from Floor %d to Floor %d\n", write.rf, write.rt);
    fprintf(out, " #movement for this request: %d\n", write.m);
    fprintf(out, " #request: %d\n", write.total_r);
    fprintf(out, " Total #movement: %d\n", write.total_m);
    fprintf(out, "Current Position: Floor %d\n", write.c);
    write.p = write.c; //for next statement
    pthread_mutex_unlock(&lock);
    return NULL;
}

void *request_t(void *vargp)
{
    pthread_mutex_lock(&lock); //Now only request can operate
    FILE* f;
    FILE* f2;
    f = fopen("sim_input.txt", "r");
    if (f == NULL)
    {
        printf("input file empty\n");
        exit(EXIT_FAILURE);
    }
    f2 = fopen("sim_output.txt", "w");
    int i = 0;
    for (i; i < size; i++)
    {
        //read the input line by line and into the buffer
        fscanf(f, "%d %d", &A[i].from, &A[i].to);
        //Print buffer information to sim_output
        fprintf(f2, "----------------------------\n");
        fprintf(f2, "New Lift Request from Floor %d to Floor %d \n", A[i].from, A[i].to);
        fprintf(f2, "Request No %d \n", i);
        fprintf(f2, "----------------------------\n");
    }
    printf("Buffer is full");
    fclose(f);
    fclose(f2);
    pthread_mutex_unlock(&lock);
    return NULL;
}

void main(int argc, char *argv[]) // to avoid segmentation fault
{
    size = atoi(argv[0]);
    if (!(size >= 1))
    {
        printf("buffer size too small\n");
        exit(0);
    }
    else
    {
        A[size].from = NULL;
        A[size].to = NULL;
    }
    sec = atoi(argv[1]);
    pthread_t Lift_R, lift_1, lift_2, lift_3;
    pthread_create(&Lift_R, NULL, request_t, NULL);
    pthread_join(Lift_R, NULL);
    pthread_create(&lift_1, NULL, lift, NULL);
    pthread_join(lift_1, NULL);
    pthread_create(&lift_2, NULL, lift, NULL);
    pthread_join(lift_2, NULL);
    pthread_create(&lift_3, NULL, lift, NULL);
    pthread_join(lift_3, NULL);
}
And here are the struct files:
#include <stdbool.h>

typedef struct Buffer
{
    int from;
    int to;
} buffer_t; //buffer array to store from and to values from sim_input

typedef struct Output
{
    int l;       //lift number
    int p;       //previous floor
    int rf;      //request from
    int rt;      //request to
    int total_m; //total movement
    int c;       //current position
    int m;       //movement
    int total_r; //total requests made
} write_t;
Between reading your code and your question I see a large conceptual gap. There are some technical problems in the code (e.g. you never fclose(out)), and a hard-to-follow sequence.
So, this pattern:
pthread_create(&x, ?, func, arg);
pthread_join(x, ...);
Can be replaced with:
func(arg);
so you really aren't multithreaded at all; it is exactly as if:
void main(int argc, char *argv[]) // to avoid segmentation fault
{
    size = atoi(argv[0]);
    if (!(size >= 1))
    {
        printf("buffer size too small\n");
        exit(0);
    }
    else
    {
        A[size].from = NULL;
        A[size].to = NULL;
    }
    sec = atoi(argv[1]);
    request_t(0);
    lift(0);
    lift(0);
    lift(0);
}
and, knowing that, I hope you can see the futility in:
pthread_mutex_lock(&lock);
....
pthread_mutex_unlock(&lock);
So, start with a bit of a rethink of what you are doing. It sounds like you have a lift device which needs to take inbound requests, perhaps sort them, then process them. Likely 'forever'.
This probably means a sorted queue, though one not sorted by an ordinary criterion. The lift traverses the building in both directions, but tries to minimize changes of direction. This involves traversing the queue with both an order (>, <) and a current direction.
You would likely want request to simply evaluate the lift graph, and determine where to insert the new request.
The lift graph would be a unidirectional list of where the lift goes next. And perhaps a rule that the lift only consults its list as it stops at a given floor.
So, the Request can take a lock on the graph, alter it to reflect the new request, then unlock it.
The Lift can simply:
while (!Lift_is_decommissioned) {
    pthread_mutex_lock(&glock);
    Destination = RemoveHead(&graph);
    pthread_mutex_unlock(&glock);
    GoTo(Destination);
}
And the Request can be:
pthread_mutex_lock(&glock);
NewDestination = NewEvent.floorpressed;
NewDirection = NewEvent.floorpressed > NewEvent.curfloor ? Up : Down;
i = FindInsertion(&graph, NewDestination, NewDirection);
InsertAt(&graph, i, NewDestination);
pthread_mutex_unlock(&glock);
It may be a bit surprising that there is no difference between pressing a "go to floor" button from within the lift and an "I want the lift here now" button from outside the lift.
But, with this sort of separation, you can have the lift simply follow the recipe above, and the handlers for the buttons invoke the other pseudo code above.
The FindInsertion() may be a bit hairy though....
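Separately, for the blocking buffer behaviour the original question describes (request waits when the buffer is full, lifts wait when it is empty), the classic bounded-buffer pattern with two condition variables applies. A minimal sketch; every name and the fixed capacity below are illustrative assumptions, not the asker's code:

#include <pthread.h>

#define CAPACITY 16                       /* stands in for the size argument */

typedef struct { int from, to; } request_item;

static request_item buf[CAPACITY];
static int head = 0, tail = 0, count = 0; /* FIFO indices */
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

/* Called by the request thread for each line of the input file. */
void buffer_put(request_item r) {
    pthread_mutex_lock(&buf_lock);
    while (count == CAPACITY)             /* block while the buffer is full */
        pthread_cond_wait(&not_full, &buf_lock);
    buf[tail] = r;
    tail = (tail + 1) % CAPACITY;
    count++;
    pthread_cond_signal(&not_empty);      /* wake one waiting lift */
    pthread_mutex_unlock(&buf_lock);
}

/* Called by each lift thread; returns requests in FIFO order. */
request_item buffer_get(void) {
    pthread_mutex_lock(&buf_lock);
    while (count == 0)                    /* block while the buffer is empty */
        pthread_cond_wait(&not_empty, &buf_lock);
    request_item r = buf[head];
    head = (head + 1) % CAPACITY;
    count--;
    pthread_cond_signal(&not_full);       /* wake the request thread */
    pthread_mutex_unlock(&buf_lock);
    return r;
}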
I'm trying to use local memory inside a device-side enqueued kernel.
My assumption is that any locally-declared array is visible across all work items in the work group.
This holds when I use local memory in kernels called from the host side, but I'm running into problems when I use a similar setup in device-side enqueued kernels.
Is there something wrong with my assumption?
Edit:
My kernel is below:
My goal is to sort the FIFO pipe into 3 buffers. The problem is that my work items have a limited view scope, and I'm trying to write the buffers into another pipe.
int pivot;
int in_pipe[BIN_SIZE];
int lt_bin[BIN_SIZE];
int gt_bin[BIN_SIZE];
int e_bin[BIN_SIZE];

reserve_id_t down_id = work_group_reserve_read_pipe(down_pipe, local_size);
//while ( is_valid_reserve_id(down_id) == false){
//    down_id = work_group_reserve_read_pipe(down_pipe, local_size);
//}
//in_bin[tid] = -5;
if( is_valid_reserve_id(down_id) == true){
    int status = read_pipe(down_pipe, down_id, lid, &pipe_out);
    work_group_commit_read_pipe(down_pipe, down_id);
    pivot = pipe_out;
    pivot = work_group_broadcast(pivot, 0);

    work_group_barrier(CLK_GLOBAL_MEM_FENCE);
    work_group_barrier(CLK_LOCAL_MEM_FENCE);

    in_pipe[tid] = pipe_out;
    //in_bin[lid] = in_pipe[tid];
    int e_count = 0;
    int gt_count = 0;
    int lt_count = 0;
    if(in_pipe[tid] == pivot){
        e_count = 1;
    }
    else if(in_pipe[tid] < pivot){
        lt_count = 1;
    }
    else if(in_pipe[tid] > pivot){
        gt_count = 1;
    }
    int e_tot = work_group_reduce_add(e_count);
    e_tot = work_group_broadcast(e_tot, 0);
    int e_val = work_group_scan_exclusive_add(e_count);

    int gt_tot = work_group_reduce_add(gt_count);
    gt_tot = work_group_broadcast(gt_tot, 0);
    int gt_val = work_group_scan_exclusive_add(gt_count);

    int lt_tot = work_group_reduce_add(lt_count);
    lt_tot = work_group_broadcast(lt_tot, 0);
    int lt_val = work_group_scan_exclusive_add(lt_count);
    //in_bin[tid] = lt_val;

    work_group_barrier(CLK_GLOBAL_MEM_FENCE);
    work_group_barrier(CLK_LOCAL_MEM_FENCE);

    if(in_pipe[tid] == pivot){
        e_temp[e_val] = in_pipe[tid];
        //in_bin[e_val] = e_bin[e_val];
        //e_bin[e_Val] = work_group_broadcast(e_bin[e_val], lid);
    }
    if(in_pipe[tid] < pivot){
        lte_temp[lt_val] = in_pipe[tid];
        //in_bin[lt_val] = lt_bin[lt_val];
    }
    if(in_pipe[tid] > pivot){
        gt_bin[gt_val] = in_pipe[tid];
        //in_bin[gt_val] = gt_bin[gt_val];
    }
}
No, not wrong. Local variables are declared and used across whole work-groups on the device side too. They won't be shared with the parent kernel, though.
What exactly are you doing?
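For reference, here is a minimal, self-contained sketch of work-group-shared local memory in OpenCL C (an illustrative kernel, not the code from the question; the fixed size of 64 assumes the local size never exceeds it):

// Every work item in the work-group sees the same scratch array;
// the barrier orders the writes before the reads.
__kernel void reverse_in_group(__global int* data)
{
    __local int scratch[64];
    size_t lid = get_local_id(0);
    size_t lsz = get_local_size(0);

    scratch[lid] = data[get_global_id(0)];
    work_group_barrier(CLK_LOCAL_MEM_FENCE);

    // Read a value written by a *different* work item.
    data[get_global_id(0)] = scratch[lsz - 1 - lid];
}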
The working solution to my question is:
Pipes cannot be created on the device side. What I tried to accomplish was to build a dynamic tree structure, involving branches. OpenCL pipes simply cannot do that, as pipes are still memory objects, created on the host side, and there is currently no way in the specification to create memory objects from the device.
Pipes, however, can be used in a dynamically-recursive method, albeit the recursion cannot deviate, and must occur in a linear fashion. Please consult the sample code found in the AMD APP SDK sample code packs for more details. Specifically, please look at the Device Enqueue BFS example.
I am developing a userspace preemptive thread (fibre) library that uses context switching as the base approach. For this I wrote a scheduler. However, it's not performing as expected. Can I have any suggestions?
The structure of the thread_t used is :
typedef struct thread_t {
    int thr_id;
    int thr_usrpri;
    int thr_cpupri;
    int thr_totalcpu;
    ucontext_t thr_context;
    void * thr_stack;
    int thr_stacksize;
    struct thread_t *thr_next;
    struct thread_t *thr_prev;
} thread_t;
The scheduling function is as follows:
void schedule(void)
{
    thread_t *t1, *t2;
    thread_t *newthr = NULL;
    int newpri = 127;
    struct itimerval tm;
    ucontext_t dummy;
    sigset_t sigt;

    t1 = ready_q;
    // Select the thread with highest priority
    while (t1 != NULL)
    {
        if (newpri > t1->thr_usrpri + t1->thr_cpupri)
        {
            newpri = t1->thr_usrpri + t1->thr_cpupri;
            newthr = t1;
        }
        t1 = t1->thr_next;
    }
    if (newthr == NULL)
    {
        if (current_thread == NULL)
        {
            // No more threads? (stop itimer)
            tm.it_interval.tv_usec = 0;
            tm.it_interval.tv_sec = 0;
            tm.it_value.tv_usec = 0; // ZERO Disable
            tm.it_value.tv_sec = 0;
            setitimer(ITIMER_PROF, &tm, NULL);
        }
        return;
    }
    else
    {
        // TO DO :: Reenabling of signals must be done.
        // Switch to new thread
        if (current_thread != NULL)
        {
            t2 = current_thread;
            current_thread = newthr;
            timeq = 0;
            sigemptyset(&sigt);
            sigaddset(&sigt, SIGPROF);
            sigprocmask(SIG_UNBLOCK, &sigt, NULL);
            swapcontext(&(t2->thr_context), &(current_thread->thr_context));
        }
        else
        {
            // No current thread? might be terminated
            current_thread = newthr;
            timeq = 0;
            sigemptyset(&sigt);
            sigaddset(&sigt, SIGPROF);
            sigprocmask(SIG_UNBLOCK, &sigt, NULL);
            swapcontext(&(dummy), &(current_thread->thr_context));
        }
    }
}
It seems that ready_q (the head of the list of ready threads?) never changes, so the search for the highest-priority thread always finds the first suitable element. If two threads have the same priority, only the first one ever gets a chance to gain the CPU. There are many algorithms you can use: some are based on dynamically changing priorities, others use a rotation inside the ready queue. In your example you could remove the selected thread from its place in the ready queue and put it at the last place (it's a doubly linked list, so the operation is trivial and quite inexpensive).
Also, I'd suggest you consider the performance issues of the linear search through ready_q, since it may be a problem when the number of threads is big. In that case a more sophisticated structure may help, with separate lists of threads for different priority levels.
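A sketch of that rotation, assuming ready_q is a NULL-terminated doubly linked list headed as shown in the struct above (the list details are my assumption):

/* Move the thread just selected by schedule() to the tail of ready_q,
 * so equal-priority threads take turns. */
void rotate_to_tail(thread_t *t)
{
    if (t->thr_next == NULL)          /* already the tail */
        return;

    /* Unlink t from its current position. */
    if (t->thr_prev != NULL)
        t->thr_prev->thr_next = t->thr_next;
    else
        ready_q = t->thr_next;        /* t was the head */
    t->thr_next->thr_prev = t->thr_prev;

    /* Walk to the tail and append t. */
    thread_t *tail = ready_q;
    while (tail->thr_next != NULL)
        tail = tail->thr_next;
    tail->thr_next = t;
    t->thr_prev = tail;
    t->thr_next = NULL;
}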
Bye!
I'm having a hard time figuring out how to manage deallocation of memory in multithreaded environments. Specifically, what I'm having a hard time with is using a lock to protect a structure: when it's time to free the structure, you have to unlock the lock in order to destroy the lock itself, which causes problems if a separate thread is waiting on that same lock.
I'm trying to come up with a mechanism that has retain counts, where the object is freed when its retain count reaches 0. I've been trying a number of different things but just can't get it right. It seems like you can't put the locking mechanism inside the structure that you need to free and destroy, because that requires you to unlock the lock inside of it, which could allow another thread to proceed if it was blocked on a lock request for that same structure. Then something undefined is guaranteed to happen: the lock was destroyed and deallocated, so either you get memory access errors, or you lock on undefined behavior.
Would someone mind looking at my code? I was able to put together a sandboxed example that demonstrates what I'm trying to do without a bunch of files.
http://pastebin.com/SJC86GDp
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

struct xatom {
    short rc;
    pthread_rwlock_t * rwlck;
};
typedef struct xatom xatom;

struct container {
    xatom * atom;
};
typedef struct container container;

#define nr 1
#define nw 2

pthread_t readers[nr];
pthread_t writers[nw];
container * c;

void retain(container * cont);
void release(container ** cont);
short retain_count(container * cont);

void * rth(void * arg) {
    short rc;
    while(1) {
        if(c == NULL) break;
        rc = retain_count(c);
    }
    printf("rth exit!\n");
    return NULL;
}

void * wth(void * arg) {
    while(1) {
        if(c == NULL) break;
        release((container **)&c);
    }
    printf("wth exit!\n");
    return NULL;
}

short retain_count(container * cont) {
    short rc = 1;
    pthread_rwlock_rdlock(cont->atom->rwlck);
    printf("got rdlock in retain_count\n");
    rc = cont->atom->rc;
    pthread_rwlock_unlock(cont->atom->rwlck);
    return rc;
}

void retain(container * cont) {
    pthread_rwlock_wrlock(cont->atom->rwlck);
    printf("got retain write lock\n");
    cont->atom->rc++;
    pthread_rwlock_unlock(cont->atom->rwlck);
}

void release(container ** cont) {
    if(!cont || !(*cont)) return;
    container * tmp = *cont;
    pthread_rwlock_t ** lock = (pthread_rwlock_t **)&(*cont)->atom->rwlck;
    pthread_rwlock_wrlock(*lock);
    printf("got release write lock\n");
    if(!tmp) {
        printf("return 2\n");
        pthread_rwlock_unlock(*lock);
        if(*lock) {
            printf("destroying lock 1\n");
            pthread_rwlock_destroy(*lock);
            *lock = NULL;
        }
        return;
    }
    tmp->atom->rc--;
    if(tmp->atom->rc == 0) {
        printf("deallocating!\n");
        *cont = NULL;
        pthread_rwlock_unlock(*lock);
        if(pthread_rwlock_trywrlock(*lock) == 0) {
            printf("destroying lock 2\n");
            pthread_rwlock_destroy(*lock);
            *lock = NULL;
        }
        free(tmp->atom->rwlck);
        free(tmp->atom);
        free(tmp);
    } else {
        pthread_rwlock_unlock(*lock);
    }
}

container * new_container() {
    container * cont = malloc(sizeof(container));
    cont->atom = malloc(sizeof(xatom));
    cont->atom->rwlck = malloc(sizeof(pthread_rwlock_t));
    pthread_rwlock_init(cont->atom->rwlck, NULL);
    cont->atom->rc = 1;
    return cont;
}

int main(int argc, char ** argv) {
    c = new_container();
    int i = 0;
    int l = 4;
    for(i = 0; i < l; i++) retain(c);
    for(i = 0; i < nr; i++) pthread_create(&readers[i], NULL, &rth, NULL);
    for(i = 0; i < nw; i++) pthread_create(&writers[i], NULL, &wth, NULL);
    sleep(2);
    for(i = 0; i < nr; i++) pthread_join(readers[i], NULL);
    for(i = 0; i < nw; i++) pthread_join(writers[i], NULL);
    return 0;
}
Thanks for any help!
Yes; you can't put the key inside the safe. Your approach with a refcount (create the object when it's requested and doesn't exist, delete it on the last release) is correct. But the lock must exist at least a moment before the object is created and after it is destroyed, that is, for as long as it is used. You can't delete it from inside itself.
On the other hand, you don't need countless locks, like one for each object you create. One lock that serializes obtaining and releasing of all objects will not cost much performance at all. So just create the lock on init and destroy it at program end. Obtaining/releasing an object should be short enough that the lock on variable A blocking access to an unrelated variable B should almost never happen. If it does happen, you can still introduce one lock for all rarely-obtained variables and one per each frequently-obtained one.
Also, there seems to be no point in an rwlock here; a plain mutex suffices, and the create/destroy operations MUST exclude each other, not just parallel instances of themselves, so use the pthread_mutex_*() family instead.
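To make that concrete, here is a minimal hedged sketch of the suggested scheme: one global mutex, created at init and destroyed at program end, guarding every retain/release. The names (registry_lock, object, and so on) are illustrative assumptions, not the asker's types:

#include <pthread.h>
#include <stdlib.h>

/* One lock for all objects; it outlives every object it protects. */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct object {
    int rc;              /* protected by registry_lock, not by the object itself */
    /* ... payload ... */
} object;

object * object_retain(object * o) {
    pthread_mutex_lock(&registry_lock);
    o->rc++;
    pthread_mutex_unlock(&registry_lock);
    return o;
}

void object_release(object ** op) {
    object * doomed = NULL;
    pthread_mutex_lock(&registry_lock);
    if (--(*op)->rc == 0) {
        doomed = *op;    /* nobody else holds a reference now */
        *op = NULL;
    }
    pthread_mutex_unlock(&registry_lock);
    free(doomed);        /* safe: the object is freed, the lock survives */
}

Because the count is only ever touched under registry_lock, no thread can be blocked on a lock that lives inside the object when it is freed, which is exactly the problem the question describes.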