Deallocating memory in multi-threaded environment - c

I'm having a hard time figuring out how to manage deallocation of memory in multithreaded environments. Specifically what I'm having a hard time with is using a lock to protect a structure, but when it's time to free the structure, you have to unlock the lock to destroy the lock itself. Which will cause problems if a separate thread is waiting on that same lock that you need to destroy.
I'm trying to come up with a mechanism that has retain counts, and when the object's retain count is 0, it's all freed. I've been trying a number of different things but just can't get it right. As I've been doing this it seems like you can't put the locking mechanism inside of the structure that you need to be able to free and destroy, because that requires you unlock the the lock inside of it, which could allow another thread to proceed if it was blocked in a lock request for that same structure. Which would mean that something undefined is guaranteed to happen - the lock was destroyed, and deallocated so either you get memory access errors, or you lock on undefined behavior..
Would someone mind looking at my code? I was able to put together a sandboxed example that demonstrates what I'm trying without a bunch of files.
http://pastebin.com/SJC86GDp
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
struct xatom {
short rc;
pthread_rwlock_t * rwlck;
};
typedef struct xatom xatom;
struct container {
xatom * atom;
};
typedef struct container container;
#define nr 1
#define nw 2
pthread_t readers[nr];
pthread_t writers[nw];
container * c;
void retain(container * cont);
void release(container ** cont);
short retain_count(container * cont);
void * rth(void * arg) {
short rc;
while(1) {
if(c == NULL) break;
rc = retain_count(c);
}
printf("rth exit!\n");
return NULL;
}
void * wth(void * arg) {
while(1) {
if(c == NULL) break;
release((container **)&c);
}
printf("wth exit!\n");
return NULL;
}
short retain_count(container * cont) {
short rc = 1;
pthread_rwlock_rdlock(cont->atom->rwlck);
printf("got rdlock in retain_count\n");
rc = cont->atom->rc;
pthread_rwlock_unlock(cont->atom->rwlck);
return rc;
}
void retain(container * cont) {
pthread_rwlock_wrlock(cont->atom->rwlck);
printf("got retain write lock\n");
cont->atom->rc++;
pthread_rwlock_unlock(cont->atom->rwlck);
}
void release(container ** cont) {
if(!cont || !(*cont)) return;
container * tmp = *cont;
pthread_rwlock_t ** lock = (pthread_rwlock_t **)&(*cont)->atom->rwlck;
pthread_rwlock_wrlock(*lock);
printf("got release write lock\n");
if(!tmp) {
printf("return 2\n");
pthread_rwlock_unlock(*lock);
if(*lock) {
printf("destroying lock 1\n");
pthread_rwlock_destroy(*lock);
*lock = NULL;
}
return;
}
tmp->atom->rc--;
if(tmp->atom->rc == 0) {
printf("deallocating!\n");
*cont = NULL;
pthread_rwlock_unlock(*lock);
if(pthread_rwlock_trywrlock(*lock) == 0) {
printf("destroying lock 2\n");
pthread_rwlock_destroy(*lock);
*lock = NULL;
}
free(tmp->atom->rwlck);
free(tmp->atom);
free(tmp);
} else {
pthread_rwlock_unlock(*lock);
}
}
container * new_container() {
container * cont = malloc(sizeof(container));
cont->atom = malloc(sizeof(xatom));
cont->atom->rwlck = malloc(sizeof(pthread_rwlock_t));
pthread_rwlock_init(cont->atom->rwlck,NULL);
cont->atom->rc = 1;
return cont;
}
int main(int argc, char ** argv) {
c = new_container();
int i = 0;
int l = 4;
for(i=0;i<l;i++) retain(c);
for(i=0;i<nr;i++) pthread_create(&readers[i],NULL,&rth,NULL);
for(i=0;i<nw;i++) pthread_create(&writers[i],NULL,&wth,NULL);
sleep(2);
for(i=0;i<nr;i++) pthread_join(readers[i],NULL);
for(i=0;i<nw;i++) pthread_join(writers[i],NULL);
return 0;
}
Thanks for any help!

Yes, you can't put the key inside the safe. Your approach with refcount (create object when requested and doesn't exist, delete on last release) is correct. But the lock must exist at least a moment before object is created and after it is destroyed - that is, while it is used. You can't delete it from inside of itself.
OTOH, you don't need countless locks, like one for each object you create. One lock that excludes obtaining and releasing of all objects will not create much performance loss at all. So just create the lock on init and destroy on program end. Otaining/releasing an object should take short enough that lock on variable A blocking access to unrelated variable B should almost never happen. If it happens - you can still introduce one lock per all rarely obtained variables and one per each frequently obtained one.
Also, there seems to be no point for rwlock, plain mutex suffices, and the create/destroy operations MUST exclude each other, not just parallel instances of themselves - so use pthread_create_mutex() family instead.

Related

Implementing locks

Some doubts when reading the operating system material of implementing locks
struct lock {
int locked;
struct queue q;
int sync; /* Normally 0. */
};
void lock_acquire(struct lock *l) {
intr_disable();
while (swap(&l->sync, 1) != 0) {
/* Do nothing */
}
if (!l->locked) {
l->locked = 1;
l->sync = 0;
} else {
queue_add(&l->q, thread_current());
thread_block(&l->sync);
}
intr_enable();
}
void lock_release(struct lock *l) {
intr_disable();
while (swap(&l->sync, 1) != 0) {
/* Do nothing */
}
if (queue_empty(&l->q) {
l->locked = 0;
} else {
thread_unblock(queue_remove(&l->q));
}
l->sync = 0;
intr_enable();
}
What is the purpose of sync?
My gut feeling is that the solutions are all broken. For a lock working correctly the lock_acquire needs to have acquire semantics and the lock_release needs to have release semantics. This way the loads/stores inside the critical section can't move outside of the critical section + you have a happens before edge between a lock release and a subsequent lock acquire on the same lock.
If you take a look at the spinning version:
struct lock {
int locked;
};
void lock_acquire(struct lock *l) {
while (swap(&l->locked, 1)) {
/* Do nothing */
}
}
void lock_release(struct lock *l) {
l->locked = 0;
}
The assignment of locked=0 is just an ordinary store. This means that it can be reordered with other loads and stores before it + it doesn't provide a happens before edge.
It seems to me that 'sync' is a way for the thread to let the OS know that the lock is in use since bot "lock" and "unlock" waits for 'sync" value to be changed before proceeding.
(It's a bit peculiar that interrupts are disabled before checking the 'sync' value)

Change value of a struct

I was learning how to use multithreading and I had a question with an exercise that I had come across.
How can I change the bool value of the structure to be true using the function? (I'm bad with pointers). The lock should be in the main function.
The purpose is to lock a thread and prevent others from executing once that state is reached.
pd: I use pthreads
typedef struct Data{
bool used;
}data;
void lock(data *info){
info -> used = true;
}
Use the & operator to get the address of an object. The address is the pointer to the object.
typedef struct Data{
bool used;
}data;
void lock(data *info){
info -> used = true;
}
int main(int argc, char *argv[])
{
data my_struct = {0};
lock(&my_struct);
if (my_struct.used == true)
printf("It is true!\n");
return 0;
}
My understanding of your situation is that you want use pthread locks in your lock function to guard the write operation (info->used = true).
You should create the pthread_mutex_t (Data structure for locking) before using the lock(data *) function. Following is an example.
#include <stdio.h>
#include <stdbool.h>
#include <pthread.h>
typedef struct data
{
bool used;
}data;
pthread_mutex_t spin_lock;
void* lock(void *xxinfo)
{
if (xxinfo != NULL)
{
data *info= (data *)xxinfo;
pthread_mutex_lock(&spin_lock);
info->used = true;
printf("Set the used status\n");
pthread_mutex_unlock(&spin_lock);
}
return NULL;
}
pthread_t threads[2]; // Used it for demonstrating only
int main()
{
int status = 0;
data some_data;
if(0 != pthread_mutex_init(&spin_lock, NULL))
{
printf("Error: Could not initialize the lock\n");
return -1;
}
status = pthread_create(&threads[0], NULL, &lock, &some_data);
if (status != 0)
{
printf("Error: Could not create 0th thread\n");
}
status = pthread_create(&threads[1], NULL, &lock, &some_data);
if (status != 0)
{
printf("Error: Could not create 1st thread\n");
}
pthread_join(threads[0], NULL);
pthread_join(threads[1], NULL);
pthread_mutex_destroy(&spin_lock);
return 0;
}
In this example I am using global spin_lock (which is not a great idea). In your code consider keeping it in an appropriate scope. I have created two threads here for demonstration. To my understanding they don't race at all. I hope this gives you an idea to use pthread locks in your case. You should use lock just for the part of the code that modifies or reads the data.
Note that you should create lock <pthread_mutex_init> before creating the threads. You can also send the locks as parameter to the thread.
Destroy the lock after using it.

Some threads never get execution when invoked in large amount

Consider the following program,
static long count = 0;
void thread()
{
printf("%d\n",++count);
}
int main()
{
pthread_t t;
sigset_t set;
int i,limit = 30000;
struct rlimit rlim;
getrlimit(RLIMIT_NPROC, &rlim);
rlim.rlim_cur = rlim.rlim_max;
setrlimit(RLIMIT_NPROC, &rlim);
for(i=0; i<limit; i++) {
if(pthread_create(&t,NULL,(void *(*)(void*))thread, NULL) != 0) {
printf("thread creation failed\n");
return -1;
}
}
sigemptyset(&set);
sigsuspend(&set);
return 0;
}
This program is expected to print 1 to 30000. But it some times prints 29945, 29999, 29959, etc. Why this is happening?
Because count isn't atomic, so you have a race condition both in the increment and in the subsequent print.
The instruction you need is atomic_fetch_add, to increment the counter and avoid the race condition. The example on cppreference illustrates the exact problem you laid out.
Your example can be made to work with just a minor adjustment:
#include <stdio.h>
#include <signal.h>
#include <sys/resource.h>
#include <pthread.h>
#include <stdatomic.h>
static atomic_long count = 1;
void * thread(void *data)
{
printf("%ld\n", atomic_fetch_add(&count, 1));
return NULL;
}
int main()
{
pthread_t t;
sigset_t set;
int i,limit = 30000;
struct rlimit rlim;
getrlimit(RLIMIT_NPROC, &rlim);
rlim.rlim_cur = rlim.rlim_max;
setrlimit(RLIMIT_NPROC, &rlim);
for(i=0; i<limit; i++) {
if(pthread_create(&t, NULL, thread, NULL) != 0) {
printf("thread creation failed\n");
return -1;
}
}
sigemptyset(&set);
sigsuspend(&set);
return 0;
}
I made a handful of other changes, such as fixing the thread function signature and using the correct printf format for printing longs. But the atomic issue is why you weren't printing all the numbers you expected.
Why this is happening?
Because you have a data race (undefined behavior).
In particular, this statement:
printf("%d\n",++count);
modifies a global (shared) variable without any locking. Since the ++ does not atomically increment it, it's quite possible for multiple threads to read the same value (say 1234), increment it, and store the updated value in parallel, resulting in 1235 being printed repeatedly (two or more times), and one or more of the increments being lost.
A typical solution is to either use mutex to avoid the data race, or (rarely) an atomic variable (which guarantees atomic increment). Beware: atomic variables are quite hard to get right. You are not ready to use them yet.

In the below code: get_row_of_machine receiving mon_param as a parameter thread-safe?

mon_param is allocated memory by the main process invoking the thread function.
This function will be invoked by multiple threads.So, can I safely assume that it is thread safe as I am using only the variables on the stack?
struct table* get_row_of_machine(int row_num,struct mon_agent *mon_param)
{
struct table *table_row = mon_param->s_table_rows;
if(row_num < mon_param->total_states)
{
table_row = table_row + row_num;
}
return table_row;
}
//in the main function code goes like this ....
int main()
{
int msg_type,ret;
while(!s_interrupted)
{
inter_thread_pair = zsock_new(ZMQ_PAIR);
if(inter_thread_pair != NULL)
zsock_bind (inter_thread_pair, "inproc://zmq_main_pair");
int ret_val = zmq_poll(&socket_items[0], 1, 0); // Do not POLL indefinitely.
if(socket_items[0].revents & ZMQ_POLLIN)
{
char *msg = zstr_recv (inter_thread_pair); //
if(msg != NULL)
{
struct mon_agent *mon_params;
//This is where mon_params is getting its memory
mon_params = (struct mon_agent*)malloc(sizeof(struct mon_agent));
msg_type = get_msg_type(msg);
if(msg_type == /*will check for some message type here*/)
{
struct thread_sock_params *thd_sock = create_connect_pair_socket(thread_count);
// copy the contents of thread_sock_params and also the mon_params to this struct
struct thread_parameters parameters;
parameters.sock_params = thd_sock;
parameters.params = mon_params; //mon_params getting copid here.
//Every time I receive a particular message, I create a new thread and pass on the parameters.
//So, each thread gets its own mon_params memory allocated.
ret = pthread_create(&thread,NULL,monitoring_thread,(void*)&parameters);
and then it goes on like this.
}
}
}
and the code continues..... there is a breakpoint somewhere down..
}
}
void* mon_thread(void *data)
{
// First time data is sent as a function parameter and later will be received as messages.
struct thread_parameters *th_param = (struct thread_parameters *)data;
struct mon_agent *mon_params = th_param->params;
zsock_t* thread_pair_client = zsock_new(ZMQ_PAIR);
//printf("Value of socket is %s: \n",th_param->socket_ep);
rc = zsock_connect(thread_pair_client,th_param->sock_params->socket_ep);
if(rc == -1)
{
printf("zmq_connect failed in monitoring thread.\n");
}
while(!s_interrupted)
{
int row;
//logic to maintain the curent row.
//also receive other messages from thread_pair_client czmq socket.
run_machine(row,mon_params);
}
}
void run_machine(int row_num, struct mon_agent *mon_params)
{
struct table* table_row = get_row_of_state_machine(row_num,mon_param);
}
In short, no.
The way to make parameters thread safe is by design.
There is no fool proof way to do this or a rule of thumb. If you know your codes design well enough and you know no other thread will access the same struct then it's possibly thread safe.
If you do know some other thread might try to access the struct you can use all sorts of synchronization primitives like mutexes, critical sections, semaphores or more generally locks.

Writing a scheduler for a Userspace thread library

I am developing a userspace premptive thread library(fibre) that uses context switching as the base approach. For this I wrote a scheduler. However, its not performing as expected. Can I have any suggestions for this.
The structure of the thread_t used is :
typedef struct thread_t {
int thr_id;
int thr_usrpri;
int thr_cpupri;
int thr_totalcpu;
ucontext_t thr_context;
void * thr_stack;
int thr_stacksize;
struct thread_t *thr_next;
struct thread_t *thr_prev;
} thread_t;
The scheduling function is as follows:
void schedule(void)
{
thread_t *t1, *t2;
thread_t * newthr = NULL;
int newpri = 127;
struct itimerval tm;
ucontext_t dummy;
sigset_t sigt;
t1 = ready_q;
// Select the thread with higest priority
while (t1 != NULL)
{
if (newpri > t1->thr_usrpri + t1->thr_cpupri)
{
newpri = t1->thr_usrpri + t1->thr_cpupri;
newthr = t1;
}
t1 = t1->thr_next;
}
if (newthr == NULL)
{
if (current_thread == NULL)
{
// No more threads? (stop itimer)
tm.it_interval.tv_usec = 0;
tm.it_interval.tv_sec = 0;
tm.it_value.tv_usec = 0; // ZERO Disable
tm.it_value.tv_sec = 0;
setitimer(ITIMER_PROF, &tm, NULL);
}
return;
}
else
{
// TO DO :: Reenabling of signals must be done.
// Switch to new thread
if (current_thread != NULL)
{
t2 = current_thread;
current_thread = newthr;
timeq = 0;
sigemptyset(&sigt);
sigaddset(&sigt, SIGPROF);
sigprocmask(SIG_UNBLOCK, &sigt, NULL);
swapcontext(&(t2->thr_context), &(current_thread->thr_context));
}
else
{
// No current thread? might be terminated
current_thread = newthr;
timeq = 0;
sigemptyset(&sigt);
sigaddset(&sigt, SIGPROF);
sigprocmask(SIG_UNBLOCK, &sigt, NULL);
swapcontext(&(dummy), &(current_thread->thr_context));
}
}
}
It seems that the "ready_q" (head of the list of ready threads?) never changes, so the search of the higest priority thread always finds the first suitable element. If two threads have the same priority, only the first one has a chance to gain the CPU. There are many algorithms you can use, some are based on a dynamic change of the priority, other ones use a sort of rotation inside the ready queue. In your example you could remove the selected thread from its place in the ready queue and put in at the last place (it's a double linked list, so the operation is trivial and quite inexpensive).
Also, I'd suggest you to consider the performace issues due to the linear search in ready_q, since it may be a problem when the number of threads is big. In that case it may be helpful a more sophisticated structure, with different lists of threads for different levels of priority.
Bye!

Resources