safe method to wait for all thread timer callbacks completion - c

In case of one-shot timer i can use semaphore to wait for timer callback completion.
But if timer was fired several times it doesn't help. Consider the following code:
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>
#define N 10
void timer_threaded_function(sigval_t si)
{
uint8_t *shared_resource = si.sival_ptr;
sleep(rand() % 7);
/* ... manipulate with shared_resource */
return;
}
int main()
{
struct sigevent sig_ev = {0};
uint8_t *shared_resource = malloc(123);
timer_t timer_id;
int i;
sig_ev.sigev_notify = SIGEV_THREAD;
sig_ev.sigev_value.sival_ptr = shared_resource;
sig_ev.sigev_notify_function = timer_threaded_function;
sig_ev.sigev_notify_attributes = NULL;
timer_create(CLOCK_REALTIME, &sig_ev, &timer_id);
for (i = 0; i < N; i++) {
/* arm timer for 1 nanosecond */
timer_settime(timer_id, 0,
&(struct itimerspec){{0,0},{0,1}}, NULL);
/* sleep a little bit, so timer will be fired */
usleep(1);
}
/* only disarms timer, but timer callbacks still can be running */
timer_delete(timer_id);
/*
* TODO: safe wait for all callbacks to end, so shared resource
* can be freed without races.
*/
...
free(shared_resource);
return 0;
}
timer_delete() only disarms timer (if it is was armed) and frees assocoated with timer resources. But timer callbacks still can be running. So we cannot free shared_resource, otherwise race condition may occur. Is there any method to cope with this situation?
I thougth about reference counting, but it doesn't help, because we doesn't know how much threads actually will try to access shared resource (cause of timer overruns).

It is thoroughly unsatisfactory :-(. I have looked, and there does not seem to be any way to discover whether the sigevent (a) has not been fired, or (b) is pending, or (c) is running, or (d) has completed.
Best I can suggest is an extra level of indirection, and a static to point at the shared resource. So:
static foo_t* p_shared ;
....
p_shared = shared_resourse ;
.....
sig_ev.sigev_value.sival_ptr = &p_shared ;
where foo_t is the type of the shared resource.
Now we can use some atomics... in timer_threaded_function():
foo_t** pp_shared ;
foo_t* p_locked ;
foo_t* p_shared ;
pp_shared = so.sival_ptr ;
p_locked = (void*)UINPTR_MAX ;
p_shared = atomic_swap(pp_shared, p_locked) ;
if (p_shared == p_locked)
return ; // locked already.
if (p_shared == NULL)
return ; // destroyed already.
.... proceed to do the usual work ...
if (atomic_cmp_swap(pp_shared, &p_locked, p_shared))
return ; // was locked and is now restored
assert(p_locked == NULL) ;
... the shared resource needs to be freed ...
And in the controlling thread:
timer_delete(timer_id) ; // no more events, thank you
p_s = atomic_swap(&p_shared, NULL) ; // stop processing events
if (p_s == (void*)UINTPTR_MAX)
// an event is being processed.
if (p_s != NULL)
... the shared resource needs to be freed ...
When the event thread finds that the shared resourse needs to be freed, it can do it itself, or signal to the controlling thread that the event has been processed, so that the controlling thread can go ahead and do the free. That's largely a matter of taste.
Basically, this is using atomics to provide a sort of a lock, whose value is tri-state: NULL <=> destroyed ; UINTPTR_MAX <=> locked ; anything else <=> unlocked.
The down-side is the static p_shared, which has to remain in existence until the timer_threaded_function() has finished and will never be called again... and since those are precisely the things that are unknowable, static p_shared is, effectively, a fixture :-(.

Related

multithreading with mutexes in c and running one thread at a time

I have an array of 100 requests(integers). I want to create 4 threads to which i call a function(thread_function) and with this function i want every thread to take one by one the requests:
(thread0->request0,
thread1->request1,
thread2->request2,
thread3->request3
and then thread0->request4 etc up to 100) all these by using mutexes.
Here is the code i have writen so far:
threadRes = pthread_create(&(threadID[i]), NULL,thread_function, (void *)id_size);
This is inside my main and it is in a loop for 4 times.Now outside my main:
void *thread_function(void *arg){
int *val_p=(int *) arg;
for(i=0; i<200; i=i+2)
{
f=false;
for (j= 0; j<100; j++)
{
if (val_p[i]==cache[j].id)
f=true;
}
if(f==true)
{
printf("The request %d has been served.\n",val_p[i]);
}
else
{
cache[k].id=val_p[i];
printf("\nCurrent request to be served:%d \n",cache[k].id);
k++;
}
}
Where: val_p is the array with the requests and cache is an array of structs to store the id(requests).
-So now i want mutexes to synchronize my threads. I considered using inside my main:
pthread_join(threadID[0], NULL);
pthread_join(threadID[1], NULL);
pthread_join(threadID[2], NULL);
pthread_join(threadID[3], NULL);
pthread_mutex_destroy(&mutex);
and inside the function to use:
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
Before i finish i would like to say that so far my programm result is that 4threads run 100 requests each(400) and what i want to achieve is that 4threads run 100 threads total.
Thanks for your time.
You need to use a loop that looks like this:
Acquire lock.
See if there's any work to be done. If not, release the lock and terminate.
Mark the work that we're going to do as not needing to be done anymore.
Release the lock.
Do the work.
(If necessary) Acquire the lock. Mark the work done and/or report results. Release the lock.
Go to step 1.
Notice how while holding the lock, the thread discovers what work it should do and then prevents any other thread from taking the same assignment before it releases the lock. Note also that the lock is not held while doing the work so that multiple threads can work concurrently.
You may want to post more of your code. How the arrays are set up, how the segment is passed to the individual threads, etc.
Note that using printf will perturb the timing of the threads. It does its own mutex for access to stdout, so it's probably better to no-op this. Or, have a set of per-thread logfiles so the printf calls don't block against one another.
Also, in your thread loop, once you set f to true, you can issue a break as there's no need to scan further.
val_p[i] is loop invariant, so we can fetch that just once at the start of the i loop.
We don't see k and cache, but you'd need to mutex wrap the code that sets these values.
But, that does not protect against races in the for loop. You'd have to wrap the fetch of cache[j].id in a mutex pair inside the loop. You might be okay without the mutex inside the loop on some arches that have good cache snooping (e.g. x86).
You might be better off using stdatomic.h primitives. Here's a version that illustrates that. It compiles but I've not tested it:
#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>
int k;
#define true 1
#define false 0
struct cache {
int id;
};
struct cache cache[100];
#ifdef DEBUG
#define dbgprt(_fmt...) \
printf(_fmt)
#else
#define dbgprt(_fmt...) \
do { } while (0)
#endif
void *
thread_function(void *arg)
{
int *val_p = arg;
int i;
int j;
int cval;
int *cptr;
for (i = 0; i < 200; i += 2) {
int pval = val_p[i];
int f = false;
// decide if request has already been served
for (j = 0; j < 100; j++) {
cptr = &cache[j].id;
cval = atomic_load(cptr);
if (cval == pval) {
f = true;
break;
}
}
if (f == true) {
dbgprt("The request %d has been served.\n",pval);
continue;
}
// increment the global k value [atomically]
int kold = atomic_load(&k);
int knew;
while (1) {
knew = kold + 1;
if (atomic_compare_exchange_strong(&k,&kold,knew))
break;
}
// get current cache value
cptr = &cache[kold].id;
int oldval = atomic_load(cptr);
// mark the cache
// this should never loop because we atomically got our slot with
// the k value
while (1) {
if (atomic_compare_exchange_strong(cptr,&oldval,pval))
break;
}
dbgprt("\nCurrent request to be served:%d\n",pval);
}
return (void *) 0;
}

Allocate the resource of some threads to a global resource such that there is no concurrency(using only mutexes)

The summary of the problem is the following: Given a global resource of size N and M threads with their resource size of Xi (i=1,M) , syncronize the threads such that a thread is allocated,it does its stuff and then it is deallocated.
The main problem is that there are no resources available and the thread has to wait until there is enough memory. I have tried to "block" it with a while statement, but I realized that two threads can pass the while loop and the first to be allocated can change the global resource such that the second thread does not have enough space, but it has already passed the conditional section.
//piece of pseudocode
...
int MAXRES = 100;
// in thread function
{
while (thread_res > MAXRES);
lock();
allocate_res(); // MAXRES-=thread_res;
unlock();
// do own stuff
lock()
deallocate(); // MAXRES +=thread_res;
unlock();
}
To make a robust solution you need something more. As you noted, you need a way to wait until the condition ' there are enough resources available for me ', and a proper mutex has no such mechanism. At any rate, you likely want to bury that in the allocator, so your mainline thread code looks like:
....
n = Allocate()
do stuff
Release(n)
....
and deal with the contention:
int navail = N;
int Acquire(int n) {
lock();
while (n < navail) {
unlock();
lock();
}
navail -= n;
unlock();
return n;
}
void Release(int n) {
lock();
navail += n;
unlock();
}
But such a spin waiting system may fail because of priorities -- if the highest priority thread is spinning, a thread trying to Release may not be able to run. Spinning isn't very elegant, and wastes energy. You could put a short nap in the spin, but if you make the nap too short it wastes energy, too long and it increases latency.
You really want a signalling primitive like semaphores or condvars rather than a locking primitive. With a semaphore, it could look impressively like:
Semaphore *sem;
int Acquire(int n) {
senter(sem, n);
return n;
}
int Release(int n) {
sleave(sem, n);
return n;
}
void init(int N) {
sem = screate(N);
}
Update: Revised to use System V semaphores, which provides the ability to specify arbitrary 'checkout' and 'checkin' to the semaphore value. Overall logic the same.
Disclaimer: I did not use System V for few years, test before using, in case I missed some details.
int semid ;
// Call from main
do_init() {
shmkey = ftok(...) ;
semid = shmget(shmkey, 1, ...) ;
// Setup 100 as max resources.
struct sembuf so ;
so.sem_num = 0 ;
so.sem_op = 100 ;
so.sem_flg = 0 ;
semop(semid, &so, 1) ;
}
// Thread work
do_thread_work() {
int N = <number of resources>
struct sembuf so ;
so.sem_num = 0;
so.sem_op = -N ;
so.sem_flg = 0 ;
semop(semid, &so, 1) ;
... Do thread work
// Return resources
so.sem_op = +N ;
semop(semid, &so, 1) ;
}
As per: https://pubs.opengroup.org/onlinepubs/009695399/functions/semop.html, this will result in atomic checkout of multiple items.
Note: the ... as sections related to IPC_NOWAIT and SEM_UNDO, not relevant to this case.
If sem_op is a negative integer and the calling process has alter permission, one of the following shall occur:
If semval(see ) is greater than or equal to the absolute value of sem_op, the absolute value of sem_op is subtracted from semval. ...
...
If semval is less than the absolute value of sem_op ..., semop() shall increment the semncnt associated with the specified semaphore and suspend execution of the calling thread until one of the following conditions occurs:
The value of semval becomes greater than or equal to the absolute value of sem_op. When this occurs, the value of semncnt associated with the specified semaphore shall be decremented, the absolute value of sem_op shall be subtracted from semval and, ... .

Need some help in C code for optimization (Poll + delay/sleep)

Currently I'm polling the register to get the expected value and now I want reduce the CPU usage and increase the performance.
So, I think, if we do polling for particular time (Say for 10ms) and if we didn't get expected value then wait for some time (like udelay(10*1000) or usleep(10*1000) delay/sleep in ms) then continue to do polling for more more extra time (Say 100ms) and still if you didn't get the expected value then do sleep/delay for 100ms.....vice versa... need to do till it reach to maximum timeout value.
Please let me know if anything.
This is the old code:
#include <sys/time.h> /* for setitimer */
#include <unistd.h> /* for pause */
#include <signal.h> /* for signal */
#define INTERVAL 500 //timeout in ms
static int timedout = 0;
struct itimerval it_val; /* for setting itimer */
char temp_reg[2];
int main(void)
{
/* Upon SIGALRM, call DoStuff().
* Set interval timer. We want frequency in ms,
* but the setitimer call needs seconds and useconds. */
if (signal(SIGALRM, (void (*)(int)) DoStuff) == SIG_ERR)
{
perror("Unable to catch SIGALRM");
exit(1);
}
it_val.it_value.tv_sec = INTERVAL/1000;
it_val.it_value.tv_usec = (INTERVAL*1000) % 1000000;
it_val.it_interval = it_val.it_value;
if (setitimer(ITIMER_REAL, &it_val, NULL) == -1)
{
perror("error calling setitimer()");
exit(1);
}
do
{
temp_reg[0] = read_reg();
//Read the register here and copy the value into char array (temp_reg
if (timedout == 1 )
return -1;//Timedout
} while (temp_reg[0] != 0 );//Check the value and if not try to read the register again (poll)
}
/*
* DoStuff
*/
void DoStuff(void)
{
timedout = 1;
printf("Timer went off.\n");
}
Now I want to optimize and reduce the CPU usage and want to improve the performance.
Can any one help me on this issue ?
Thanks for your help on this.
Currently I'm polling the register to get the expected value [...]
wow wow wow, hold on a moment here, there is a huge story hidden behind this sentence; what is "the register"? what is "the expected value"? What does read_reg() do? are you polling some external hardware? Well then, it all depends on how your hardware behaves.
There are two possibilities:
Your hardware buffers the values that it produces. This means that the hardware will keep each value available until you read it; it will detect when you have read the value, and then it will provide the next value.
Your hardware does not buffer values. This means that values are being made available in real time, for an unknown length of time each, and they are replaced by new values at a rate that only your hardware knows.
If your hardware is buffering, then you do not need to be afraid that some values might be lost, so there is no need to poll at all: just try reading the next value once and only once, and if it is not what you expect, sleep for a while. Each value will be there when you get around to reading it.
If your hardware is not buffering, then there is no strategy of polling and sleeping that will work for you. Your hardware must provide an interrupt, and you must write an interrupt-handling routine that will read every single new value as quickly as possible from the moment that it has been made available.
Here are some pseudo code that might help:
do
{
// Pseudo code
start_time = get_current_time();
do
{
temp_reg[0] = read_reg();
//Read the register here and copy the value into char array (temp_reg
if (timedout == 1 )
return -1;//Timedout
// Pseudo code
stop_time = get_current_time();
if (stop_time - start_time > some_limit) break;
} while (temp_reg[0] != 0 );
if (temp_reg[0] != 0)
{
usleep(some_time);
start_time = get_current_time();
}
} while (temp_reg[0] != 0 );
To turn the pseudo code into real code, see https://stackoverflow.com/a/2150334/4386427

Audio samples producer multiple threads OSX

This question is a follow-up to a former question (Audio producer threads with OSX AudioComponent consumer thread and callback in C), including a test example, which works and behaves as expected but does not quite answer the question. I have substantially rephrased the question, and re-coded the example, so that it only contains plain-C code. (I've found out that few Objective-C portions of code in the former example only caused confusion and distracted the reader from what's essential in the question.)
In order to take advantage of multiple processor cores as well as to make the CoreAudio pull-model render thread as lightweight as possible, the LPCM samples' producer routine clearly has to "sit" on a different thread, outside the real-lime-priority render thread/callback. It must feed the samples to a circular buffer (TPCircularBuffer in this example), from which the system would schedule data pull-out in quants of inNumberFrames.
The Grand Central Dispatch API offers a simple solution, which I've deduced upon some individual research (including trial-and-error coding). This solution is elegant, since it doesn't block anything nor conflict between push and pull models. Yet the GCD, which is supposed to take care of "sub-threading" does not by far meet the specific parallelization requirements for the work threads of the producer code, so I had to explicitely spawn a number of POSIX threads, depending on the number of logical cores available. Although results are already remarkable in terms of speeding-up the computation I still feel a bit unconfortable mixing the POSIX and GCD. In particular it goes for the variable wait_interval, and computing it properly, not by predicting how many PCM samples may the render thread require for the next cycle.
Here's the shortened and simplified (pseudo)code for my test program, in plain-C.
Controller declaration:
#include "TPCircularBuffer.h"
#include <AudioToolbox/AudioToolbox.h>
#include <AudioUnit/AudioUnit.h>
#include <dispatch/dispatch.h>
#include <sys/sysctl.h>
#include <pthread.h>
typedef struct {
TPCircularBuffer buffer;
AudioComponentInstance toneUnit;
Float64 sampleRate;
AudioStreamBasicDescription streamFormat;
Float32* f; //array of updated frequencies
Float32* a; //array of updated amps
Float32* prevf; //array of prev. frequencies
Float32* preva; //array of prev. amps
Float32* val;
int* arg;
int* previous_arg;
UInt32 frames;
int state;
Boolean midif; //wip
} MyAudioController;
MyAudioController gen;
dispatch_semaphore_t mSemaphore;
Boolean multithreading, NF;
typedef struct data{
int tid;
int cpuCount;
}data;
Controller management:
void setup (void){
// Initialize circular buffer
TPCircularBufferInit(&(self->buffer), kBufferLength);
// Create the semaphore
mSemaphore = dispatch_semaphore_create(0);
// Setup audio
createToneUnit(&gen);
}
void dealloc (void) {
// Release buffer resources
TPCircularBufferCleanup(&buffer);
// Clean up semaphore
dispatch_release(mSemaphore);
// dispose of audio
if(gen.toneUnit){
AudioOutputUnitStop(gen.toneUnit);
AudioUnitUninitialize(gen.toneUnit);
AudioComponentInstanceDispose(gen.toneUnit);
}
}
Dispatcher call (launching producer queue from the main thread):
void dproducer (Boolean on, Boolean multithreading, Boolean NF)
{
if (on == true)
{
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0), ^{
if((multithreading)||(NF))
producerSum(on);
else
producer(on);
});
}
return;
}
Threadable producer routine:
void producerSum(Boolean on)
{
int rc;
int num = getCPUnum();
pthread_t threads[num];
data thread_args[num];
void* resulT;
static Float32 frames [FR_MAX];
Float32 wait_interval;
int bytesToCopy;
Float32 floatmax;
while(on){
wait_interval = FACT*(gen.frames)/(gen.sampleRate);
Float32 damp = 1./(Float32)(gen.frames);
bytesToCopy = gen.frames*sizeof(Float32);
memset(frames, 0, FR_MAX*sizeof(Float32));
availableBytes = 0;
fbuffW = (Float32**)calloc(num + 1, sizeof(Float32*));
for (int i=0; i<num; ++i)
{
fbuffW[i] = (Float32*)calloc(gen.frames, sizeof(Float32));
thread_args[i].tid = i;
thread_args[i].cpuCount = num;
rc = pthread_create(&threads[i], NULL, producerTN, (void *) &thread_args[i]);
}
for (int i=0; i<num; ++i) rc = pthread_join(threads[i], &resulT);
for(UInt32 samp = 0; samp < gen.frames; samp++)
for(int i = 0; i < num; i++)
frames[samp] += fbuffW[i][samp];
//code for managing producer state and GUI updates
{ ... }
float *head = TPCircularBufferHead(&(gen.buffer), &availableBytes);
memcpy(head,(const void*)frames,MIN(bytesToCopy, availableBytes));//copies frames to head
TPCircularBufferProduce(&(gen.buffer),MIN(bytesToCopy,availableBytes));
dispatch_semaphore_wait(mSemaphore, dispatch_time(DISPATCH_TIME_NOW, wait_interval * NSEC_PER_SEC));
if(gen.state == stopped){gen.state = idle; on = false;}
for(int i = 0; i <= num; i++)
free(fbuffW[i]);
free(fbuffW);
}
return;
}
A single producer thread may look somewhat like this:
void *producerT (void *TN)
{
Float32 samples[FR_MAX];
data threadData;
threadData = *((data *)TN);
int tid = threadData.tid;
int step = threadData.cpuCount;
int *ret = calloc(1,sizeof(int));
do_something(tid, step, &samples);
{ … }
return (void*)ret;
}
Here is the render callback (CoreAudio real-time consumer thread):
static OSStatus audioRenderCallback(void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData) {
MyAudioController *THIS = (MyAudioController *)inRefCon;
// An event happens in the render thread- signal whoever is waiting
if (THIS->state == active) dispatch_semaphore_signal(mSemaphore);
// Mono audio rendering: we only need one target buffer
const int channel = 0;
Float32* targetBuffer = (Float32 *)ioData->mBuffers[channel].mData;
memset(targetBuffer,0,inNumberFrames*sizeof(Float32));
// Pull samples from circular buffer
int32_t availableBytes;
Float32 *buffer = TPCircularBufferTail(&THIS->buffer, &availableBytes);
//copy circularBuffer content to target buffer
int bytesToCopy = ioData->mBuffers[channel].mDataByteSize;
memcpy(targetBuffer, buffer, MIN(bytesToCopy, availableBytes));
{ … };
TPCircularBufferConsume(&THIS->buffer, availableBytes);
THIS->frames = inNumberFrames;
return noErr;
}
Grand Central Dispatch already takes care of dispatching operations to multiple processor cores and threads. In typical real-time audio rendering or processing, one never needs to wait on a signal or semaphore, as the circular buffer consumption rate is very predictable, and drifts extremely slowly over time. The AVAudioSession API (if available) and Audio Unit API and callback allow you to set and determine the callback buffer size, and thus the maximum rate at which the circular buffer can change. Thus you can dispatch all render operations on a timer, render the exact number needed per timer period, and let the buffer size and state compensate for any jitter in thread dispatch time.
In extremely long running audio renders, you might want to measure the drift between timer operations and real-time audio consumption (sample rate), and tweak the number of samples rendered or the timer offset.

User-threaded scheduling API on mac OSX using ucontext & signals

I'm designing a scheduling algorithm that has the following features:
Have 2 user-threads (contexts) in the one process (I'm supposed to do 3 threads but that didn't work on osx yet, so I decided to make 2 work for now)
preemptive using a SIGALRM signal that goes off every 1 sec and changes the control from one context to another, and save the current state (registers and current position) of the context that was running before doing the switch.
what I have noticed is the following:
ucontext.h library behaves strange on mac osx whereas when it is applied in Linux it behaves exactly the way it is supposed to (the example from this man link: http://man7.org/linux/man-pages/man3/makecontext.3.html works perfectly as it is supposed to on linux whereas on mac it fails with Segmentation fault before it does any swapping). I have to make it run on osx unfortunately and not linux.
I managed to work around the swapcontext error on osx by using getcontext() & then setcontext() to do the swapping of contexts.
In my signal handler function, I use the sa_sigaction( int sig, siginfo_t *s, void * cntxt ) since the 3rd variable once re-casted it as a ucontext_t pointer is the information about the context that was interrupted (which is true on Linux once I tested it) but on mac it doesn't point to the proper location as when I use it I get a segmentation fault yet again.
i have designed my test functions for each context to be looping inside a while loop as I want to interrupt them and make sure they go back to execute at the proper location within that function. i have defined a static global count variable that helps me see whether I was in the proper user-thread or not.
One last note is that I found out that calling getcontext() inside my while loop with in the test functions updates the position of my current context constantly since it is am empty while loop and therefore calling setcontext() when that context's time comes makes it execute from proper place. This solution is redundant since these functions will be provided from outside the API.
#include <stdio.h>
#include <sys/ucontext.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <errno.h>
/*****************************************************************************/
/* time-utility */
/*****************************************************************************/
#include <sys/time.h> // struct timeval
void timeval_add_s( struct timeval *tv, uint64_t s ) {
tv->tv_sec += s;
}
void timeval_diff( struct timeval *c, struct timeval *a, struct timeval *b ) {
// use signed variables
long aa;
long bb;
long cc;
aa = a->tv_sec;
bb = b->tv_sec;
cc = aa - bb;
cc = cc < 0 ? -cc : cc;
c->tv_sec = cc;
aa = a->tv_usec;
bb = b->tv_usec;
cc = aa - bb;
cc = cc < 0 ? -cc : cc;
c->tv_usec = cc;
out:
return;
}
/******************************************************************************/
/* Variables */
/*****************************************************************************/
static int count;
/* For now only the T1 & T2 are used */
static ucontext_t T1, T2, T3, Main, Main_2;
ucontext_t *ready_queue[ 4 ] = { &T1, &T2, &T3, &Main_2 };
static int thread_count;
static int current_thread;
/* timer struct */
static struct itimerval a;
static struct timeval now, then;
/* SIGALRM struct */
static struct sigaction sa;
#define USER_THREAD_SWICTH_TIME 1
static int check;
/******************************************************************************/
/* signals */
/*****************************************************************************/
void handle_schedule( int sig, siginfo_t *s, void * cntxt ) {
ucontext_t * temp_current = (ucontext_t *) cntxt;
if( check == 0 ) {
check = 1;
printf("We were in main context user-thread\n");
} else {
ready_queue[ current_thread - 1 ] = temp_current;
printf("We were in User-Thread # %d\n", count );
}
if( current_thread == thread_count ) {
current_thread = 0;
}
printf("---------------------------X---------------------------\n");
setcontext( ready_queue[ current_thread++ ] );
out:
return;
}
/* initializes the signal handler for SIGALARM, sets all the values for the alarm */
static void start_init( void ) {
int r;
sa.sa_sigaction = handle_schedule;
sigemptyset( &sa.sa_mask );
sa.sa_flags = SA_SIGINFO;
r = sigaction( SIGALRM, &sa, NULL );
if( r == -1 ) {
printf("Error: cannot handle SIGALARM\n");
goto out;
}
gettimeofday( &now, NULL );
timeval_diff( &( a.it_value ), &now, &then );
timeval_add_s( &( a.it_interval ), USER_THREAD_SWICTH_TIME );
setitimer( ITIMER_REAL, &a, NULL );
out:
return;
}
/******************************************************************************/
/* Thread Init */
/*****************************************************************************/
static void thread_create( void * task_func(void), int arg_num, int task_arg ) {
ucontext_t* thread_temp = ready_queue[ thread_count ];
getcontext( thread_temp );
thread_temp->uc_link = NULL;
thread_temp->uc_stack.ss_size = SIGSTKSZ;
thread_temp->uc_stack.ss_sp = malloc( SIGSTKSZ );
thread_temp->uc_stack.ss_flags = 0;
if( arg_num == 0 ) {
makecontext( thread_temp, task_func, arg_num );
} else {
makecontext( thread_temp, task_func, arg_num, task_arg );
}
thread_count++;
out:
return;
}
/******************************************************************************/
/* Testing Functions */
/*****************************************************************************/
void thread_funct( int i ) {
printf( "---------------------------------This is User-Thread #%d--------------------------------\n", i );
while(1) { count = i;} //getcontext( ready_queue[ 0 ] );}
out:
return;
}
void thread_funct_2( int i ) {
printf( "---------------------------------This is User-Thread #%d--------------------------------\n", i );
while(1) { count = i;} //getcontext( ready_queue[ 1 ] ); }
out:
return;
}
/******************************************************************************/
/* Main Functions */
/*****************************************************************************/
int main( void ) {
int r;
gettimeofday( &then, NULL );
thread_create( (void *)thread_funct, 1, 1);
thread_create( (void *)thread_funct_2, 1, 2);
start_init();
while(1);
printf( "completed\n" );
out:
return 0;
}
What am I doing wrong here? I have to change this around a bit to run it on Linux properly & running the version that works on Linux on OSX causes segmentation fault, but why would it work on that OS and not this?
Is this related by any chance to my stack size i allocate in each context?
Am I supposed to have a stack space allocated for my signal? (It says that if I don't then it uses a default stack, and if I do it doesn't really make a difference)?
If the use of ucontext will never give predictable behavior on mac osx, then what is the alternative to implement user-threading on osx? I tried using tmrjump & longjmp but I run into the same issue which is when a context is interrupted in the middle of executing certain function then how can I get the exact position of where that context got interrupted in order to continue where I left off next time?
So after days of testing and debugging I finally got this. I had to dig deep into the implementation of the ucontext.h and found differences between the 2 OS. Turns out that OSX implementation of ucontext.h is different from that of Linux. For instance the mcontext_t struct within ucontext_t struct which n=usually holds the values of the registers (PI, SP, BP, general registers...) of each context is declared as a pointer in OSX whereas on Linux it is not. A couple of other differences that needed top be set specially the context's stack pointer (rsp) register, the base pointer (rbp) register, the instruction pointer (rip) register, the destination index (rdi) register... All these had to be set correctly at the beginining/creation of each context as well as after it returns for the first time. I also had top create a mcontext struct to hold these registers and have my ucontext_t struct's uc_mcontext pointer point to it. After all that was done I was able to use the ucontext_t pointer that was passed as an argument in the sa_sigaction signal handler function (after I recast it to ucontext_t) in order to resume exactly where the context left off last time. Bottom line it was a messy affair. Anyone interested in more details can msg me. JJ out.

Resources