User-threaded scheduling API on mac OSX using ucontext & signals - c

I'm designing a scheduling algorithm that has the following features:
Two user-threads (contexts) in a single process. (I'm supposed to support three threads, but that doesn't work on OS X yet, so I decided to get two working first.)
Preemption via a SIGALRM signal that fires every second and switches control from one context to the other, saving the current state (registers and current position) of the context that was running before the switch.
What I have noticed is the following:
The ucontext.h library behaves strangely on Mac OS X, whereas on Linux it behaves exactly the way it is supposed to (the example from this man link: works perfectly on Linux, whereas on Mac it fails with a segmentation fault before it does any swapping). Unfortunately, I have to make it run on OS X and not Linux.
I managed to work around the swapcontext error on OS X by using getcontext() and then setcontext() to do the swapping of contexts.
In my signal handler I use the sa_sigaction( int sig, siginfo_t *s, void * cntxt ) form, since the third argument, once cast to a ucontext_t pointer, describes the context that was interrupted (which held true on Linux when I tested it). On Mac it doesn't point to the proper location: when I use it I get a segmentation fault yet again.
I have designed the test function for each context to spin inside a while loop, since I want to interrupt them and make sure they resume at the proper location within that function. I have defined a static global count variable that helps me see whether I was in the proper user-thread or not.
One last note: I found out that calling getcontext() inside the while loop within the test functions constantly updates the saved position of the current context (since it is an empty while loop), so calling setcontext() when that context's turn comes makes it execute from the proper place. This workaround is no real solution, since these functions will be provided from outside the API.
#include <stdio.h>
#include <sys/ucontext.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <errno.h>
/* time-utility */
#include <sys/time.h> // struct timeval
void timeval_add_s( struct timeval *tv, uint64_t s ) {
    tv->tv_sec += s;
}
void timeval_diff( struct timeval *c, struct timeval *a, struct timeval *b ) {
    // use signed variables
    long aa;
    long bb;
    long cc;
    aa = a->tv_sec;
    bb = b->tv_sec;
    cc = aa - bb;
    cc = cc < 0 ? -cc : cc;
    c->tv_sec = cc;
    aa = a->tv_usec;
    bb = b->tv_usec;
    cc = aa - bb;
    cc = cc < 0 ? -cc : cc;
    c->tv_usec = cc;
}
/* Variables */
static int count;
/* For now only the T1 & T2 are used */
static ucontext_t T1, T2, T3, Main, Main_2;
ucontext_t *ready_queue[ 4 ] = { &T1, &T2, &T3, &Main_2 };
static int thread_count;
static int current_thread;
/* timer struct */
static struct itimerval a;
static struct timeval now, then;
/* SIGALRM struct */
static struct sigaction sa;
static int check;
/* signals */
void handle_schedule( int sig, siginfo_t *s, void * cntxt ) {
    ucontext_t * temp_current = (ucontext_t *) cntxt;
    if( check == 0 ) {
        check = 1;
        printf("We were in main context user-thread\n");
    } else {
        ready_queue[ current_thread - 1 ] = temp_current;
        printf("We were in User-Thread # %d\n", count );
    }
    if( current_thread == thread_count ) {
        current_thread = 0;
    }
    setcontext( ready_queue[ current_thread++ ] );
}
/* initializes the signal handler for SIGALARM, sets all the values for the alarm */
static void start_init( void ) {
    int r;
    sa.sa_sigaction = handle_schedule;
    sigemptyset( &sa.sa_mask );
    sa.sa_flags = SA_SIGINFO;
    r = sigaction( SIGALRM, &sa, NULL );
    if( r == -1 ) {
        printf("Error: cannot handle SIGALARM\n");
        goto out;
    }
    gettimeofday( &now, NULL );
    timeval_diff( &( a.it_value ), &now, &then );
    timeval_add_s( &( a.it_interval ), USER_THREAD_SWICTH_TIME );
    setitimer( ITIMER_REAL, &a, NULL );
out:
    return;
}
/* Thread Init */
static void thread_create( void * task_func(void), int arg_num, int task_arg ) {
    ucontext_t* thread_temp = ready_queue[ thread_count ];
    getcontext( thread_temp );
    thread_temp->uc_link = NULL;
    thread_temp->uc_stack.ss_size = SIGSTKSZ;
    thread_temp->uc_stack.ss_sp = malloc( SIGSTKSZ );
    thread_temp->uc_stack.ss_flags = 0;
    if( arg_num == 0 ) {
        makecontext( thread_temp, task_func, arg_num );
    } else {
        makecontext( thread_temp, task_func, arg_num, task_arg );
    }
}
/* Testing Functions */
void thread_funct( int i ) {
    printf( "---------------------------------This is User-Thread #%d--------------------------------\n", i );
    while(1) { count = i;} //getcontext( ready_queue[ 0 ] );}
}
void thread_funct_2( int i ) {
    printf( "---------------------------------This is User-Thread #%d--------------------------------\n", i );
    while(1) { count = i;} //getcontext( ready_queue[ 1 ] ); }
}
/* Main Functions */
int main( void ) {
    int r;
    gettimeofday( &then, NULL );
    thread_create( (void *)thread_funct, 1, 1);
    thread_create( (void *)thread_funct_2, 1, 2);
    printf( "completed\n" );
    return 0;
}
What am I doing wrong here? I have to change this around a bit to run it properly on Linux, and running the version that works on Linux on OS X causes a segmentation fault. Why would it work on one OS and not the other?
Is this related by any chance to the stack size I allocate in each context?
Am I supposed to have stack space allocated for my signal handler? (The documentation says that if I don't, a default stack is used, and allocating one doesn't seem to make a difference.)
If ucontext will never give predictable behavior on Mac OS X, then what is the alternative for implementing user-threading on OS X? I tried using setjmp & longjmp, but I run into the same issue: when a context is interrupted in the middle of executing some function, how can I get the exact position where it was interrupted, in order to continue from there next time?

So after days of testing and debugging I finally got this. I had to dig deep into the implementation of ucontext.h and found differences between the two operating systems. It turns out that the OS X implementation of ucontext.h is different from that of Linux. For instance, the mcontext_t struct within the ucontext_t struct, which usually holds the values of the registers (IP, SP, BP, general registers...) of each context, is declared as a pointer on OS X, whereas on Linux it is not. A couple of other things needed to be set, especially the context's stack pointer (rsp) register, the base pointer (rbp) register, the instruction pointer (rip) register, the destination index (rdi) register... All of these had to be set correctly at the beginning/creation of each context as well as after it returns for the first time. I also had to create an mcontext struct to hold these registers and make my ucontext_t struct's uc_mcontext pointer point to it. After all that was done, I was able to use the ucontext_t pointer that is passed as an argument to the sa_sigaction signal handler (after casting it to ucontext_t *) in order to resume exactly where the context left off last time. Bottom line: it was a messy affair. Anyone interested in more details can msg me. JJ out.


What kinds of data in a device driver can be shared among processes?

In device drivers, how can we tell what data is shared among processes and what is local to a process? The Linux Device Drivers book mentions
Any time that a hardware or software resource is shared beyond a single thread of execution, and the possibility exists that one thread could encounter an inconsistent view of that resource, you must explicitly manage access to that resource.
But what kinds of software resources can be shared among threads and what kinds of data cannot be shared? I know that global variables are generally considered as shared memory but what other kinds of things need to be protected?
For example, are the struct inode and struct file objects passed to file operations like open, release, read, write, etc. considered to be shared?
In the open call inside main.c , why is dev (in the line dev = container_of(inode->i_cdev, struct scull_dev, cdev);) not protected with a lock if it points to a struct scull_dev entry in the global array scull_devices?
In scull_write, why isn't the line int quantum = dev->quantum, qset = dev->qset; locked with a semaphore since it's accessing a global variable?
/* In scull.h */
struct scull_qset {
    void **data; /* pointer to an array of pointers which each point to a quantum buffer */
    struct scull_qset *next;
};
struct scull_dev {
    struct scull_qset *data; /* Pointer to first quantum set */
    int quantum; /* the current quantum size */
    int qset; /* the current array size */
    unsigned long size; /* amount of data stored here */
    unsigned int access_key; /* used by sculluid and scullpriv */
    struct semaphore sem; /* mutual exclusion semaphore */
    struct cdev cdev; /* Char device structure */
};
/* In main.c */
struct scull_dev *scull_devices; /* allocated in scull_init_module */
int scull_major = SCULL_MAJOR;
int scull_minor = 0;
int scull_nr_devs = SCULL_NR_DEVS;
int scull_quantum = SCULL_QUANTUM;
int scull_qset = SCULL_QSET;
ssize_t scull_write(struct file *filp, const char __user *buf, size_t count,
                    loff_t *f_pos)
{
    struct scull_dev *dev = filp->private_data; /* filp->private_data assigned in scull_open */
    struct scull_qset *dptr;
    int quantum = dev->quantum, qset = dev->qset;
    int itemsize = quantum * qset;
    int item; /* item in linked list */
    int s_pos; /* position in qset data array */
    int q_pos; /* position in quantum */
    int rest;
    ssize_t retval = -ENOMEM; /* value used in "goto out" statements */

    if (down_interruptible(&dev->sem))
        return -ERESTARTSYS;

    /* find listitem, qset index and offset in the quantum */
    item = (long)*f_pos / itemsize;
    rest = (long)*f_pos % itemsize;
    s_pos = rest / quantum;
    q_pos = rest % quantum;

    /* follow the list up to the right position */
    dptr = scull_follow(dev, item);
    if (dptr == NULL)
        goto out;
    if (!dptr->data) {
        dptr->data = kmalloc(qset * sizeof(char *), GFP_KERNEL);
        if (!dptr->data)
            goto out;
        memset(dptr->data, 0, qset * sizeof(char *));
    }
    if (!dptr->data[s_pos]) {
        dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
        if (!dptr->data[s_pos])
            goto out;
    }
    /* write only up to the end of this quantum */
    if (count > quantum - q_pos)
        count = quantum - q_pos;

    if (copy_from_user(dptr->data[s_pos]+q_pos, buf, count)) {
        retval = -EFAULT;
        goto out;
    }
    *f_pos += count;
    retval = count;

    /* update the size */
    if (dev->size < *f_pos)
        dev->size = *f_pos;

  out:
    up(&dev->sem);
    return retval;
}
int scull_open(struct inode *inode, struct file *filp)
{
    struct scull_dev *dev; /* device information */

    /* Question: Why was the lock not placed here? */
    dev = container_of(inode->i_cdev, struct scull_dev, cdev);
    filp->private_data = dev; /* for other methods */

    /* now trim to 0 the length of the device if open was write-only */
    if ( (filp->f_flags & O_ACCMODE) == O_WRONLY) {
        if (down_interruptible(&dev->sem))
            return -ERESTARTSYS;
        scull_trim(dev); /* ignore errors */
        up(&dev->sem);
    }
    return 0; /* success */
}
int scull_init_module(void)
{
    int result, i;
    dev_t dev = 0;

    /* assigns major and minor numbers (left out for brevity) */

    /*
     * allocate the devices -- we can't have them static, as the number
     * can be specified at load time
     */
    scull_devices = kmalloc(scull_nr_devs * sizeof(struct scull_dev), GFP_KERNEL);
    if (!scull_devices) {
        result = -ENOMEM;
        goto fail; /* isn't this redundant? */
    }
    memset(scull_devices, 0, scull_nr_devs * sizeof(struct scull_dev));

    /* Initialize each device. */
    for (i = 0; i < scull_nr_devs; i++) {
        scull_devices[i].quantum = scull_quantum;
        scull_devices[i].qset = scull_qset;
        scull_setup_cdev(&scull_devices[i], i);
    }

    /* some other stuff (left out for brevity) */

    return 0; /* succeed */

  fail:
    scull_cleanup_module(); /* left out for brevity */
    return result;
}
/*
 * Set up the char_dev structure for this device.
 */
static void scull_setup_cdev(struct scull_dev *dev, int index)
{
    int err, devno = MKDEV(scull_major, scull_minor + index);

    cdev_init(&dev->cdev, &scull_fops);
    dev->cdev.owner = THIS_MODULE;
    dev->cdev.ops = &scull_fops; /* isn't this redundant? */
    err = cdev_add (&dev->cdev, devno, 1);
    /* Fail gracefully if need be */
    if (err)
        printk(KERN_NOTICE "Error %d adding scull%d", err, index);
}
All data in memory can be considered a "shared resource" if more than one thread is able to access it*. The only resource that is not shared between processors is the data in the registers, which is abstracted away in C.
There are two reasons why, in practice, you would treat a resource as not shared (neither strictly guarantees that two threads cannot access it; some nightmarish code could bypass them):
1. Only one thread can/does access it. Clearly, if only one thread accesses a variable then there can be no race conditions. This is the reason local variables and single-threaded programs do not need locking mechanisms.
2. The value is constant. You can't get different results based on order of access if the value can never change.
The program you have shown here is incomplete, so it is hard to say, but each of the variables accessed without locking must meet one of these criteria for the program to be thread safe.
There are some non-obvious ways to meet the criteria, such as if a variable is constant or limited to one thread only in a specific context.
You gave two examples of lines that were not locked. The first is
dev = container_of(inode->i_cdev, struct scull_dev, cdev);
This line does not modify any shared variable; it just computes where the struct containing cdev is located. There can be no race condition on dev itself because it is a local pointer, accessible only within the function (other threads can still reach what it points to, but not the pointer). This meets criterion (1).
The other example is
int quantum = dev->quantum, qset = dev->qset;
This one is a bit harder to judge without context, but my best guess is that dev->quantum and dev->qset are assumed never to change during the function call. This seems supported by the fact that, in the code shown, they are only set in scull_init_module, which should only be called once at the very beginning. I believe this fits criterion (2).
This brings up another case in which you might change a shared variable without locking: when you know, for some other reason, that other threads will not try to access it until you are done (e.g. they do not exist yet).
In short, all memory is shared, but sometimes you can get away with acting like it's not.
*There are embedded systems where each processor has some amount of RAM that only it could use, but this is not the typical case.

Pointer value changing across function call

I have the following structures in the kernel
struct state {
    /* Current algorithm iteration */
    int tune_id;
    /* Thread id */
    pid_t tid;
#ifndef __KERNEL__
    /* Paths */
    char *stats_path;
    char *budget_path;
    char *controller_path;
#endif /* __KERNEL__ */
    int budget;
    /* Stats */
    struct statistics prev_stats;
    struct parameters current_params;
    u64 cur_time;
    /* Algorithm specific data */
    void *data;
};
struct tuning {
    struct algorithm *algorithm;
    struct state *state;
    struct energy energy;
};
I've defined a function tune() as follows:
void tune(struct task_struct *task) {
    struct statistics stats;
    struct state *state;
    state = task->tuning.state;
    compute_energy(&stats, state);
}
The other functions are defined as:
void get_current_params(struct parameters *params)
{
    printk(KERN_DEBUG "get_current_params: parameters:0x%X\n", (unsigned int) params);
    params->cpu_frequency_MHZ = (cpufreq_get(0) + 500) / 1000;
    params->mem_frequency_MHZ = (memfreq_get() + 500) / 1000;
}
void compute_energy(struct statistics *stats, struct state *state)
struct statistics *diffs;
struct frontier *frontier;
u64 energy_budget;
int threshold;
int i,j;
struct configuration s;
struct configuration emin;
#ifdef TIMING
u64 ns;
ns = get_thread_time();
#ifdef DEBUG
#ifdef __KERNEL__
printk(KERN_DEBUG "compute_energy: parameters:0x%X\n", (unsigned int) &state->current_params);
#endif /* __KERNEL__ */
When I call tune(), the output is as follows:
[ 7.160139] get_current_params: parameters:0xBF396BA0
[ 7.160298] compute_energy: parameters:0xBF396B98
I don't understand why the addresses differ by 0x8.
This in turn causes a divide by 0 exception in the kernel since the struct parameters seems to have values of 0 instead of what was initialized by get_current_params
Why is it that the address of the member current_params of struct state changes across function calls?
I've verified that this bug only occurs for PID 0.
Looking at include/linux/init_task.h, I see that PID 0 is statically initialized. This is the only difference I could find between PID 0 and the other tasks. Could this somehow be responsible for the issue I'm having?
From what I can see, you are right that both addresses should be the same. That leaves only one explanation: the task information is being changed in the kernel in the meantime.
Considering this snippet of your code:
void tune(struct task_struct *task) {
struct state *state;
state = task->tuning.state;
You are working with two structs over which you may have no control (you should check that):
(*task): struct task_struct
(*task->tuning.state): struct state
So when tune() calls
compute_energy(&stats, state);
something could happen between the two printk calls, and that is where I think you should focus.
Try saving task->tuning.state before the call to get_current_params() so that you can check it still has the same value after the call to compute_energy().
Hope this helps.

GDB reports "No line 92 in the current file." but I actually have line 92

I encountered a very bizarre bug while testing the interrupt module of my OS class project, which is based on the HOCA system.
My main function spans lines 66 to 101, but when I set a breakpoint at line 92, gdb says
No line 92 in the current file. Make breakpoint pending on future shared library load?
Do you guys know what's going on here?
Furthermore, when I set the breakpoint at line 92 and continue, GDB reports:
trap: nonexistant memory
address: -1
memory size: 131072
ERROR: address greater than MEMORYSIZE
Program received signal SIGSEGV, Segmentation fault.
0x0000002e in ?? ()
Source code is as follow:
/* This module coordinates the initialization of the nucleus and it starts the execution
 * of the first process, p1(). It also provides a scheduling function. The module contains
 * two functions: main() and init(). init() is static. It also contains a function that it
 * exports: schedule().
 */
#include "../../h/const.h"
#include "../../h/types.h"
#include "../../h/util.h"
#include "../../h/vpop.h"
#include "../../h/procq.e"
#include "../../h/asl.e"
#include "../../h/trap.h"
#include "../../h/int.h"
proc_link RQ; /* pointer to the tail of the Ready Queue */
state_t st; /* the starting state_t */
extern int p1();
/* This function determines how much physical memory there is in the system.
 * It then calls initProc(), initSemd(), trapinit() and intinit().
 */
void static init(){
if(st.s_sp%PAGESIZE != 0){
st.s_sp -= st.s_sp%PAGESIZE;
/* If the RQ is not empty this function calls intschedule() and loads the state of
 * the process at the head of the RQ. If the RQ is empty it calls intdeadlock().
 */
void schedule(){
proc_t *front;
front = headQueue(RQ);
if (checkPointer(front)) {
else {
/* This function calls init(), sets up the processor state for p1(), adds p1() to the
 * RQ and calls schedule().
 */
void main(){
proc_t *pp1; // pointer to process table entry
state_t pp1state; //process state
long curr_time; // to store the time
init(); // initialize the process table, semaphore...
/*setup the processor state for p1(), adds p1() to the ReadyQueue */ = (proc_t *) ENULL;
RQ.index = 1;
pp1 = allocProc();
pp1->parent = (proc_t *) ENULL; // ENULL is set to -1
pp1->child = (proc_t *) ENULL;
pp1->sibling_next = pp1;
pp1->sibling_prev = pp1;
pp1state.s_sp = st.s_sp - (PAGESIZE*2);
pp1state.s_pc = (int)p1;
pp1state.s_sr.ps_s = 1; // here should be line 92
STCK(&curr_time); //store the CPU time to curr_time
pp1->p_s = pp1state;
pp1->start_time = curr_time;
insertProc(&RQ, pp1);
Compile without optimizations: pass the -O0 flag to gcc (together with -g, so that line information is emitted for gdb).

What's a good way to implement simple clone() based multithread library?

I'm trying to build a simple multithreading library on Linux using clone() and other kernel utilities. I've come to a point where I'm not really sure what the correct way to do things is. I tried going through the original NPTL code, but it's a bit too much.
That's how for instance I imagine the create method:
#define FIBER_STACK (1024*64) /* assumed: not defined in the snippet; matches the malloc size */
typedef int sk_thr_id;
typedef void *sk_thr_arg;
typedef int (*sk_thr_func)(sk_thr_arg);
sk_thr_id sk_thr_create(sk_thr_func f, sk_thr_arg a){
    void* stack;
    stack = malloc( FIBER_STACK );
    if ( stack == 0 ){
        perror( "malloc: could not allocate stack" );
        exit( 1 );
    }
    /* the stack grows downwards, so pass the top of the allocation */
    return ( clone(f, (char*) stack + FIBER_STACK, SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM, a ) );
}
1: I'm not really sure what the correct clone() flags should be. I just found these being used in a simple example. Any general directions here will be welcome.
Here are parts of the mutex primitives created using futexes(not my own code for now):
#define cmpxchg(P, O, N) __sync_val_compare_and_swap((P), (O), (N))
#define cpu_relax() asm volatile("pause\n": : :"memory")
#define barrier() asm volatile("": : :"memory")
static inline unsigned xchg_32(void *ptr, unsigned x)
{
    __asm__ __volatile__("xchgl %0,%1"
                :"=r" ((unsigned) x)
                :"m" (*(volatile unsigned *)ptr), "0" (x)
                :"memory");
    return x;
}
static inline unsigned short xchg_8(void *ptr, char x)
{
    __asm__ __volatile__("xchgb %0,%1"
                :"=r" ((char) x)
                :"m" (*(volatile char *)ptr), "0" (x)
                :"memory");
    return x;
}
int sys_futex(void *addr1, int op, int val1, struct timespec *timeout, void *addr2, int val3)
{
    return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);
}
typedef union mutex mutex;
union mutex
{
    unsigned u;
    struct
    {
        unsigned char locked;
        unsigned char contended;
    } b;
};
int mutex_init(mutex *m, const pthread_mutexattr_t *a)
{
    (void) a;
    m->u = 0;
    return 0;
}
int mutex_lock(mutex *m)
{
    int i;
    /* Try to grab lock */
    for (i = 0; i < 100; i++)
        if (!xchg_8(&m->b.locked, 1)) return 0;
    /* Have to sleep */
    while (xchg_32(&m->u, 257) & 1)
        sys_futex(m, FUTEX_WAIT_PRIVATE, 257, NULL, NULL, 0);
    return 0;
}
int mutex_unlock(mutex *m)
{
    int i;
    /* Locked and not contended */
    if ((m->u == 1) && (cmpxchg(&m->u, 1, 0) == 1)) return 0;
    /* Unlock */
    m->b.locked = 0;
    /* Spin and hope someone takes the lock */
    for (i = 0; i < 200; i++)
        if (m->b.locked) return 0;
    /* We need to wake someone up */
    m->b.contended = 0;
    sys_futex(m, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
    return 0;
}
2: The main question for me is how to implement the "join" primitive. I know it's supposed to be based on futexes too, but so far I'm struggling to come up with something.
3: I need some way to clean up resources (like the allocated stack) after a thread has finished. I can't really think of a good way to do this either.
For these I'll probably need an additional structure in user space for every thread, with some information saved in it. Can someone point me in a good direction for solving these issues?
4: I want a way to tell how long a thread has been running, how long it has been since it was last scheduled, and other things like that. Are there kernel calls that provide such information?
Thanks in advance!
The idea that there can exist a "multithreading library" as a third-party library separate from the rest of the standard library is an outdated and flawed notion. If you want to do this, you'll have to first drop all use of the standard library; particularly, your call to malloc is completely unsafe if you're calling clone yourself, because:
malloc will have no idea that multiple threads exist, and therefore may fail to perform proper synchronization.
Even if it knew they existed, malloc will need to access an unspecified, implementation-specific structure located at the address given by the thread pointer. As this structure is implementation-specific, you have no way of creating such a structure that will be interpreted correctly by both the current and all future versions of your system's libc.
These issues don't apply just to malloc but to most of the standard library; even async-signal-safe functions may be unsafe to use, as they might dereference the thread pointer for cancellation-related purposes, performing optimal syscall mechanisms, etc.
If you really insist on making your own threads implementation, you'll have to abstain from using glibc or any modern libc that's integrated with threads, and instead opt for something much more naive like klibc. This could be an educational experiment, but it would not be appropriate for a deployed application.
1) You are using an example from LinuxThreads. I will not rewrite good references here, but I recommend chapter 28 of "The Linux Programming Interface" by Michael Kerrisk. It explains what you need in 25 pages.
2) If you set the CLONE_CHILD_CLEARTID flag, the location pointed to by the ctid argument of clone is cleared when the child terminates. If you treat that location as a futex, you can implement the join primitive. Good luck :-) If you don't want to use futexes, also have a look at wait3 and wait4.
3) I do not know what you want to clean up, but you can use the tls argument of clone. This is a thread-local storage buffer. When the thread has finished, you can clean up that buffer.
4) See getrusage.

using UNIX Pipeline with C

I am required to create 6 threads to perform a task (increment/decrement a number) concurrently until the integer becomes 0. I am supposed to be using only UNIX commands (Pipelines to be specific) and I can't get my head around how pipelines work, or how I can implement this program.
This integer can be stored in a text file.
I would really appreciate it if anyone can explain how to implement this program
The book is right: pipes can be used to protect critical sections, although how to do so is non-obvious.
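#include <stdlib.h>
#include <unistd.h>
void pipe_release(int *sem);
int *make_pipe_semaphore(int initial_count) {
    int *ptr = malloc(2 * sizeof(int));
    if (pipe(ptr)) {
        free(ptr);
        return NULL;
    }
    while (initial_count--)
        pipe_release(ptr); /* each byte in the pipe is one free resource */
    return ptr;
}
void free_pipe_semaphore(int *sem) {
    close(sem[0]);
    close(sem[1]);
    free(sem);
}
void pipe_wait(int *sem) {
    char x;
    read(sem[0], &x, 1); /* blocks while the pipe is empty */
}
void pipe_release(int *sem) {
    char x = 0;
    write(sem[1], &x, 1);
}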
The maximum free resources in the semaphore varies from OS to OS but is usually at least 4096. This doesn't matter for protecting a critical section where the initial and maximum values are both 1.
/* Initialization section */
int *sem = make_pipe_semaphore(1);
/* critical worker */
pipe_wait(sem);
/* do work */
pipe_release(sem);
/* end critical section */
