I wanted to know how to implement my own threading library.
What I have is a CPU (PowerPC architecture) and the C Standard Library.
Is there an open source light-weight implementation I can look at?
At its very simplest a thread will need:
Some memory for stack space
Somewhere to store its context (i.e. register contents, program counter, stack pointer, etc.)
On top of that you will need to implement a simple "kernel" that will be responsible for the thread switching. And if you're trying to implement pre-emptive threading, then you'll also need a periodic source of interrupts, e.g. a timer. In that case you can execute your thread-switching code in the timer interrupt.
Take a look at the setjmp()/longjmp() routines, and the corresponding jmp_buf structure. This will give you easy access to the stack pointer so that you can assign your own stack space, and will give you a simple way of capturing all of the register contents to provide your thread's context.
Typically the longjmp() function is a wrapper for a return from interrupt instruction, which fits very nicely with having thread scheduling functionality in the timer interrupt. You will need to check the implementation of longjmp() and jmp_buf for your platform though.
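To make the idea concrete, here is a minimal sketch of a cooperative round-robin switch built on setjmp()/longjmp(). It is an illustration only: the names (yield, contexts, MAX_THREADS) are mine, and it glosses over thread creation, which requires platform-specific poking of the stack pointer inside a fresh jmp_buf so that each thread runs on its own stack.

#include <setjmp.h>

#define MAX_THREADS 4

static jmp_buf contexts[MAX_THREADS]; /* one saved context per thread */
static int current;                   /* index of the running thread */

/* Cooperative switch: setjmp() returns 0 when saving the caller's
 * context, and non-zero when the thread is later resumed via longjmp(). */
void yield(void) {
    if (setjmp(contexts[current]) == 0) {
        current = (current + 1) % MAX_THREADS;
        longjmp(contexts[current], 1);
    }
    /* resumed here: fall through and keep running this thread */
}

For pre-emption, the same switch would be triggered from the timer interrupt instead of an explicit yield() call.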
Try looking for thread implementations on smaller microprocessors, which typically don't have OSes, e.g. the Atmel AVR or Microchip PIC.
For example: discussion on AVRFreaks
For a decent thread library you need:
atomic operations to avoid races (to implement e.g. a mutex)
some OS support to do the scheduling and to avoid busy waiting
some OS support to implement context switching
All three are beyond the scope of what C99 offers you. Atomic operations were introduced in C11, but so far C11 implementations don't seem to be mature, so these are usually implemented in assembler. For the latter two, you'd have to rely on your OS.
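To illustrate the first item, here is a minimal sketch of a spinlock-style mutex using C11's <stdatomic.h>, assuming your implementation supports it; on earlier compilers the two operations would be a few lines of platform assembler instead:

#include <stdatomic.h>

/* One atomic flag is enough for a test-and-set spinlock.
 * Initialize with: spinlock_t l = { ATOMIC_FLAG_INIT }; */
typedef struct { atomic_flag locked; } spinlock_t;

void spin_lock(spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;  /* busy-wait; a full mutex would ask the OS to sleep here */
}

void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

Note how the busy-waiting is exactly where the second item (OS scheduling support) would come in.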
Maybe you could look at C++, which has threading support. I'd start by picking some of its most useful primitives (for example, futures), see how they work, and do a simple implementation.
Related
I'm trying to implement a communication protocol in C. I need to implement a timer (so that if after some time an ACK has not been received yet, the sender will assume the packet has been lost and will send it again).
In a C-looking-pseudocode I would like to have something like this:
if (!ack_received(seqn) && timer_expired(seqn)) {
    send_packet(seqn);
    start_timer(seqn);
}
Note: seqn is the sequence number of the packet being sent. Each packet needs a personal timer.
How to implement timer_expired and start_timer? Is there a way to do it without using several threads?
Can I implement a single-threaded timer in C?
Probably not in pure portable C99 (or single-threaded C11, see n1570).
But in practice, you'll often code for some operating system, and you'll then get some ways to have timers. On Linux, read time(7) first. You'll probably also want to use a multiplexing call such as poll(2) (to which you give a delay). And learn more about other system calls, so read intro(2), syscalls(2) and some good Linux programming book (perhaps the old ALP, freely downloadable).
BTW, it seems that you are coding something network-related. You practically need some API for that (e.g. Berkeley sockets), hence you'll probably use something similar to an OS.
Many event loops are single-threaded but provide some kind of timer.
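To sketch the poll(2) idea: compute the delay until your earliest deadline and pass it as poll's timeout, so a single thread sleeps until either input arrives or a timer is due. The helper next_deadline_ms() and the two handlers are hypothetical:

#include <poll.h>

extern int  next_deadline_ms(void);        /* ms until earliest timer, or -1 if none */
extern void handle_expired_timers(void);
extern void handle_input(int fd);

void event_loop(int net_fd) {
    struct pollfd pfd = { .fd = net_fd, .events = POLLIN };
    for (;;) {
        int n = poll(&pfd, 1, next_deadline_ms()); /* sleeps at most until a timer fires */
        if (n > 0 && (pfd.revents & POLLIN))
            handle_input(net_fd);
        handle_expired_timers();                   /* run whatever is now due */
    }
}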
Or perhaps (if you don't have any OS) you are coding some freestanding C for some small embedded hardware platform (e.g. Arduino-like). Then you have some ways to poll network inputs and set up timers.
Depending on the architecture of your system, it can be done in a more or less elegant way.
In a simple single-threaded program, just declare a table containing the starting timestamps. timer_expired then only has to compare the difference between the current timestamp and the saved one against the timeout value. Of course, you also need to implement another function that initializes the table entry for a particular timeout counter.
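A minimal sketch of that table-based scheme, assuming POSIX clock_gettime() with CLOCK_MONOTONIC is available and using an illustrative fixed TIMEOUT_MS; seqn indexes straight into the table:

#include <stdbool.h>
#include <time.h>

#define MAX_PACKETS 256
#define TIMEOUT_MS  500   /* assumed retransmission timeout */

static struct timespec start_ts[MAX_PACKETS];

void start_timer(int seqn) {
    clock_gettime(CLOCK_MONOTONIC, &start_ts[seqn]);  /* save the starting timestamp */
}

bool timer_expired(int seqn) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    long elapsed_ms = (now.tv_sec - start_ts[seqn].tv_sec) * 1000
                    + (now.tv_nsec - start_ts[seqn].tv_nsec) / 1000000;
    return elapsed_ms >= TIMEOUT_MS;
}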
I'm wondering if it's possible to implement preemptive multitasking of native code within a single process in user space on Linux. (That is, externally pause some running native code, save the context, swap in a different context, and resume execution, all orchestrated by user space but using calls that may enter the kernel.) I was thinking this could be done using a signal handler for SIGALRM and the *context() family, but it turns out that the entire *context() family is async-signal-unsafe, so that approach isn't guaranteed to work. I did find a gist that implements this idea, so apparently it does happen to work on Linux, at least sometimes, even though per POSIX it's not required to work. The gist installs the following as a signal handler on SIGALRM, which makes several *context() calls:
void
timer_interrupt(int j, siginfo_t *si, void *old_context)
{
    /* Create new scheduler context */
    getcontext(&signal_context);
    signal_context.uc_stack.ss_sp = signal_stack;
    signal_context.uc_stack.ss_size = STACKSIZE;
    signal_context.uc_stack.ss_flags = 0;
    sigemptyset(&signal_context.uc_sigmask);
    makecontext(&signal_context, scheduler, 1);

    /* save running thread, jump to scheduler */
    swapcontext(cur_context, &signal_context);
}
Does Linux offer any guarantee that makes this approach correct? Is there a way to make this correct? Is there a totally different way to do this correctly?
(By "implement in user space" I don't mean that we never enter the kernel. I mean to contrast with the preemptive multitasking implemented by the kernel.)
You cannot reliably change contexts inside signal handlers. (If you did that from some signal handler, it would usually work in practice, but not always; hence it is undefined behavior.)
You could set some volatile sig_atomic_t flag (read about sig_atomic_t) in a signal handler (see signal(7), signal-safety(7), sigreturn(2) ...) and check that flag regularly (e.g. at least once every few milliseconds) in your code, for example before most calls, or inside your event loop if you have one, etc... So it becomes cooperative user-land scheduling.
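A minimal sketch of that flag-based scheme; yield_to_scheduler() is a stand-in for whatever switching mechanism you use outside the handler:

#include <signal.h>

static volatile sig_atomic_t need_resched = 0;

extern void yield_to_scheduler(void);   /* hypothetical: switches threads, not in a handler */

/* Async-signal-safe handler: it only sets a flag. */
static void on_alarm(int sig) {
    (void)sig;
    need_resched = 1;
}

/* Call this regularly from ordinary code (event loop, before calls, ...). */
static void maybe_yield(void) {
    if (need_resched) {
        need_resched = 0;
        yield_to_scheduler();
    }
}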
It is easier to do if you can change the code, e.g. when you design some compiler which emits C code (a common practice), or if you hack your C compiler to emit such tests: the code generator would then sometimes emit such a test in the generated code.
You may want to forbid blocking system calls and replace them with non-blocking variants or wrappers. See also poll(2), fcntl(2) with F_SETFL and O_NONBLOCK, etc...
You may want the code generator to avoid large call stacks, e.g. like GCC's -fsplit-stack instrumentation option does (read about splitstacks in GCC).
And if you generate (or write some) assembler, you can use such tricks. AFAIK the Go compiler uses something similar for its goroutines. Study your ABI, e.g. from here.
However, kernel initiated preemptive scheduling is preferable (and on Linux will still happen between processes or kernel tasks, see clone(2)).
PS. If garbage collection techniques using similar tricks interest you, look into MPS and Cheney on the MTA (e.g. into Chicken Scheme).
BACKGROUND
I'm integrating micropython into my custom cooperative multitasking OS (no, my company won't change to pre-emptive).
Micropython uses garbage collection, and this takes much more time than my allotted time slice even when there's nothing to collect; i.e. I called it twice in a row, timed it, and the second run still takes A LOT of time.
OBVIOUS SOLUTION
Yes I could refactor micropython source but then whenever there's a change . . .
IDEAL SOLUTION
The ideal solution would involve calling some function void pause(&func_in_call_stack) that would jump out, leaving the stack intact, all the way to the function that is at the top of the call stack, say main. And resume would . . . resume.
QUESTION
Is it possible, using C and assembly, to implement pause?
UPDATE
As I wrote this, I realize that the C-based exception handling code nlr_push()/nlr_pop() already does most of what I need.
Your question is about implementing context switching. As we've covered fairly exhaustively in comments, support for context switching is among the key characteristics of any multitasking system, and of a multitasking OS in particular. Inasmuch as you posit no OS support for context switching, you are talking about implementing multitasking for a single-tasking OS.
That you describe the OS as providing some kind of task queue ("to relinquish control, a thread must simply exit its run loop") does not change this, though to some extent we could consider it a question of semantics. I imagine that a typical task for such a system would operate by creating and executing a series of microtasks (the work of the "run loop"), providing a shared, mutable memory context to each. Such a run loop could safely exit and later be reentered, to resume generating microtasks from where it left off.
Dividing tasks into microtasks at boundaries defined by affirmative application action (i.e. your pause()) would depend on capabilities beyond those provided by ISO C. Very likely, however, it could be done with the help of some assembly, plus some kind of framework support. You need at least these things:
A mechanism for recording a task's current execution context -- stack, register contents, and maybe other details. This is inherently system-specific.
A task-associated place to store recorded execution context. There are various ways in which such a thing could be established. Promising alternatives include (i) provided by the OS; (ii) provided by some kind of userland multi-tasking system running on top of the OS; (iii) built into the task by the compiler.
A mechanism for restoring recorded execution context -- this, too, will be system-specific.
If the OS does not provide such features, then you could consider the (now removed) POSIX context system as a model interface for recording and restoring execution context. (See makecontext(), swapcontext(), getcontext(), and setcontext().) You would need to implement those yourself, however, and you might want to wrap them to present a simpler interface to applications. Details will be highly dependent on hardware and underlying OS.
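For a feel of the shape of that interface, here is a sketch of pause/resume written against the ucontext model (still shipped by glibc even though POSIX removed it); on a bare-metal or RTOS target you would supply equivalent routines yourself, and the names and stack size here are illustrative:

#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, task_ctx;
static char task_stack[STACK_SIZE];

/* pause(): jump back to main, leaving the task's stack intact. */
static void task_pause(void)  { swapcontext(&task_ctx, &main_ctx); }
/* resume(): re-enter the task exactly where it paused. */
static void task_resume(void) { swapcontext(&main_ctx, &task_ctx); }

static void task_func(void) {
    /* ... do a slice of work (e.g. part of a GC pass) ... */
    task_pause();              /* yield; task_resume() returns here later */
    /* ... continue where we left off ... */
}

int main(void) {
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp   = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link          = &main_ctx;  /* return to main when the task ends */
    makecontext(&task_ctx, task_func, 0);
    task_resume();             /* run until the task pauses or finishes */
    return 0;
}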
As an alternative, you might implement transparent multitasking support for such a system by providing compilers that emit specially instrumented code (i.e. even more specially instrumented than you otherwise need). For example, consider compilers that emit bytecode for a VM of your own design. The VMs in which the resulting programs run would naturally track the state of the program running within, and could yield after each sequence of a certain number of opcodes.
An effective implementation of a semaphore requires atomic instructions.
I see several user-level C implementations on the internet that implement semaphores using variables like a count or a data structure like a queue. But the instructions involving those variables do not run atomically. So how can anyone implement a semaphore in user-level C?
How does a C library's semaphore.h implement a semaphore?
The answer is almost certainly "it doesn't" - instead it will call into kernel services which provide the necessary atomic operations.
It's not possible in standard C until C11. What you need is, as you said, atomic operations. C11 finally specifies them; see for example stdatomic.h.
If you're on an older version of the standard, you have to either use embedded assembler directly or rely on vendor-specific extensions of your compiler; see for example the GCC atomic builtins. Of course, processors support instructions for memory barriers, compare-and-swap operations, etc. They're just not accessible from pure C99 and earlier because parallel execution wasn't in the scope of the standard.
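As a sketch of the vendor-extension route, here is a simple test-and-set lock using GCC's __atomic builtins (the lock word must be a bool or char for __atomic_test_and_set):

/* Minimal test-and-set lock on top of GCC's __atomic builtins. */
static char lock_word;

static void lock(void) {
    /* Atomically set the byte and get its previous value; spin while it was set. */
    while (__atomic_test_and_set(&lock_word, __ATOMIC_ACQUIRE))
        ;
}

static void unlock(void) {
    __atomic_clear(&lock_word, __ATOMIC_RELEASE);  /* release the lock */
}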
After reading MartinJames' comment, I should add a clarification here: this only applies if you implement all your threading in user space, because a semaphore must block threads waiting on it; so if the threads are managed by the kernel's scheduler (as is the case with pthreads on Linux, for example), it's necessary to do a syscall. Not in the scope of your question, but atomic operations might still be interesting for implementing e.g. lock-free data structures.
You could implement the semaphore operations as simply as:

#include <stdatomic.h>

void sema_post(atomic_uint *value) {
    unsigned old = atomic_load(value);
    /* retry until the increment applies atomically */
    while (!atomic_compare_exchange_weak(value, &old, old + 1));
}

void sema_wait(atomic_uint *value) {
    for (;;) {
        unsigned old = atomic_load(value);  /* re-read each try: spin while zero */
        if (old != 0 && atomic_compare_exchange_weak(value, &old, old - 1))
            return;                         /* decremented the count */
    }
}
It's OK semantically, but it does busy waiting (spinning) in sema_wait. (Note that sema_post is lock-free, although it also may spin.) Instead it should sleep until value becomes positive. This problem cannot be solved with atomics alone, because all atomic operations are non-blocking. Here you need help from the OS kernel. So an efficient semaphore could use a similar atomics-based algorithm but go into the kernel in two cases (see Linux futex for more details on this approach):
sema_wait: when it finds value == 0, ask to sleep
sema_post: when it has incremented value from 0 to 1, ask to wake another sleeping thread if any
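A hedged sketch of that two-case scheme on Linux, using the raw futex syscall (glibc provides no futex() wrapper); it assumes atomic_uint is a lock-free 32-bit type, and error handling is omitted:

#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static void futex_wait(atomic_uint *addr, unsigned expected) {
    syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static void futex_wake_one(atomic_uint *addr) {
    syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
}

void sema_wait_blocking(atomic_uint *value) {
    for (;;) {
        unsigned old = atomic_load(value);
        if (old == 0)
            futex_wait(value, 0);          /* sleep while the count is zero */
        else if (atomic_compare_exchange_weak(value, &old, old - 1))
            return;                        /* got the count */
    }
}

void sema_post_blocking(atomic_uint *value) {
    if (atomic_fetch_add(value, 1) == 0)   /* we incremented 0 -> 1 */
        futex_wake_one(value);             /* wake one sleeper, if any */
}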
In general, to implement lock-free (atomics-based) operations on a data structure, it's required that every operation be applicable in any state. For semaphores, wait isn't applicable when the value is 0.
I'm looking for hints on using a dynamic memory handler safely in a multi-threaded system. Details of the issue:
written in C, will run on a Cortex-M3 processor, with an RTOS (CooCox OS),
the TLSF memory allocator will be used (other allocators might be used if I find them better suited, as long as they are free and open-source),
The solution I'm looking for is a way to use the memory allocator safely from both OS tasks and interrupts.
So far I've thought of two possible approaches, both of which have a few details still unknown to me:
Disable and enable interrupts when calling allocator functions. Problem: if I'm not mistaken, I can't disable and enable interrupts in normal mode, only in privileged mode (that is, only in interrupts), and I need to do that at runtime too, to prevent interrupts and task switching during memory-handler operations.
Call the allocator from an SWI. This one is still very unclear to me. First: is SWI the same as FIQ (and if so, is it true that FIQ code needs to be written in assembler, since the allocator is written in C)? I still have a few doubts about calling an FIQ from an IRQ (that scenario would happen, though not often), but most likely this part will not cause issues.
So any ideas on possible solutions for this situation?
Regarding your suggestions 1 and 2:
On Cortex-M3 you can enable and disable interrupts at any time in privileged-level code through the CMSIS intrinsics __disable_irq()/__enable_irq(). Privileged level is not restricted to handler mode; thread-mode code can run at privileged level too (and in many small RTOSes that is the default).
SWI and FIQ are concepts from legacy ARM architectures. They do not exist in Cortex-M3.
You would not ideally want to perform memory allocation in an interrupt handler: even if the allocator is deterministic, it may still take a significant amount of time; I can think of few reasons you would want to do that.
The best approach is to modify the tlsf code to use an RTOS mutex for each of the calls with external linkage. Other libraries I have used already have stubs in the library that normally do nothing, but which you can override with your own implementation to map them to any RTOS.
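A hedged sketch of such wrappers, with hypothetical names for the RTOS mutex calls and for the underlying allocator entry points (the real tlsf functions and your RTOS's API will differ):

#include <stddef.h>

/* Hypothetical RTOS mutex API and TLSF entry points. */
extern void  os_mutex_lock(void *m);
extern void  os_mutex_unlock(void *m);
extern void *heap_mutex;                  /* created once at startup */
extern void *tlsf_alloc(size_t size);
extern void  tlsf_release(void *ptr);

void *safe_malloc(size_t size) {
    os_mutex_lock(heap_mutex);            /* serialize heap access between tasks */
    void *p = tlsf_alloc(size);
    os_mutex_unlock(heap_mutex);
    return p;
}

void safe_free(void *ptr) {
    os_mutex_lock(heap_mutex);
    tlsf_release(ptr);
    os_mutex_unlock(heap_mutex);
}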
Now you cannot of course use a mutex in an ISR, but as I said you should probably not allocate memory there either. If you really must perform allocation in an interrupt handler, then enable/disable interrupts is your only option, but you are then confounding all the real-time deterministic behaviour that an RTOS provides. A better solution is to have your ISR do no more than issue an event flag or semaphore to a thread-context handler. This allows you to use all RTOS services and scheduling, and the context-switch time from ISR to a high-priority thread will be insignificant compared to the memory allocation time.
Another possibility would be to not use this allocator at all, but instead use a fixed-block allocator using RTOS queues. You pre-allocate blocks of memory (statically or dynamically), post pointers to the start of each block onto a queue, then to allocate you simply receive a pointer from the queue, and to free you post back to the queue. If memory is exhausted (queue is empty), you can baulk or block on the queue (do not block in an ISR though). You can create multiple queues for different sized blocks, and use the one appropriate to your needs (ensuring you post back to the same queue of course!)
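A hedged sketch of that fixed-block scheme, again with hypothetical RTOS queue calls:

#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE  64
#define BLOCK_COUNT 32

/* Hypothetical RTOS queue API; the queue holds void* block pointers. */
extern void *block_queue;                        /* sized for BLOCK_COUNT pointers */
extern int   os_queue_send(void *q, void *item, uint32_t timeout);
extern int   os_queue_receive(void *q, void **item, uint32_t timeout);

static uint8_t pool[BLOCK_COUNT][BLOCK_SIZE];

/* Run once at startup: post every block's address onto the queue. */
void pool_init(void) {
    for (int i = 0; i < BLOCK_COUNT; i++)
        os_queue_send(block_queue, pool[i], 0);
}

/* Allocate: receive a pointer; pass timeout 0 from an ISR so it never blocks. */
void *block_alloc(uint32_t timeout) {
    void *p = NULL;
    os_queue_receive(block_queue, &p, timeout);  /* leaves p NULL if exhausted */
    return p;
}

/* Free: post the pointer back onto the same queue. */
void block_free(void *p) {
    os_queue_send(block_queue, p, 0);
}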