Multithreaded environment in C

I'm trying to get my head around multithreading environments, specifically how you would implement a cooperative one in C (on an AVR, but out of interest I would like to keep this general).
My problem is the thread switch itself: I'm pretty sure I could write this in assembler, pushing all the registers onto a stack and then saving the PC to return to later.
How would one pull something like this off in C? I have been told it can do "everything".
I realize this is quite a general question, so any links with information on this topic would be greatly appreciated.
Thanks

You can do this with setjmp/longjmp on most systems -- here is some code I've used in the past for task switching:
void task_switch(Task *to, int exit)
{
int tmp;
int task_errno; /* save space for errno */
task_errno = errno;
if (!(tmp = setjmp(current_task->env))) {
tmp = exit ? (int)current_task : 1;
current_task = to;
longjmp(to->env, tmp); }
if (exit) {
/* if we get here, the stack pointer is pointing into an already
** freed block ! */
abort(); }
if (tmp != 1)
free((void *)tmp);
errno = task_errno;
}
This depends on sizeof(int) == sizeof(void *) in order to pass a pointer as the value argument of longjmp (which setjmp then returns), but that could be avoided by using handles (indexes into a global array of all task structures) instead of raw pointers here, or by using a static pointer.
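The static-pointer alternative mentioned above can be sketched as follows. This is a minimal illustration rather than part of the original task switcher: DemoTask, pending_free, and handoff_demo are hypothetical names, and the jump goes straight back into the same function only to show the hand-off.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-in for the Task structure in the answer. */
typedef struct DemoTask {
    int id;
} DemoTask;

/* Set by the exiting side right before longjmp(); read by whoever resumes. */
static DemoTask *pending_free;

/* Hands a heap pointer across a longjmp() without squeezing it into an int.
   Returns the id of the task that was freed, or -1 on allocation failure. */
int handoff_demo(void)
{
    static jmp_buf env;
    DemoTask *dying = malloc(sizeof *dying);
    int id;

    if (dying == NULL)
        return -1;
    dying->id = 7;

    if (setjmp(env) == 0) {
        pending_free = dying;   /* the pointer travels via the static ... */
        longjmp(env, 1);        /* ... longjmp only carries a flag */
    }

    /* Resumed side: pick up the pointer and release the block. */
    id = pending_free->id;
    free(pending_free);
    pending_free = NULL;
    return id;
}
```

Because the value passed to longjmp is now just a flag, nothing depends on the relative sizes of int and void *.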
Of course, the tricky part is setting up jmp_buf objects for newly created tasks, each with their own stack. You can use a signal handler with sigaltstack for that:
static void (*tfn)(void *);
static void *tfn_arg;
static stack_t old_ss;
static int old_sm;
static struct sigaction old_sa;
Task *current_task = 0;
static Task *parent_task;
static int task_count;
static void newtask()
{
int sm;
void (*fn)(void *);
void *fn_arg;
task_count++;
sigaltstack(&old_ss, 0);
sigaction(SIGUSR1, &old_sa, 0);
sm = old_sm;
fn = tfn;
fn_arg = tfn_arg;
task_switch(parent_task, 0); /* second argument: this task is not exiting */
sigsetmask(sm);
(*fn)(fn_arg);
abort();
}
Task *task_start(int ssize, void (*_tfn)(void *), void *_arg)
{
Task *volatile new;
stack_t t_ss;
struct sigaction t_sa;
old_sm = sigsetmask(~sigmask(SIGUSR1));
if (!current_task) task_init();
tfn = _tfn;
tfn_arg = _arg;
new = malloc(sizeof(Task) + ssize + ALIGN);
new->next = 0;
new->task_data = 0;
t_ss.ss_sp = (void *)(new + 1);
t_ss.ss_size = ssize;
t_ss.ss_flags = 0;
if ((unsigned long)t_ss.ss_sp & (ALIGN-1))
t_ss.ss_sp = (void *)(((unsigned long)t_ss.ss_sp+ALIGN) & ~(ALIGN-1));
t_sa.sa_handler = newtask;
t_sa.sa_mask = ~sigmask(SIGUSR1);
t_sa.sa_flags = SA_ONSTACK|SA_RESETHAND;
sigaltstack(&t_ss, &old_ss);
sigaction(SIGUSR1, &t_sa, &old_sa);
parent_task = current_task;
if (!setjmp(current_task->env)) {
current_task = new;
kill(getpid(), SIGUSR1); }
sigaltstack(&old_ss, 0);
sigaction(SIGUSR1, &old_sa, 0);
sigsetmask(old_sm);
return new;
}

If you want to keep it pure C, I think you might be able to use setjmp and longjmp, but I've never tried it myself, and I imagine there are platforms on which this wouldn't work (e.g. because certain registers or other settings are not saved). The only other alternative would be to write it in assembly.

As mentioned, setjmp/longjmp are standard C and are available even in the libc of 8-bit AVRs. They do exactly what you said you'd do in assembler: save the processor context. But keep in mind that the intended purpose of those functions is just to jump backwards in the flow of control; switching between tasks is an abuse. It does work anyway, and it appears to be used frequently in user-level thread libraries such as GNU Pth. Still, it is an abuse of the intended purpose and requires care.
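For contrast, the intended "jump backwards" use looks roughly like this. It is a minimal sketch with hypothetical names (recover_point, deep_work, try_operation): a nested call bails out to a recovery point that is still live further up the call stack.

```c
#include <setjmp.h>

/* Recovery point set up by the caller; hypothetical name. */
static jmp_buf recover_point;

/* Somewhere deep in the call tree: bail out if something goes wrong. */
static void deep_work(int should_fail)
{
    if (should_fail)
        longjmp(recover_point, 1);  /* unwind back to the setjmp() below */
}

/* Returns 0 on success, -1 if deep_work() bailed out. */
int try_operation(int should_fail)
{
    if (setjmp(recover_point) != 0)
        return -1;                  /* we got here via longjmp() */
    deep_work(should_fail);
    return 0;
}
```

The frame that called setjmp is still active when longjmp runs, which is exactly the property a task switch between separate stacks does not have.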
As Chris Dodd said, you still need to provide a stack for each new task. He used sigaltstack() and other signal-related functions, but those do not exist in standard C, only in Unix-like environments. For example, the AVR libc does not provide them. As an alternative, you can try reserving a part of your existing stack (by declaring a big local array, or using alloca()) for use as the stack of the new thread. Just keep in mind that the main/scheduler thread will keep using its stack, each thread uses its own stack, and all of them will grow and shrink as stacks usually do, so they need enough room to do so without interfering with each other.
And since we're already mentioning unix-like, non-standard-C mechanisms, there is also makecontext()/swapcontext() and family, which are more powerful but harder to find than setjmp()/longjmp(). The names say it all really: the context functions let you manage full process contexts (stacks included), the jmp functions let you just jump around - you'll have to hack the rest.
For the AVR anyway, given that you probably won't have an OS to help nor much memory to blindly reserve, you'd probably be better off using assembler for the switching and stack initialization.
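On a Unix-like system that still ships <ucontext.h> (glibc does), the context functions mentioned above make a cooperative switch quite short. The following is a minimal sketch, not something that would run on an AVR; the names task_body and run_context_demo, the 64 KiB stack size, and the trace array are all illustrative assumptions.

```c
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];   /* private stack for the task */
static int trace[4];
static int trace_len;

/* The cooperative task: runs, yields once, runs again, then finishes. */
static void task_body(void)
{
    trace[trace_len++] = 1;
    swapcontext(&task_ctx, &main_ctx);   /* yield back to the scheduler */
    trace[trace_len++] = 3;
    /* Returning here resumes uc_link, i.e. main_ctx. */
}

/* Runs the task to completion and encodes the observed order as a number. */
int run_context_demo(void)
{
    trace_len = 0;
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task_body, 0);

    swapcontext(&main_ctx, &task_ctx);   /* run the task until it yields */
    trace[trace_len++] = 2;
    swapcontext(&main_ctx, &task_ctx);   /* resume it until it returns */
    trace[trace_len++] = 4;

    return trace[0] * 1000 + trace[1] * 100 + trace[2] * 10 + trace[3];
}
```

The trace interleaves scheduler and task steps in the order 1, 2, 3, 4, showing that each side resumes exactly where it last gave up control.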

In my experience if people start writing schedulers it isn't too long before they start wanting things like network stacks, memory allocation and file systems too. It's almost never worth going down that route; you end up spending more time writing your own operating system than you're spending on your actual application.
At the first whiff of your project heading that way, it's almost always worth making the effort to adopt an existing OS (Linux, VxWorks, etc.). Of course, that might mean that you run into problems if the CPU isn't up to it. An AVR isn't exactly a whole lot of CPU, and fitting one of the major OSes onto it ranges from tricky to mostly impossible, though there are some tiny OSes (some open source, see http://en.wikipedia.org/wiki/List_of_real-time_operating_systems).
So at the commencement of a project you should carefully consider how you might wish to evolve it going into the future. This might influence your choice of CPU now to save having to do hideous things in software later.

Related

C: How to guard static variables in multithreaded environment?

Suppose having the following code elements working on a fifo buffer:
static uint_fast32_t buffer_start;
static uint_fast32_t buffer_end;
static mutex_t buffer_guard;
(...)
void buffer_write(uint8_t* data, uint_fast32_t len)
{
uint_fast32_t pos;
mutex_lock(buffer_guard);
pos = buffer_end;
buffer_end = buffer_end + len;
(...) /* Wrap around buffer_end, fill in data */
mutex_unlock(buffer_guard);
}
bool buffer_isempty(void)
{
bool ret;
mutex_lock(buffer_guard);
ret = (buffer_start == buffer_end);
mutex_unlock(buffer_guard);
return ret;
}
This code might be running on an embedded system with an RTOS, with the buffer_write() and buffer_isempty() functions called from different threads. The compiler has no way of knowing that the mutex_lock() and mutex_unlock() functions provided by the RTOS delimit a critical section.
As the code stands, because buffer_end is a static variable (local to the compilation unit), the compiler might choose to reorder accesses to it around function calls (at least as far as I understand the C standard, this seems possible). So the code performing the buffer_end = buffer_end + len line could potentially end up before the call to mutex_lock().
Using volatile on these variables (like static volatile uint_fast32_t buffer_end;) seems to resolve this, as accesses to them would then be constrained by sequence points (and a mutex_lock() call introduces one, being a function call).
Is my understanding right on these?
Is there a more appropriate means (than using volatile) of dealing with this type of problem?

Const pointer to volatile struct member

I'm using a microcontroller to make some ADC measurements. I have an issue when I compile the following code with -O2 optimization: the MCU freezes when the PrintVal() function is present in the code. I did some debugging, and it turns out that when I add the -fno-inline compiler flag, the code runs fine even with the PrintVal() function.
Here is some background:
AdcIsr.c contains the interrupt that is executed when the ADC finishes its job. This file also contains the ISRInit() function, which initializes the pointer to the variable that will hold the value after conversion. The main loop waits for the interrupt and only then accesses AdcMeas.value.
AdcIsr.c
static volatile uint16_t* isrVarPtr = NULL;
ISR()
{
uint8_t tmp = readAdc();
*isrVarPtr = tmp;
}
void ISRInit(volatile uint16_t *var)
{
isrVarPtr = var;
}
AdcMeas.c
typedef struct{
uint8_t id;
volatile uint16_t value;
}AdcMeas_t;
static AdcMeas_t AdcMeas = {0};
const AdcMeas_t* AdcMeasGetStructPtr()
{
return &AdcMeas;
}
main.c
void PrintVal(const AdcMeas_t* data)
{
printf("AdcMeas %d value: %d\r\n", data->id, data->value);
}
void StartMeasurement()
{
...
AdcOn();
...
}
int main()
{
ISRInit(AdcMeasGetStructPtr()->value);
while(1)
{
StartMeasurement();
WaitForISR();
PrintVal(AdcMeasGetStructPtr());
DelayMs(1000);
}
}
Questions:
Is there something wrong with usage of const AdcMeas_t* data as argument of the PrintVal() function? I understand that AdcMeas.value may change inside interrupt and PrintVal() may be outdated.
AdcMeas contains a 'generic getter'. Is this a good practice to use this sort of function to allow read-only access to static structure? or should I implement AdcMeasGetId() and AdcMeasGetValue functions (note that this struct has only 2 members, what if it has 8 members)?
I know this code is a bit dumb (waiting for interrupt in while loop), this is just an example.
Some bugs:
You have no header files, neither library include or your own ones. This means that everything is hopelessly broken until you fix that. You cannot do multiple file projects in C without header files.
*isrVarPtr = tmp; Here you write to a variable without protection from race conditions. If the main program reads this variable in several steps, you risk getting incorrect data. You need to protect against race conditions or guarantee atomic access.
const AdcMeasGetStructPtr() is gibberish and there is no way that the return &AdcMeas; inside it would compile with a conforming C compiler.
If you have an old but conforming C90 compiler, the return type will get treated as int. Otherwise, if you have a modern C compiler, not even the function definition will compile. So it would seem that something is very wrong with your compiler, which is a greater concern than this bug.
Declaring the typedef struct in the C file and then returning a pointer to it doesn't make any sense. You need to re-design this module. You could have a getter function returning an instance to a private struct, if there is only ever going to be 1 instance of it (singleton). However, as mentioned, it needs to handle race conditions.
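One way to get the atomic access asked for above is C11 atomics. This is a minimal sketch assuming a toolchain that provides <stdatomic.h>; the names adc_value, adc_publish, and adc_snapshot are hypothetical. On a small MCU without usable atomics, briefly disabling interrupts around the read is the usual substitute.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared between the ISR and the main loop. */
static _Atomic uint16_t adc_value;

/* Called from the ISR: publish the newest sample as one indivisible store. */
void adc_publish(uint16_t sample)
{
    atomic_store_explicit(&adc_value, sample, memory_order_relaxed);
}

/* Called from the main loop: take one indivisible snapshot of the sample. */
uint16_t adc_snapshot(void)
{
    return atomic_load_explicit(&adc_value, memory_order_relaxed);
}
```

Exposing the value through a pair of functions like this also keeps the variable private to the module, which fits the getter design discussed above.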
Stylistic concerns:
Empty parentheses () in a function declaration are almost always wrong in C. This is obsolete style and means "accept any parameters". C++ is different here.
int main() doesn't make any sense at all in a microcontroller system. You should use some implementation-defined form suitable for freestanding programs. The most commonly supported form is void main (void).
DelayMs(1000); is highly questionable code in any embedded system. There should never be a reason why you'd want to hang up your MCU being useless, with max current consumption, for a whole second.
Overall it seems you would benefit from a "continuous conversion" ADC. ADCs that support continuous conversion just dump their latest read in the data register and you can pick it up with polling whenever you need it. Catching all ADC interrupts is really just for hard realtime systems, signal processing and similar.

generic stack implementation in linux core

I am writing a patch to the bcache Linux module and am struggling to find a generic LIFO implementation in the Linux kernel. I have found several FIFO implementations, both macro-based and not. However, there is nothing similar for LIFO.
Where can one find one? Preferably in C rather than asm, and not macro-based, but any should work.
If no abstract LIFO is provided, what are the easiest structures to implement a LIFO on (for instance, a generic stack implementation in the Linux kernel)?
If the LIFO has fixed maximum depth, and is not dynamically allocated, then it is simply something like
#define LIFO_MAXDEPTH 16
static DEFINE_SPINLOCK(lifo_lock); /* SPIN_LOCK_UNLOCKED is no longer available in current kernels */
static size_t lifo_count = 0;
static struct item lifo_entry[LIFO_MAXDEPTH];
int lifo_push(const struct item *from)
{
spin_lock(&lifo_lock);
if (lifo_count >= LIFO_MAXDEPTH) {
spin_unlock(&lifo_lock);
return -1;
}
lifo_entry[lifo_count++] = *from;
spin_unlock(&lifo_lock);
return 0;
}
int lifo_pop(struct item *to)
{
spin_lock(&lifo_lock);
if (lifo_count < 1) {
spin_unlock(&lifo_lock);
return -1;
}
*to = lifo_entry[--lifo_count];
spin_unlock(&lifo_lock);
return 0;
}
Because we only need to keep it locked for very short durations, a spinlock should suffice.
If the LIFO is dynamically allocated, things get more complicated. In particular, because we might have to call kmalloc() or kfree(), we cannot use a spinlock. You'd also want to split the stack into page-sized chunks, since higher-order allocations may fail. Then you must consider things like nefarious users trying to use the facility for DoS attacks, and so on.
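The chunking idea can be sketched in plain userspace C, with malloc() and free() standing in for the kernel allocator and locking left out entirely. All names here (chunked_lifo, clifo_push, clifo_pop, CHUNK_ITEMS) are hypothetical, and int stands in for struct item.

```c
#include <stdlib.h>

/* Hypothetical capacity; in the kernel you would size a chunk to fit one page. */
#define CHUNK_ITEMS 64

struct chunk {
    struct chunk *below;        /* next older chunk, or NULL */
    size_t used;                /* number of valid entries in items[] */
    int items[CHUNK_ITEMS];
};

struct chunked_lifo {
    struct chunk *top;          /* newest chunk; never left empty */
};

/* Returns 0 on success, -1 if a new chunk could not be allocated. */
int clifo_push(struct chunked_lifo *s, int value)
{
    if (s->top == NULL || s->top->used == CHUNK_ITEMS) {
        struct chunk *c = malloc(sizeof *c);
        if (c == NULL)
            return -1;
        c->below = s->top;
        c->used = 0;
        s->top = c;
    }
    s->top->items[s->top->used++] = value;
    return 0;
}

/* Returns 0 on success, -1 if the LIFO is empty. */
int clifo_pop(struct chunked_lifo *s, int *out)
{
    struct chunk *c = s->top;
    if (c == NULL)
        return -1;
    *out = c->items[--c->used];
    if (c->used == 0) {         /* release a chunk as soon as it drains */
        s->top = c->below;
        free(c);
    }
    return 0;
}
```

Each allocation is small and fixed-size, so growth never needs one large contiguous block; a real kernel version would still need a locking strategy and a cap on total size.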

Locking in OMP regions

I'm trying to instrument some functions in my application to see how long they take. I'm recording all of the times in-memory using a linked list.
In the process, I've introduced a global variable that keeps track of the end of the list. When I enter a new timing region, I insert a new record at the end of the list. Fairly simple stuff.
Some of the functions I want to track are called in OpenMP regions, however. Which means they'll likely be called multiple times in parallel. And this is where I'm stumped.
If this was using normal Pthreads, I'd simply wrap access to the global variable in a mutex and call it a day. However, I'm unsure: will this strategy still work with functions called in an OpenMP region? As in, will they respect the lock?
For example (won't compile, but I think gets the point across):
Record *head;
Record *tail;
void start_timing(char *name) {
Record *r = create_record(name);
tail->next_record = r;
tail = r;
return r;
}
int foo(void) {
Record r = start_timing("foo");
//Do something...
stop_timing(r);
}
int main(void) {
Record r = start_timing("main");
//Do something...
#pragma omp parallel for...
for (int i = 0; i < 42; i++) {
foo();
}
//Do some more...
stop_timing(r);
}
Which I would then update to:
void start_timing(char *name) {
Record *r = create_record(name);
acquire_mutex_on_tail();
tail->next_record = r;
tail = r;
release_mutex_on_tail();
return r;
}
(Apologies if this has an obvious answer - I'm relatively inexperienced with the OpenMP framework and multithreading in general.)
The idiomatic mutex solution is to use OpenMP locks:
omp_set_lock(&taillock);
tail->next_record = r;
tail = r;
omp_unset_lock(&taillock);
and somewhere:
omp_lock_t taillock;
omp_init_lock(&taillock);
...
omp_destroy_lock(&taillock);
The simple OpenMP solution:
Record *start_timing(char *name) {
Record *r = create_record(name);
#pragma omp critical
{
tail->next_record = r;
tail = r;
}
return r;
}
That creates an implicit global lock bound to the source code line. For some detailed discussions see the answers to this question.
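A complete version of the critical-section variant might look like the sketch below. The Record layout, the sentinel head node, the section name timing_tail, and the helper count_after_parallel_inserts are assumptions made for illustration. Naming the critical section keeps it from sharing a lock with unrelated unnamed critical sections.

```c
#include <stdlib.h>

typedef struct Record {
    const char *name;
    struct Record *next_record;
} Record;

/* A sentinel head means tail is never NULL. */
static Record list_head = { "head", NULL };
static Record *tail = &list_head;

Record *start_timing(const char *name)
{
    Record *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->name = name;
    r->next_record = NULL;

    /* Only one thread at a time may link a record and move the tail. */
    #pragma omp critical(timing_tail)
    {
        tail->next_record = r;
        tail = r;
    }
    return r;
}

/* Appends n records from a parallel loop, then counts what is on the list. */
int count_after_parallel_inserts(int n)
{
    int count = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        start_timing("foo");

    for (Record *r = list_head.next_record; r != NULL; r = r->next_record)
        count++;
    return count;
}
```

Built with OpenMP enabled (for example -fopenmp with GCC), the loop runs in parallel; without it the pragmas are ignored and the same code runs serially.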
For practical purposes, using Pthread locks will also work, at least for scenarios where OpenMP is based on Pthreads.
A word of warning
Using locks in performance measurement code is dangerous. So is memory allocation, which also often implies using locks. This means that start_timing has significant cost, and the performance will get even worse with more threads. That doesn't even consider the cache invalidation from having one thread allocate a chunk of memory (the record) and then another thread modify it (the tail pointer).
Now that may be fine if the sections you measure take seconds, but it will cause great overhead and perturbation when your sections are only hundreds of cycles long.
To create a scalable performance tracing facility, you must pre-allocate thread-local memory in larger chunks and have each thread write only to its local part.
You can also choose to use one of the existing measurement infrastructures, such as Score-P.
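The pre-allocated, thread-local approach described above could be sketched like this, assuming a C11 compiler with _Thread_local and timespec_get(). The names (TraceEvent, trace_start, trace_stop, trace_local_count) and the buffer capacity are hypothetical, and a real tool would use a cheaper, monotonic clock.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical per-thread capacity; events beyond it are dropped. */
#define EVENTS_PER_THREAD 1024

typedef struct {
    const char *name;
    struct timespec start;
    struct timespec stop;
} TraceEvent;

/* Each thread gets its own buffer and counter: no lock, no malloc on the hot path. */
static _Thread_local TraceEvent events[EVENTS_PER_THREAD];
static _Thread_local size_t event_count;

TraceEvent *trace_start(const char *name)
{
    if (event_count == EVENTS_PER_THREAD)
        return NULL;                       /* buffer full: drop the event */
    TraceEvent *e = &events[event_count++];
    e->name = name;
    timespec_get(&e->start, TIME_UTC);
    return e;
}

void trace_stop(TraceEvent *e)
{
    if (e != NULL)
        timespec_get(&e->stop, TIME_UTC);
}

/* Number of events recorded by the calling thread. */
size_t trace_local_count(void)
{
    return event_count;
}
```

Merging the per-thread buffers into one report would happen after the measured region, where locking or allocation no longer disturbs the timings.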
Overhead & perturbation
First, distinguish between the two (linked concepts). Overhead is extra time you spend, while perturbation refers to the impact on what you measure (i.e. you now measure something different than what happens without the measurement). Overhead is undesirable in large quantities, but perturbation is much worse.
Yes, you can avoid some of the perturbation by pausing the timer during your expensive measurement runtime (the overhead remains). However, in a multi-threaded context this is still very problematic.
Slowing down progress in one thread may lead to other threads waiting for it, e.g. during an implicit barrier. How do you attribute the waiting time of that thread and others that follow transitively?
Memory allocation is usually locked - so if you allocate memory during measurement runtime, you will slow down other threads that depend on memory allocation. You could try to mitigate with memory pools, but I'd avoid the linked list in the first place.

How to write self modifying code in C?

I want to write a piece of code that changes itself continuously, even if the change is insignificant.
For example maybe something like
for i in 1 to 100, do
begin
x := 200
for j in 200 downto 1, do
begin
do something
end
end
Suppose I want that my code should after first iteration change the line x := 200 to some other line x := 199 and then after next iteration change it to x := 198 and so on.
Is writing such a code possible ? Would I need to use inline assembly for that ?
EDIT :
Here is why I want to do it in C:
This program will be run on an experimental operating system, and I can't / don't know how to use programs compiled from other languages. The real reason I need such code is that it runs on a guest operating system in a virtual machine. The hypervisor is a binary translator that translates chunks of code and performs some optimizations. It translates each chunk only once; the next time the same chunk is used in the guest, the translator reuses the previously translated result. Now, if the code gets modified on the fly, the translator notices that and marks its previous translation as stale, thus forcing a re-translation of the same code. This is what I want to achieve: to force the translator to do many translations. Typically these chunks are the instructions between two branch instructions (such as jump instructions). I just think that self-modifying code would be a fantastic way to achieve this.
You might want to consider writing a virtual machine in C, where you can build your own self-modifying code.
If you wish to write self-modifying executables, much depends on the operating system you are targeting. You might approach your desired solution by modifying the in-memory program image. To do so, you would obtain the in-memory address of your program's code bytes. Then, you might manipulate the operating system protection on this memory range, allowing you to modify the bytes without encountering an Access Violation or SIGSEGV. Finally, you would use pointers (perhaps unsigned char * pointers, possibly unsigned long * as on RISC machines) to modify the opcodes of the compiled program.
A key point is that you will be modifying machine code of the target architecture. There is no canonical format for C code while it is running -- C is a specification of a textual input file to a compiler.
Sorry, I am answering a bit late, but I think I found exactly what you are looking for : https://shanetully.com/2013/12/writing-a-self-mutating-x86_64-c-program/
In this article, they change the value of a constant by injecting assembly in the stack. Then they execute a shellcode by modifying the memory of a function on the stack.
Below is the first code :
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
void foo(void);
int change_page_permissions_of_address(void *addr);
int main(void) {
void *foo_addr = (void*)foo;
// Change the permissions of the page that contains foo() to read, write, and execute
// This assumes that foo() is fully contained by a single page
if(change_page_permissions_of_address(foo_addr) == -1) {
fprintf(stderr, "Error while changing page permissions of foo(): %s\n", strerror(errno));
return 1;
}
// Call the unmodified foo()
puts("Calling foo...");
foo();
// Change the immediate value in the addl instruction in foo() to 42
unsigned char *instruction = (unsigned char*)foo_addr + 18;
*instruction = 0x2A;
// Call the modified foo()
puts("Calling foo...");
foo();
return 0;
}
void foo(void) {
int i=0;
i++;
printf("i: %d\n", i);
}
int change_page_permissions_of_address(void *addr) {
// Move the pointer to the page boundary
int page_size = getpagesize();
addr = (char *)addr - (unsigned long)addr % page_size; /* cast: arithmetic on void * is a GNU extension */
if(mprotect(addr, page_size, PROT_READ | PROT_WRITE | PROT_EXEC) == -1) {
return -1;
}
return 0;
}
It is possible, but it's most probably not portably possible and you may have to contend with read-only memory segments for the running code and other obstacles put in place by your OS.
This would be a good start. Essentially Lisp functionality in C:
http://nakkaya.com/2010/08/24/a-micro-manual-for-lisp-implemented-in-c/
Depending on how much freedom you need, you may be able to accomplish what you want by using function pointers. Using your pseudocode as a jumping-off point, consider the case where we want to modify that variable x in different ways as the loop index i changes. We could do something like this:
#include <stdio.h>
void multiply_x (int * x, int multiplier)
{
*x *= multiplier;
}
void add_to_x (int * x, int increment)
{
*x += increment;
}
int main (void)
{
int x = 0;
int i;
void (*fp)(int *, int);
for (i = 1; i < 6; ++i) {
fp = (i % 2) ? add_to_x : multiply_x;
fp(&x, i);
printf("%d\n", x);
}
return 0;
}
The output, when we compile and run the program, is:
1
2
5
20
25
Obviously, this will only work if you have finite number of things you want to do with x on each run through. In order to make the changes persistent (which is part of what you want from "self-modification"), you would want to make the function-pointer variable either global or static. I'm not sure I really can recommend this approach, because there are often simpler and clearer ways of accomplishing this sort of thing.
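The persistent variant with a static function pointer might look like the following minimal sketch. The names step_add, step_double, and next_value are made up for illustration; each call uses the current behavior and then swaps it for the next call.

```c
static int step_add(int x)    { return x + 1; }
static int step_double(int x) { return x * 2; }

/* Applies the current step to x, then flips which step the next call will use. */
int next_value(int x)
{
    /* static: the chosen function survives between calls */
    static int (*fp)(int) = step_add;

    int result = fp(x);
    fp = (fp == step_add) ? step_double : step_add;
    return result;
}
```

Calling next_value repeatedly therefore alternates between adding one and doubling, even though the call site never changes.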
A self-interpreting language (not hard-compiled and linked like C) might be better for that. Perl, javascript, PHP have the evil eval() function that might be suited to your purpose. By it, you could have a string of code that you constantly modify and then execute via eval().
The suggestion about implementing LISP in C and then using that is solid, due to portability concerns. But if you really wanted to, this could also be implemented in the other direction on many systems, by loading your program's bytecode into memory and then returning to it.
There's a couple of ways you could attempt to do that. One way is via a buffer overflow exploit. Another would be to use mprotect() to make the code section writable, and then modify compiler-created functions.
Techniques like this are fun for programming challenges and obfuscated competitions, but given how unreadable your code would be combined with the fact you're exploiting what C considers undefined behavior, they're best avoided in production environments.
In standard C11 (read n1570), you cannot write self modifying code (at least without undefined behavior). Conceptually at least, the code segment is read-only.
You might consider extending the code of your program with plugins using your dynamic linker. This requires operating-system-specific functions. On POSIX, use dlopen (and probably dlsym to get newly loaded function pointers). You could then overwrite function pointers with the address of new ones.
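A minimal sketch of the dlopen/dlsym route follows. The helper name load_unary is hypothetical, and the library name used in the usage note (libm.so.6) is an assumption that holds on typical glibc-based Linux systems; older glibc versions also need -ldl at link time.

```c
#include <dlfcn.h>
#include <stddef.h>

typedef double (*unary_fn)(double);

/* Loads `symbol` from the shared library `lib`; returns NULL on any failure.
   The library handle is deliberately kept open so the returned pointer stays valid. */
unary_fn load_unary(const char *lib, const char *symbol)
{
    unary_fn fn = NULL;
    void *handle = dlopen(lib, RTLD_NOW);

    if (handle == NULL)
        return NULL;

    /* POSIX idiom for turning the void * from dlsym into a function pointer. */
    *(void **)(&fn) = dlsym(handle, symbol);
    return fn;
}
```

For example, load_unary("libm.so.6", "sqrt") yields a pointer that can be stored in an existing function-pointer variable, replacing whatever it pointed to before.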
Perhaps you could use some JIT-compiling library (like libgccjit or asmjit) to achieve your goals. You'll get fresh function addresses and put them in your function pointers.
Remember that a C compiler can generate code of various size for a given function call or jump, so even overwriting that in a machine specific way is brittle.
My friend and I encountered this problem while working on a game that self-modifies its code. We allow the user to rewrite code snippets in x86 assembly.
This just requires leveraging two libraries -- an assembler, and a disassembler:
FASM assembler: https://github.com/ZenLulz/Fasm.NET
Udis86 disassembler: https://github.com/vmt/udis86
We read instructions using the disassembler, let the user edit them, convert the new instructions to bytes with the assembler, and write them back to memory. The write-back requires using VirtualProtect on windows to change page permissions to allow editing the code. On Unix you have to use mprotect instead.
I posted an article on how we did it, as well as the sample code.
These examples are on Windows using C++, but it should be very easy to make cross-platform and C only.
This is how to do it on Windows with C++. You have to VirtualAlloc a byte array with read/write protection, copy your code there, and VirtualProtect it with read/execute protection. Here's how you dynamically create a function that simply returns a constant (15).
#include <cstdio>
#include <windows.h> // also declares VirtualAlloc/VirtualProtect (memoryapi.h)
using namespace std;
typedef unsigned char byte;
int main(int argc, char** argv){
byte bytes [] = { 0x48, 0x31, 0xC0, 0x48, 0x83, 0xC0, 0x0F, 0xC3 }; //put code here
//xor %rax, %rax
//add %rax, 15
//ret
int size = sizeof(bytes);
DWORD protect = PAGE_READWRITE;
void* meth = VirtualAlloc(NULL, size, MEM_COMMIT, protect);
byte* write = (byte*) meth;
for(int i = 0; i < size; i++){
write[i] = bytes[i];
}
if(VirtualProtect(meth, size, PAGE_EXECUTE_READ, &protect)){
typedef int (*fptr)();
fptr my_fptr = reinterpret_cast<fptr>(meth); // no detour through long, which is only 32 bits on 64-bit Windows
int number = my_fptr();
for(int i = 0; i < number; i++){
printf("I will say this 15 times!\n");
}
return 0;
} else{
printf("Unable to VirtualProtect code with execute protection!\n");
return 1;
}
}
You assemble the code using this tool.
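A rough POSIX counterpart in plain C, using mmap and mprotect in place of VirtualAlloc and VirtualProtect, could look like this. It is a sketch under clear assumptions: it reuses the same x86-64 machine code bytes as above, so it only does anything on x86-64; the name run_generated is hypothetical; and hardened systems may refuse to make the page executable.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <string.h>
#include <sys/mman.h>

typedef long (*gen_fn)(void);

/* Returns the value produced by the generated code (15), or -1 on failure
   or on architectures other than x86-64. */
long run_generated(void)
{
#if defined(__x86_64__)
    /* xor rax, rax; add rax, 15; ret */
    static const unsigned char code[] = {
        0x48, 0x31, 0xC0, 0x48, 0x83, 0xC0, 0x0F, 0xC3
    };
    gen_fn fn;
    long result;

    /* 1. Get a writable page and copy the code into it. */
    void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);

    /* 2. Flip the page from writable to executable. */
    if (mprotect(mem, sizeof code, PROT_READ | PROT_EXEC) == -1) {
        munmap(mem, sizeof code);
        return -1;
    }

    /* 3. Call it; memcpy sidesteps the object-to-function pointer cast. */
    memcpy(&fn, &mem, sizeof fn);
    result = fn();
    munmap(mem, sizeof code);
    return result;
#else
    return -1;
#endif
}
```

Keeping the page writable and executable at different times, rather than both at once, mirrors the two-step protection change in the Windows example.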
While "true" self-modifying code in C is impossible (the assembly route feels like a slight cheat, because at that point we are writing self-modifying code in assembly and not in C, which was the original question), there might be a pure C way to get a similar effect: statements that paradoxically do not do what you think they are supposed to do. I say paradoxically because neither the ASM self-modifying code nor the following C snippet makes sense superficially or intuitively, yet both are logical once you put intuition aside and analyze them step by step; that discrepancy is what makes a paradox a paradox.
#include <stdio.h>
#include <string.h>
int main()
{
struct Foo
{
char a;
char b[4];
} foo;
foo.a = 42;
strncpy(foo.b, "foo", 4); /* 4, not 3, so the terminating NUL is copied too */
printf("foo.a=%i, foo.b=\"%s\"\n", foo.a, foo.b);
*(int*)&foo.a = 1918984746;
printf("foo.a=%i, foo.b=\"%s\"\n", foo.a, foo.b);
return 0;
}
$ gcc -o foo foo.c && ./foo
foo.a=42, foo.b="foo"
foo.a=42, foo.b="bar"
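First, we set the values of foo.a and foo.b and print the struct. Then we apparently change only the value of foo.a, but observe the output: the int-sized store through the cast pointer also overwrites the first three bytes of foo.b. This relies on undefined behavior (and on a little-endian machine with a 4-byte int), so it is a curiosity rather than a technique.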
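Where the constant 1918984746 comes from can be checked with a little arithmetic. The helper below (pack_le is a hypothetical name) assembles four bytes into a 32-bit value with the first byte in the lowest position, which is how a little-endian machine lays out an int in memory.

```c
#include <stdint.h>

/* Combines four bytes into a 32-bit value, b0 being the lowest-order byte. */
uint32_t pack_le(unsigned char b0, unsigned char b1,
                 unsigned char b2, unsigned char b3)
{
    return (uint32_t)b0
         | ((uint32_t)b1 << 8)
         | ((uint32_t)b2 << 16)
         | ((uint32_t)b3 << 24);
}
```

With an ASCII character set, pack_le(42, 'b', 'a', 'r') is 0x7261622A, which is 1918984746 in decimal: the low byte 42 lands in foo.a and the remaining three bytes spell "bar" in foo.b.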

Resources