What's a good way to implement a simple clone()-based multithreading library? - c

I'm trying to build a simple multithreading library on Linux using clone() and other kernel facilities. I've come to a point where I'm not really sure what the correct way to do things is. I tried going through the original NPTL code, but it's a bit too much.
For instance, this is how I imagine the create method:
typedef int sk_thr_id;
typedef void *sk_thr_arg;
typedef int (*sk_thr_func)(sk_thr_arg);
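#define FIBER_STACK (1024*64) /* stack size in bytes */
sk_thr_id sk_thr_create(sk_thr_func f, sk_thr_arg a){
void* stack;
stack = malloc( FIBER_STACK );
if ( stack == 0 ){
perror( "malloc: could not allocate stack" );
exit( 1 );
}
/* the stack grows downward, so clone() gets the address just past the end of the block */
return ( clone(f, (char*) stack + FIBER_STACK, SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM, a ) );
}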
1: I'm not really sure what the correct clone() flags should be. I just found these being used in a simple example. Any general directions here will be welcome.
Here are parts of the mutex primitives built on futexes (not my own code for now):
#define cmpxchg(P, O, N) __sync_val_compare_and_swap((P), (O), (N))
#define cpu_relax() asm volatile("pause\n": : :"memory")
#define barrier() asm volatile("": : :"memory")
static inline unsigned xchg_32(void *ptr, unsigned x)
{
__asm__ __volatile__("xchgl %0,%1"
:"=r" ((unsigned) x)
:"m" (*(volatile unsigned *)ptr), "0" (x)
:"memory");
return x;
}
static inline unsigned short xchg_8(void *ptr, char x)
{
__asm__ __volatile__("xchgb %0,%1"
:"=r" ((char) x)
:"m" (*(volatile char *)ptr), "0" (x)
:"memory");
return x;
}
int sys_futex(void *addr1, int op, int val1, struct timespec *timeout, void *addr2, int val3)
{
return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);
}
typedef union mutex mutex;
union mutex
{
unsigned u;
struct
{
unsigned char locked;
unsigned char contended;
} b;
};
int mutex_init(mutex *m, const pthread_mutexattr_t *a)
{
(void) a;
m->u = 0;
return 0;
}
int mutex_lock(mutex *m)
{
int i;
/* Try to grab lock */
for (i = 0; i < 100; i++)
{
if (!xchg_8(&m->b.locked, 1)) return 0;
cpu_relax();
}
/* Have to sleep */
while (xchg_32(&m->u, 257) & 1)
{
sys_futex(m, FUTEX_WAIT_PRIVATE, 257, NULL, NULL, 0);
}
return 0;
}
int mutex_unlock(mutex *m)
{
int i;
/* Locked and not contended */
if ((m->u == 1) && (cmpxchg(&m->u, 1, 0) == 1)) return 0;
/* Unlock */
m->b.locked = 0;
barrier();
/* Spin and hope someone takes the lock */
for (i = 0; i < 200; i++)
{
if (m->b.locked) return 0;
cpu_relax();
}
/* We need to wake someone up */
m->b.contended = 0;
sys_futex(m, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
return 0;
}
2: The main question for me is how to implement the "join" primitive? I know it's supposed to be based on futexes too. It's a struggle for me for now to come up with something.
3: I need some way to clean up resources (like the allocated stack) after a thread has finished. I can't really think of a good way to do this either.
Probably for these I'll need to have additional structure in user space for every thread with some information saved in it. Can someone point me in good direction for solving these issues?
4: I'll want to have a way to tell how much time a thread has been running, how long it has been since it was last scheduled, and other things like that. Are there kernel calls that provide such info?
Thanks in advance!

The idea that there can exist a "multithreading library" as a third-party library separate from the rest of the standard library is an outdated and flawed notion. If you want to do this, you'll have to first drop all use of the standard library; particularly, your call to malloc is completely unsafe if you're calling clone yourself, because:
malloc will have no idea that multiple threads exist, and therefore may fail to perform proper synchronization.
Even if it knew they existed, malloc will need to access an unspecified, implementation-specific structure located at the address given by the thread pointer. As this structure is implementation-specific, you have no way of creating such a structure that will be interpreted correctly by both the current and all future versions of your system's libc.
These issues don't apply just to malloc but to most of the standard library; even async-signal-safe functions may be unsafe to use, as they might dereference the thread pointer for cancellation-related purposes, performing optimal syscall mechanisms, etc.
If you really insist on making your own threads implementation, you'll have to abstain from using glibc or any modern libc that's integrated with threads, and instead opt for something much more naive like klibc. This could be an educational experiment, but it would not be appropriate for a deployed application.

1) You are using an example from LinuxThreads. I won't repeat what good references already cover, but I recommend "The Linux Programming Interface" by Michael Kerrisk, chapter 28. It explains what you need in 25 pages.
2) If you set the CLONE_CHILD_CLEARTID flag, the kernel clears the memory at the ctid argument of clone when the child terminates and wakes any futex waiter on that address. If you treat that location as a futex, you can implement the join primitive. Good luck :-) If you don't want to use futexes, also have a look at wait3 and wait4.
3) I do not know what you want to clean up, but you can use the tls argument of clone. This is a thread-local storage buffer. Once the thread has finished, you can clean up that buffer.
4) See getrusage.
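To make point 2 concrete, here is a minimal sketch of a futex-based join, assuming Linux with glibc's clone() wrapper. The names sk_thread, sk_thread_start, sk_thread_join and sk_demo are made up for illustration, and the thread body deliberately avoids libc calls because the thread has no TLS of its own. Treat it as a starting point under those assumptions, not a complete thread implementation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define SK_STACK_SIZE (64 * 1024)   /* hypothetical stack size */

/* Hypothetical per-thread record kept in user space. */
struct sk_thread {
    pid_t tid;      /* kernel writes the TID here at clone, zeroes it at exit */
    void *stack;    /* base address of the mmap'ed stack */
    int result;     /* written by the demo thread */
};

/* Runs in the new thread. It only stores to memory: no libc calls. */
static int sk_demo_entry(void *arg)
{
    struct sk_thread *t = arg;
    t->result = 42;
    return 0;
}

static int sk_thread_start(struct sk_thread *t, int (*fn)(void *), void *arg)
{
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
                CLONE_THREAD | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID;

    t->stack = mmap(NULL, SK_STACK_SIZE, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (t->stack == MAP_FAILED)
        return -1;

    /* The stack grows downward, so pass its highest address. The same
       word serves as parent_tid (set now) and child_tid (cleared at exit). */
    if (clone(fn, (char *)t->stack + SK_STACK_SIZE, flags, arg,
              &t->tid, NULL, &t->tid) == -1) {
        munmap(t->stack, SK_STACK_SIZE);
        return -1;
    }
    return 0;
}

static void sk_thread_join(struct sk_thread *t)
{
    pid_t tid;

    /* Sleep while the word still holds the TID. The kernel's exit-time
       wake-up is a shared futex wake, so use FUTEX_WAIT rather than
       FUTEX_WAIT_PRIVATE. */
    while ((tid = __atomic_load_n(&t->tid, __ATOMIC_ACQUIRE)) != 0)
        syscall(SYS_futex, &t->tid, FUTEX_WAIT, tid, NULL, NULL, 0);

    /* The thread has left user space for good, so its stack can go. */
    munmap(t->stack, SK_STACK_SIZE);
}

/* Starts one thread, joins it and returns the value the thread stored. */
int sk_demo(void)
{
    struct sk_thread t = { 0 };
    int rc = sk_thread_start(&t, sk_demo_entry, &t);
    assert(rc == 0);
    sk_thread_join(&t);
    return t.result;
}
```

The per-thread record is also a natural place for the cleanup asked about in point 3: join is the first moment at which the stack is known to be unused, so that is where this sketch releases it.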

Related

Memory ordering for a spin-lock "call once" implementation

Suppose I wanted to implement a mechanism for calling a piece of code exactly once (e.g. for initialization purposes), even when multiple threads hit the call site repeatedly. Basically, I'm trying to implement something like pthread_once, but with GCC atomics and spin-locking. I have a candidate implementation below, but I'd like to know if
a) it could be faster in the common case (i.e. already initialized), and,
b) is the selected memory ordering strong enough / too strong?
Architectures of interest are x86_64 (primarily) and aarch64.
The intended use API is something like this
void gets_called_many_times_from_many_threads(void)
{
static int my_once_flag = 0;
if (once_enter(&my_once_flag)) {
// do one-time initialization here
once_commit(&my_once_flag);
}
// do other things that assume the initialization has taken place
}
And here is the implementation:
int once_enter(int *b)
{
int zero = 0;
int got_lock = __atomic_compare_exchange_n(b, &zero, 1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
if (got_lock) return 1;
while (2 != __atomic_load_n(b, __ATOMIC_ACQUIRE)) {
// on x86, insert a pause instruction here
};
return 0;
}
void once_commit(int *b)
{
(void) __atomic_store_n(b, 2, __ATOMIC_RELEASE);
}
I think that the RELAXED ordering on the compare exchange is okay, because we don't skip the atomic load in the while condition even if the compare-exchange gives us 2 (in the "zero" variable), so the ACQUIRE on that load synchronizes with the RELEASE in once_commit (I think), but maybe on a successful compare-exchange we need to use RELEASE? I'm unclear here.
Also, I just learned that lock cmpxchg is a full memory barrier on x86, and since we hit the __atomic_compare_exchange_n in the common case (initialization has already been done), that barrier occurs on every function call. Is there an easy way to avoid this?
UPDATE
Based on the comments and the accepted answer, I've come up with the following modified implementation. If anybody spots a bug please let me know, but I believe it's correct. Basically, the change amounts to implementing double-checked locking. I also switched to using SEQ_CST because:
I mainly care that the common (already initialized) case is fast.
I observed that GCC doesn't emit a memory fence instruction on x86 for the first read (and it does do so on ARM even with ACQUIRE).
#ifdef __x86_64__
#define PAUSE() __asm __volatile("pause")
#else
#define PAUSE()
#endif
int once_enter(int *b)
{
if(2 == __atomic_load_n(b, __ATOMIC_SEQ_CST)) return 0;
int zero = 0;
int got_lock = __atomic_compare_exchange_n(b, &zero, 1, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
if (got_lock) return 1;
while (2 != __atomic_load_n(b, __ATOMIC_SEQ_CST)) {
PAUSE();
};
return 0;
}
void once_commit(int *b)
{
(void) __atomic_store_n(b, 2, __ATOMIC_SEQ_CST);
}
a) What you need is double-checked locking.
Basically, instead of entering the lock every time, you do an acquiring load to see whether the initialisation has been done yet, and only invoke once_enter if it has not.
void gets_called_many_times_from_many_threads(void)
{
static int my_once_flag = 0;
if (__atomic_load_n(&my_once_flag, __ATOMIC_ACQUIRE) != 2) {
if (once_enter(&my_once_flag)) {
// do one-time initialization here
once_commit(&my_once_flag);
}
}
// do other things that assume the initialization has taken place
}
b) I believe this is enough: your initialisation happens before the releasing store of 2 to my_once_flag, and every other thread has to observe the value 2 with an acquiring load from the same variable.
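Putting both points together, a self-contained version of the double-checked pattern might look like the sketch below. It assumes GCC or Clang (for the __atomic builtins). The names get_config, config_value and init_runs are invented for the example, and the orderings are one reasonable choice rather than the only correct one.

```c
#include <assert.h>

/* Flag states: 0 = untouched, 1 = initialization running, 2 = done. */
int once_enter(int *flag)
{
    int expected = 0;

    if (__atomic_compare_exchange_n(flag, &expected, 1, 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE))
        return 1;   /* we won the race: the caller must initialize */

    /* Somebody else is initializing (or has finished): wait for state 2. */
    while (__atomic_load_n(flag, __ATOMIC_ACQUIRE) != 2)
        ;
    return 0;
}

void once_commit(int *flag)
{
    /* Only the thread that got 1 from once_enter may commit. */
    assert(__atomic_load_n(flag, __ATOMIC_RELAXED) == 1);
    __atomic_store_n(flag, 2, __ATOMIC_RELEASE);
}

int config_value;        /* hypothetical state that is set up exactly once */
int init_runs;           /* counts how often the initializer actually ran */
static int config_once;  /* the once flag */

int get_config(void)
{
    /* Fast path: a single acquire load once initialization is done. */
    if (__atomic_load_n(&config_once, __ATOMIC_ACQUIRE) != 2) {
        if (once_enter(&config_once)) {
            config_value = 1234;
            init_runs++;
            once_commit(&config_once);
        }
    }
    return config_value;
}
```

After the first call, get_config() never reaches the compare-exchange again, so the locked instruction is paid only during start-up. On x86 an acquire load compiles to a plain load.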

Linux kernel driver : polling on hardware register

I'm writing a Linux driver that aims to delegate some work to hardware accelerators. My driver writes some registers and memories to tell the hardware to do its job. Once the hardware is done, it will indicate so by setting a status register to 1. I do realize this is not how it should be done and that IRQ should be used instead and I'm planning to do so. But for now, I'm willing to do some polling on that register. And here I am, asking you smart people what's the best way to do so.
My hardware registers are mapped into kernel space using ioremap() and here is the code I'm struggling with :
int mailbox_request(const uint32_t *request, size_t request_size, struct mailbox_reply *reply)
{
size_t i;
int retval = 0;
if(down_interruptible(&sem))
return -ERESTARTSYS;
for(i = 0; i < request_size; i++)
iowrite32(request[i], mailbox + FIFO_OFFSET); /* iowrite32() takes the value first, then the address */
iowrite32(1, mailbox + CONTROL_OFFSET);
// WHAT SHOULD I DO HERE TO WAIT FOR THE RESULT ?
// while(ioread32(mailbox + STATUS_OFFSET) != 1); // better not do that...
reply->data = ioread32(mailbox + RESULT_OFFSET);
out:
up(&sem);
return retval;
}
My first idea was to use a workqueue and a waitqueue as follows:
static DECLARE_WORK(workqueue, mailbox_polling_result);
static DECLARE_WAIT_QUEUE_HEAD(wq);
static void mailbox_polling_result(struct work_struct *work)
{
while(ioread32(mailbox + STATUS_OFFSET) != 1)
msleep_interruptible(100);
wake_up_interruptible(&wq);
}
int mailbox_request(const uint32_t *request, size_t request_size, struct mailbox_reply *reply)
{
[...]
schedule_work(&workqueue);
wait_event_interruptible(wq, ioread32(mailbox + STATUS_OFFSET) == 1);
[...]
}
I'm not sure this is relevant, but just in case : the mailbox_request function won't necessarily be called from a process context. It might very well be called by the kernel itself. I read somewhere that workqueues are executed in a process context, that's why I'm telling you about that, to be sure this is not an issue.
Then I was wondering whether the {work,wait}queue approach is overkill for what I'm trying to achieve. I read somewhere about the schedule() function, and I have to admit it is a bit obscure to me. Would you consider this to be OK:
int mailbox_request(const uint32_t *request, size_t request_size, struct mailbox_reply *reply)
{
[...]
while(ioread32(mailbox + STATUS_OFFSET) != 1) {
schedule(); // or maybe some interruptible_sleep ?
}
[...]
}
The idea being to prevent the kernel from being stuck if it has better things to do. If you think of better ways to do that, I'd be glad to read about it !

Is this a decent home-made mutex implementation? Criticism? Potential Problems?

I'm wondering if anyone sees anything that would likely cause problems in this code. I know there are other ways/API calls I could have used to do this, but I'm trying to lay the foundation for my own platform-independent / cross-platform mutex framework.
Obviously I need to do some #ifdef's and define some macros for the Win32 Sleep() and GetCurrentThreadID() calls...
typedef struct aec {
unsigned long long lastaudibleframe; /* time stamp of last audible frame */
unsigned short aiws; /* Average mike input when speaker is playing */
unsigned short aiwos; /*Average mike input when speaker ISNT playing */
unsigned long long t_aiws, t_aiwos; /* Internal running total */
unsigned int c_aiws, c_aiwos; /* Internal counters */
unsigned long lockthreadid;
int stlc; /* Same thread lock count */
} AEC;
char lockecho( AEC *ec ) {
unsigned long tid=0;
static int inproc=0;
while (inproc) {
Sleep(1);
}
inproc=1;
if (!ec) {
inproc=0;
return 0;
}
tid=GetCurrentThreadId();
if (ec->lockthreadid==tid) {
inproc=0;
ec->stlc++;
return 1;
}
while (ec->lockthreadid!=0) {
Sleep(1);
}
ec->lockthreadid=tid;
inproc=0;
return 1;
}
char unlockecho( AEC *ec ) {
unsigned long tid=0;
if (!ec)
return 1;
tid=GetCurrentThreadId();
if (tid!=ec->lockthreadid)
return 0;
if (tid==ec->lockthreadid) {
if (ec->stlc>0) {
ec->stlc--;
} else {
ec->lockthreadid=0;
}
}
return 1;
}
No, it's not. AFAIK you can't implement a mutex in plain C code without some low-level atomic operations (read-modify-write, test-and-set, etc.). In your particular example, consider what happens if a context switch interrupts the first thread after it has seen inproc as 0 but before it gets a chance to set it: the second thread will resume and set it to 1, and now both threads "think" they have exclusive access to the struct. This is just one of many things that could go wrong with your approach.
Also note that even if a thread gets a chance to set inproc, the assignment is not guaranteed to be atomic (it could be interrupted in the middle of assigning the variable).
As mux points out, your proposed code is incorrect due to many race conditions. You could solve this using atomic instructions like "Compare and Set", but you'll need to define those separately for each platform anyway. You're better off just defining a high-level "Lock" and "Unlock" interface, and implementing those using whatever the platform provides.
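For comparison, here is roughly what the smallest correct building block looks like when an atomic exchange is available. This is a sketch using C11 <stdatomic.h>, not a drop-in replacement for lockecho(): the spin_mutex type and its functions are hypothetical names, the lock is not recursive, and it busy-waits instead of sleeping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal lock type; not the AEC struct from the question. */
typedef struct {
    atomic_bool held;   /* true while some thread owns the lock */
} spin_mutex;

/* Returns 1 if the lock was acquired, 0 if somebody else holds it. */
int spin_mutex_trylock(spin_mutex *m)
{
    /* The exchange writes true and returns the previous value as one
       indivisible step, which is what the plain "while (inproc)" check
       followed by "inproc=1" cannot guarantee. */
    return !atomic_exchange_explicit(&m->held, true, memory_order_acquire);
}

void spin_mutex_lock(spin_mutex *m)
{
    while (!spin_mutex_trylock(m))
        ;   /* busy-wait; a production lock would back off or sleep here */
}

void spin_mutex_unlock(spin_mutex *m)
{
    /* Unlocking a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&m->held, memory_order_relaxed));
    atomic_store_explicit(&m->held, false, memory_order_release);
}
```

A zero-initialized spin_mutex starts out unlocked. On a platform without C11 atomics, the same three functions would be implemented with whatever primitive the platform offers, which is the "high-level interface, per-platform implementation" split suggested above.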

Run-time mocking in C?

This has been pending for a long time in my list now. In brief - I need to run mocked_dummy() in the place of dummy() ON RUN-TIME, without modifying factorial(). I do not care on the entry point of the software. I can add up any number of additional functions (but cannot modify code within /*---- do not modify ----*/).
Why do I need this?
To do unit tests of some legacy C modules. I know there are a lot of tools available, but if run-time mocking is possible I can change my UT approach (add reusable components) and make my life easier :).
Platform / Environment?
Linux, ARM, gcc.
Approach that I'm trying with?
I know GDB uses trap/illegal instructions for adding up breakpoints (gdb internals).
Make the code self modifiable.
Replace dummy() code segment with illegal instruction, and return as immediate next instruction.
Control transfers to trap handler.
Trap handler is a reusable function that reads from a unix domain socket.
Address of mocked_dummy() function is passed (read from map file).
Mock function executes.
There are problems going ahead from here. I also found the approach is tedious and requires good amount of coding, some in assembly too.
I also found that under gcc each function call can be hooked / instrumented, but that is not very useful here, since the function that is supposed to be mocked will still get executed.
Is there any other approach that I could use?
#include <stdio.h>
#include <stdlib.h>
void mocked_dummy(void)
{
printf("__%s__()\n",__func__);
}
/*---- do not modify ----*/
void dummy(void)
{
printf("__%s__()\n",__func__);
}
int factorial(int num)
{
int fact = 1;
printf("__%s__()\n",__func__);
while (num > 1)
{
fact *= num;
num--;
}
dummy();
return fact;
}
/*---- do not modify ----*/
int main(int argc, char * argv[])
{
int (*fp)(int) = atoi(argv[1]);
printf("fp = %x\n",fp);
printf("factorial of 5 is = %d\n",fp(5));
printf("factorial of 5 is = %d\n",factorial(5));
return 1;
}
test-dept is a relatively recent C unit testing framework that allows you to do runtime stubbing of functions. I found it very easy to use - here's an example from their docs:
void test_stringify_cannot_malloc_returns_sane_result() {
replace_function(&malloc, &always_failing_malloc);
char *h = stringify('h');
assert_string_equals("cannot_stringify", h);
}
Although the downloads section is a little out of date, it seems fairly actively developed - the author fixed an issue I had very promptly. You can get the latest version (which I've been using without issues) with:
svn checkout http://test-dept.googlecode.com/svn/trunk/ test-dept-read-only
the version there was last updated in Oct 2011.
However, since the stubbing is achieved using assembler, it may need some effort to get it to support ARM.
This is a question I've been trying to answer myself. I also have the requirement that the mocking method/tools be written in the same language as my application. Unfortunately this cannot be done in C in a portable way, so I've resorted to what you might call a trampoline or detour. This falls under the "Make the code self modifiable." approach you mentioned above. This is where we change the actual bytes of a function at runtime to jump to our mock function.
#include <stdio.h>
#include <stdlib.h>
// Additional headers
#include <stdint.h> // for uint32_t
#include <sys/mman.h> // for mprotect
#include <errno.h> // for errno
void mocked_dummy(void)
{
printf("__%s__()\n",__func__);
}
/*---- do not modify ----*/
void dummy(void)
{
printf("__%s__()\n",__func__);
}
int factorial(int num)
{
int fact = 1;
printf("__%s__()\n",__func__);
while (num > 1)
{
fact *= num;
num--;
}
dummy();
return fact;
}
/*---- do not modify ----*/
typedef void (*dummy_fun)(void);
void set_run_mock()
{
dummy_fun run_ptr, mock_ptr;
uint32_t off;
unsigned char * ptr, * pg;
run_ptr = dummy;
mock_ptr = mocked_dummy;
if (run_ptr > mock_ptr) {
off = run_ptr - mock_ptr;
off = -off - 5;
}
else {
off = mock_ptr - run_ptr - 5;
}
ptr = (unsigned char *)run_ptr;
pg = (unsigned char *)(ptr - ((size_t)ptr % 4096));
if (mprotect(pg, 5, PROT_READ | PROT_WRITE | PROT_EXEC)) {
perror("Couldn't mprotect");
exit(errno);
}
ptr[0] = 0xE9; //x86 JMP rel32
ptr[1] = off & 0x000000FF;
ptr[2] = (off & 0x0000FF00) >> 8;
ptr[3] = (off & 0x00FF0000) >> 16;
ptr[4] = (off & 0xFF000000) >> 24;
}
int main(int argc, char * argv[])
{
// Run for realz
factorial(5);
// Set jmp
set_run_mock();
// Run the mock dummy
factorial(5);
return 0;
}
Portability explanation...
mprotect() - This changes the memory page access permissions so that we can actually write to memory that holds the function code. This isn't very portable, and in a WINAPI env, you may need to use VirtualProtect() instead.
The memory parameter for mprotect is aligned to the previous 4k page, this also can change from system to system, 4k is appropriate for vanilla linux kernel.
The method that we use to jmp to the mock function is to actually put down our own opcodes, this is probably the biggest issue with portability because the opcode I've used will only work on a little endian x86 (most desktops). So this would need to be updated for each arch you plan to run on (which could be semi-easy to deal with in CPP macros.)
The function itself has to be at least five bytes long. This is usually the case because every function normally has at least 5 bytes in its prologue and epilogue.
Potential Improvements...
The set_run_mock() call could easily be set up to accept parameters for reuse. Also, you could save the five overwritten bytes from the original function and restore them later if you desire.
I'm unable to test, but I've read that on ARM you'd do something similar with the branch instruction. For an unconditional branch, the top byte of the 32-bit instruction is 0xEA and the remaining 24 bits hold a signed word offset relative to the PC (an offset, not an absolute address).
Chenz
An approach that I have used in the past that has worked well is the following.
For each C module, publish an 'interface' that other modules can use. These interfaces are structs that contain function pointers.
struct Module1
{
int (*getTemperature)(void);
int (*setKp)(int Kp);
};
During initialization, each module initializes these function pointers with its implementation functions.
When you write the module tests, you can dynamically change these function pointers to point to mock implementations and, after testing, restore the original implementations.
Example:
void mocked_dummy(void)
{
printf("__%s__()\n",__func__);
}
/*---- do not modify ----*/
void dummyFn(void)
{
printf("__%s__()\n",__func__);
}
static void (*dummy)(void) = dummyFn;
int factorial(int num)
{
int fact = 1;
printf("__%s__()\n",__func__);
while (num > 1)
{
fact *= num;
num--;
}
dummy();
return fact;
}
/*---- do not modify ----*/
int main(int argc, char * argv[])
{
void (*oldDummy)(void) = dummy;
/* with the original dummy function */
printf("factorial of 5 is = %d\n",factorial(5));
/* with the mocked dummy */
oldDummy = dummy; /* save the old dummy */
dummy = mocked_dummy; /* put in the mocked dummy */
printf("factorial of 5 is = %d\n",factorial(5));
dummy = oldDummy; /* restore the old dummy */
return 1;
}
You can replace every function by the use of LD_PRELOAD. You have to create a shared library, which gets loaded by LD_PRELOAD. This is a standard function used to turn programs without support for SOCKS into SOCKS aware programs. Here is a tutorial which explains it.

C branch on static variable optimization

Let me preface this by saying I haven't profiled this code, nor is it a critical path. This is mostly for my own curiosity.
I have a function that declares/defines a static int to a known error value that will cause the code to take a branch. However, if the function succeeds, I know with certainty that the branch will never be taken again. Is there a compile time optimization for this? Specifically GNU/gcc/glibc?
So I have this:
static unsigned long volatile *getReg(unsigned long addr){
static int fd = -1;
if (fd < 0){
if ((fd = open("file", O_RDWR | O_SYNC)) < 0){
return NULL;
}
}
}
So once the function completes successfully (if this function returns null, I exit the program), I know that fd will for all future calls be valid and will never take the first branch. I know there's the __builtin_expect() macro, so I could write
if (__builtin_expect((fd<0),0)){
But from what I understand that's only a HINT to the compiler, and it still has to perform the condition check. And I also realize it will in 99.9999% of the cases be more than enough so that any further performance increase is negligible.
I was wondering if there was a way of preventing even the first condition check (the fd <0 ) after the very first time it gets run.
The short answer is "no".
I mean, sure, you could maybe play tricks with pointers to functions, monkey-patching your code, etc., but that would almost certainly be slower than just doing the test.
Branches are only expensive when they are mis-predicted. __builtin_expect will arrange to ensure that this branch is only mis-predicted the first time.
You are talking about literally one or two cycles here, and possibly not even that, depending on what else the CPU is doing near this code.
[update]
If something like this really is being called millions or billions of times per second, you would deal with it by restructuring your code to initialize fd early and then use it repeatedly without bothering to test. For example, you might add an initGlobalState(); call near the top of main() and open the file then. (You would want a corresponding destroyGlobalState(); to close it again.)
And of course, a file descriptor is a horrible example, because anything you are doing to it will take vastly more than one or two cycles anyway.
In C++, constructors, destructors, and the RAII idiom make this sort of approach very natural, by the way.
Split the function in two, in their own source file ... and let the caller worry about it :)
static int fd;
unsigned long volatile *getReg(unsigned long addr) {
/* do stuff with fd and addr */
return 0;
}
int getRegSetup(void) {
fd = open("file", O_RDWR | O_SYNC);
if (fd < 0) return 1; /* error */
/* continue processing */
return 0; /* ok */
}
The caller then does
/* ... */
if (getRegSetup()) {
/* error */
} else {
do {
ptr = getReg(42);
} while (ptr);
}
/* ... */
Well one of the ways to fix this would be to use a function pointer to call the method. Initialize the function ptr to your long function and at the end of the first call set it to the version without additional initialization.
That said, it sounds like an absolute maintenance nightmare and is surely not worth it just to avoid one branch, but you do get rid of the branch (and you certainly lose any chance that the function is inlined, which, depending on how long the function is, will almost certainly be detrimental).
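For the curious, the function-pointer trick could be sketched like this. All names (get_value, get_value_first, get_value_fast, resource, setup_calls) are invented for the example, and a plain integer stands in for the file descriptor so the snippet has no I/O. It is single-threaded only.

```c
#include <assert.h>

int setup_calls;             /* counts how often the slow path ran */
static int resource = -1;    /* stands in for the file descriptor */

static int get_value_first(int x);
static int get_value_fast(int x);

/* Callers always go through this pointer instead of a named function. */
int (*get_value)(int) = get_value_first;

/* Steady-state version: no initialization check at all. */
static int get_value_fast(int x)
{
    assert(resource >= 0);   /* only reachable after setup */
    return resource + x;
}

/* First-call version: does the setup, then swaps itself out. */
static int get_value_first(int x)
{
    resource = 100;              /* stand-in for the real open() call */
    setup_calls++;
    get_value = get_value_fast;  /* later calls skip this function */
    return get_value_fast(x);
}
```

Every call now costs an indirect call through get_value, which is the trade-off described above: the test on fd is gone, but so is any chance of inlining.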
__builtin_expect is only a hint. It helps the compiler generate better code, for example by rearranging jump labels so that the mainline code is contiguous in memory, which makes it friendlier to instruction cache lines and easier to fetch from main memory. Running profile-guided optimization is even better.
I don't see any locking in your code, so I assume this function is not supposed to be called from multiple threads at the same time. In this case you have to move fd out of the function scope, so that double-checked locking is not applied. Then rearrange the code a bit (that's what GCC is supposed to do with branch hints, but you know...). Plus, you can copy the file descriptor from main memory / a cache line into a register if you access it often. The code will look something like this:
static int g_fd = -1;
static unsigned long volatile *getReg(unsigned long addr)
{
register int fd = g_fd;
if (__builtin_expect ((fd >= 0), 1)) /* 0 is a valid descriptor */
{
on_success:
return NULL; // Do important stuff here.
}
fd = open("file", O_RDWR | O_SYNC);
if (__builtin_expect ((fd >= 0), 1))
{
g_fd = fd;
goto on_success;
}
return NULL;
}
But please don't take this seriously. System calls and file I/O are so slow that optimizing things like this doesn't make any sense (with some exceptions).
And if you really want to call it once, then you're better off moving the file open into a separate function that is called once, before everything else. And yes, take a look at GCC's profile feedback and LTO. That will help you achieve good results without spending too much time on things like this.
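The branch hint is usually wrapped in a pair of macros. The likely/unlikely names below follow a widespread convention (the Linux kernel uses the same names); they are not something GCC defines for you. get_handle and cached_handle are placeholders for illustration, and __builtin_expect itself is a GCC/Clang builtin.

```c
#include <assert.h>

/* The !! normalizes any non-zero value to 1 so it can match the hint. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int cached_handle = -1;   /* hypothetical resource, -1 = not set up */

int get_handle(void)
{
    /* The test still runs on every call; the hint only tells the
       compiler to lay out the setup code off the hot path. */
    if (unlikely(cached_handle < 0)) {
        cached_handle = 7;       /* stand-in for the real open() call */
    }
    assert(cached_handle >= 0);
    return cached_handle;
}
```

The macros do not change what the condition evaluates to, only how the compiler arranges the two branches.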
For anyone curious, this is what I came up with. Note that this is a module to a larger, long running program. Also, that it hasn't been reviewed, and is basically a bad hack anyway.
__attribute__((noinline)) static unsigned int volatile *get_mem(unsigned int addr) {
static void *map = 0 ;
static unsigned prevPage = -1U ;
static int fd = -1;
int poss_err = 0;
register unsigned page = addr & ~MAP_MASK ;
if ( unlikely(fd < 0) ) {
if ((fd = open("/dev/mem", O_RDWR | O_SYNC)) < 0) {
longjmp(mem_err, errno);
}
}
if ( page != prevPage ) {
if ( map ) {
if (unlikely((munmap(map,MAP_SIZE) < 0))) poss_err = 1;
}
if (unlikely((map = mmap(0, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, page )) == MAP_FAILED)) longjmp(mem_err, errno);
prevPage = page ;
}
return (unsigned int volatile *)((char *)map+(addr & MAP_MASK));
}
static void set_reg(const struct reg_info * const r, unsigned int val)
{
unsigned int volatile * const mem = get_mem(r->addr);
*mem = (*mem & (~(r->mask << r->shift))) | (val << r->shift);
}
// This isn't in the final piece. There are several entry points into this module. Just an example
static int entryPoint(unsigned int value){
if (setjmp(mem_err)!=0) {
// Serious error
return -1;
}
for (i=0; i<n; i++) {
if (strlen(regs[i].name) == strlen(name) &&
strncmp(regs[i].name, name, strlen (name))==0) {
set_reg(&regs[i], value);
return value;
}
}
}
This obviously isn't an answer to the question, since it checks the condition on every call.

Resources