Linux kernel driver : polling on hardware register - c

I'm writing a Linux driver that aims to delegate some work to hardware accelerators. My driver writes some registers and memories to tell the hardware to do its job. Once the hardware is done, it will indicate so by setting a status register to 1. I do realize this is not how it should be done and that IRQ should be used instead and I'm planning to do so. But for now, I'm willing to do some polling on that register. And here I am, asking you smart people what's the best way to do so.
My hardware registers are mapped into kernel space using ioremap() and here is the code I'm struggling with :
int mailbox_request(const uint32_t *request, size_t request_size, struct mailbox_reply *reply)
{
size_t i;
int retval = 0;
if(down_interruptible(&sem))
return -ERESTARTSYS;
for(i = 0; i < request_size; i++)
iowrite32(request[i], mailbox + FIFO_OFFSET); /* iowrite32() takes (value, address) */
iowrite32(1, mailbox + CONTROL_OFFSET);
// WHAT SHOULD I DO HERE TO WAIT FOR THE RESULT ?
// while(ioread32(mailbox + STATUS_OFFSET) != 1); // better not do that...
reply->data = ioread32(mailbox + RESULT_OFFSET);
out:
up(&sem);
return retval;
}
My first idea was to use a workqueue and a waitqueue as follows:
static DECLARE_WORK(workqueue, mailbox_polling_result);
static DECLARE_WAIT_QUEUE_HEAD(wq);
static void mailbox_polling_result(struct work_struct *work)
{
while(ioread32(mailbox + STATUS_OFFSET) != 1)
msleep_interruptible(100);
wake_up_interruptible(&wq);
}
int mailbox_request(const uint32_t *request, size_t request_size, struct mailbox_reply *reply)
{
[...]
schedule_work(&workqueue);
wait_event_interruptible(wq, ioread32(mailbox + STATUS_OFFSET) == 1);
[...]
}
I'm not sure this is relevant, but just in case : the mailbox_request function won't necessarily be called from a process context. It might very well be called by the kernel itself. I read somewhere that workqueues are executed in a process context, that's why I'm telling you about that, to be sure this is not an issue.
Then I was wondering whether the {work,wait}queue approach is a bit overkill for what I'm trying to achieve. I read somewhere about the schedule() function, and I have to admit it is a bit obscure to me. Would you consider this to be OK:
int mailbox_request(const uint32_t *request, size_t request_size, struct mailbox_reply *reply)
{
[...]
while(ioread32(mailbox + STATUS_OFFSET) != 1) {
schedule(); // or maybe some interruptible_sleep ?
}
[...]
}
The idea being to prevent the kernel from being stuck if it has better things to do. If you think of better ways to do that, I'd be glad to read about it !

Related

Memory ordering for a spin-lock "call once" implementation

Suppose I wanted to implement a mechanism for calling a piece of code exactly once (e.g. for initialization purposes), even when multiple threads hit the call site repeatedly. Basically, I'm trying to implement something like pthread_once, but with GCC atomics and spin-locking. I have a candidate implementation below, but I'd like to know if
a) it could be faster in the common case (i.e. already initialized), and,
b) is the selected memory ordering strong enough / too strong?
Architectures of interest are x86_64 (primarily) and aarch64.
The intended use API is something like this
void gets_called_many_times_from_many_threads(void)
{
static int my_once_flag = 0;
if (once_enter(&my_once_flag)) {
// do one-time initialization here
once_commit(&my_once_flag);
}
// do other things that assume the initialization has taken place
}
And here is the implementation:
int once_enter(int *b)
{
int zero = 0;
int got_lock = __atomic_compare_exchange_n(b, &zero, 1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
if (got_lock) return 1;
while (2 != __atomic_load_n(b, __ATOMIC_ACQUIRE)) {
// on x86, insert a pause instruction here
};
return 0;
}
void once_commit(int *b)
{
(void) __atomic_store_n(b, 2, __ATOMIC_RELEASE);
}
I think that the RELAXED ordering on the compare exchange is okay, because we don't skip the atomic load in the while condition even if the compare-exchange gives us 2 (in the "zero" variable), so the ACQUIRE on that load synchronizes with the RELEASE in once_commit (I think), but maybe on a successful compare-exchange we need to use RELEASE? I'm unclear here.
Also, I just learned that lock cmpxchg is a full memory barrier on x86, and since we are hitting the __atomic_compare_exchange_n in the common case (initialization has already been done), that barrier occurs on every function call. Is there an easy way to avoid this?
UPDATE
Based on the comments and accepted answer, I've come up with the following modified implementation. If anybody spots a bug please let me know, but I believe it's correct. Basically, the change amounts to implementing double-check locking. I also switched to using SEQ_CST because:
I mainly care that the common (already initialized) case is fast.
I observed that GCC doesn't emit a memory fence instruction on x86 for the first read (and it does do so on ARM even with ACQUIRE).
#ifdef __x86_64__
#define PAUSE() __asm __volatile("pause")
#else
#define PAUSE()
#endif
int once_enter(int *b)
{
if(2 == __atomic_load_n(b, __ATOMIC_SEQ_CST)) return 0;
int zero = 0;
int got_lock = __atomic_compare_exchange_n(b, &zero, 1, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
if (got_lock) return 1;
while (2 != __atomic_load_n(b, __ATOMIC_SEQ_CST)) {
PAUSE();
};
return 0;
}
void once_commit(int *b)
{
(void) __atomic_store_n(b, 2, __ATOMIC_SEQ_CST);
}
a, What you need is a double-checked lock.
Basically, instead of entering the lock every time, you do an acquiring-load to see if the initialisation has been done yet, and only invoke once_enter if it has not.
void gets_called_many_times_from_many_threads(void)
{
static int my_once_flag = 0;
if (__atomic_load_n(&my_once_flag, __ATOMIC_ACQUIRE) != 2) {
if (once_enter(&my_once_flag)) {
// do one-time initialization here
once_commit(&my_once_flag);
}
}
// do other things that assume the initialization has taken place
}
b, I believe this is enough, your initialisation happens before the releasing store of 2 to my_once_flag, and every other thread has to observe the value of 2 with an acquiring load from the same variable.
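A minimal sketch of the double-checked pattern described above, assuming GCC's __atomic builtins and pthreads. Here the fast-path check is folded into once_enter rather than placed in the caller, and sched_yield() stands in for the pause instruction. The names init_count, worker and run_once_demo are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

static int once_flag;   /* 0 = untouched, 1 = initialising, 2 = done */
static int init_count;  /* hypothetical resource: counts initialisations */

static int once_enter(int *b)
{
    int zero = 0;

    /* Fast path: a single acquire load once initialisation is done. */
    if (__atomic_load_n(b, __ATOMIC_ACQUIRE) == 2)
        return 0;
    /* Slow path: exactly one thread moves the flag from 0 to 1. */
    if (__atomic_compare_exchange_n(b, &zero, 1, 0,
                                    __ATOMIC_RELAXED, __ATOMIC_RELAXED))
        return 1;
    /* Everyone else waits for the winner's release store of 2. */
    while (__atomic_load_n(b, __ATOMIC_ACQUIRE) != 2)
        sched_yield();
    return 0;
}

static void once_commit(int *b)
{
    __atomic_store_n(b, 2, __ATOMIC_RELEASE);
}

static void *worker(void *arg)
{
    (void)arg;
    if (once_enter(&once_flag)) {
        init_count++;               /* the one-time initialisation */
        once_commit(&once_flag);
    }
    /* Safe to read: the acquire load of 2 orders this after the init. */
    return (void *)(intptr_t)init_count;
}

/* Starts up to 16 threads and returns how often initialisation ran. */
int run_once_demo(int nthreads)
{
    pthread_t tid[16];
    int i, started = 0;

    if (nthreads > 16)
        nthreads = 16;
    for (i = 0; i < nthreads; i++)
        if (pthread_create(&tid[started], NULL, worker, NULL) == 0)
            started++;
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return init_count;
}
```

Calling run_once_demo(8) should report a single initialisation no matter how the threads interleave, because only the thread that wins the compare-exchange ever touches init_count before the flag reaches 2.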

Is this a decent home-made mutex implementation? Criticism? Potential Problems?

I'm wondering if anyone sees anything that would likely cause problems in this code. I know there are other ways/API calls I could have used to do this, but I'm trying to lay the foundation for my own platform-independent / cross-platform mutex framework.
Obviously I need to do some #ifdef's and define some macros for the Win32 Sleep() and GetCurrentThreadID() calls...
typedef struct aec {
unsigned long long lastaudibleframe; /* time stamp of last audible frame */
unsigned short aiws; /* Average mike input when speaker is playing */
unsigned short aiwos; /*Average mike input when speaker ISNT playing */
unsigned long long t_aiws, t_aiwos; /* Internal running total */
unsigned int c_aiws, c_aiwos; /* Internal counters */
unsigned long lockthreadid;
int stlc; /* Same thread lock count */
} AEC;
char lockecho( AEC *ec ) {
unsigned long tid=0;
static int inproc=0;
while (inproc) {
Sleep(1);
}
inproc=1;
if (!ec) {
inproc=0;
return 0;
}
tid=GetCurrentThreadId();
if (ec->lockthreadid==tid) {
inproc=0;
ec->stlc++;
return 1;
}
while (ec->lockthreadid!=0) {
Sleep(1);
}
ec->lockthreadid=tid;
inproc=0;
return 1;
}
char unlockecho( AEC *ec ) {
unsigned long tid=0;
if (!ec)
return 1;
tid=GetCurrentThreadId();
if (tid!=ec->lockthreadid)
return 0;
if (tid==ec->lockthreadid) {
if (ec->stlc>0) {
ec->stlc--;
} else {
ec->lockthreadid=0;
}
}
return 1;
}
No, it's not. AFAIK you can't implement a mutex in plain C code without some low-level atomic operations (RMW, test-and-set, etc.). In your particular example, consider what happens if a context switch interrupts the first thread after it has seen inproc == 0 but before it gets a chance to set inproc: the second thread will then resume and set it to 1, and now both threads "think" they have exclusive access to the struct. This is just one of many things that could go wrong with your approach.
Also note that even if a thread gets a chance to set inproc, the assignment is not guaranteed to be atomic (it could be interrupted in the middle of assigning the variable).
As mux points out, your proposed code is incorrect due to many race conditions. You could solve this using atomic instructions like "Compare and Set", but you'll need to define those separately for each platform anyway. You're better off just defining a high-level "Lock" and "Unlock" interface, and implementing those using whatever the platform provides.
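To make the point about atomic operations concrete, here is a minimal sketch of a lock built on an atomic exchange, assuming GCC's __atomic builtins (on Win32 the equivalent would be something like the Interlocked functions). The type and function names are hypothetical, and this is a bare spinlock, not a replacement for the recursive lock in the question.

```c
#include <assert.h>

/* Hypothetical minimal lock type for this sketch. */
typedef struct {
    int locked;     /* 0 = free, 1 = held */
} spin_mutex;

/* Returns 1 if the lock was acquired, 0 if somebody else holds it.
 * The exchange reads the old value and writes 1 in one atomic step,
 * so two threads can never both observe 0. */
int spin_trylock(spin_mutex *m)
{
    return __atomic_exchange_n(&m->locked, 1, __ATOMIC_ACQUIRE) == 0;
}

/* Busy-waits until the lock is acquired. */
void spin_lock(spin_mutex *m)
{
    while (!spin_trylock(m))
        ;   /* a real implementation would back off or sleep here */
}

/* Releases the lock; the release store publishes the protected writes. */
void spin_unlock(spin_mutex *m)
{
    __atomic_store_n(&m->locked, 0, __ATOMIC_RELEASE);
}
```

The difference from the inproc flag in the question is that the test and the set happen as a single indivisible operation, which closes the window in which a context switch could let two threads in.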

What's a good way to implement simple clone() based multithread library?

I'm trying to build a simple multithreading library on Linux using clone() and other kernel utilities. I've come to a point where I'm not really sure what the correct way to do things is. I tried going through the original NPTL code, but it's a bit too much.
That's how for instance I imagine the create method:
typedef int sk_thr_id;
typedef void *sk_thr_arg;
typedef int (*sk_thr_func)(sk_thr_arg);
sk_thr_id sk_thr_create(sk_thr_func f, sk_thr_arg a){
void* stack;
stack = malloc( 1024*64 );
if ( stack == 0 ){
perror( "malloc: could not allocate stack" );
exit( 1 );
}
return ( clone(f, (char*) stack + FIBER_STACK, SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM, a ) );
}
1: I'm not really sure what the correct clone() flags should be. I just found these being used in a simple example. Any general directions here will be welcome.
Here are parts of the mutex primitives created using futexes(not my own code for now):
#define cmpxchg(P, O, N) __sync_val_compare_and_swap((P), (O), (N))
#define cpu_relax() asm volatile("pause\n": : :"memory")
#define barrier() asm volatile("": : :"memory")
static inline unsigned xchg_32(void *ptr, unsigned x)
{
__asm__ __volatile__("xchgl %0,%1"
:"=r" ((unsigned) x)
:"m" (*(volatile unsigned *)ptr), "0" (x)
:"memory");
return x;
}
static inline unsigned short xchg_8(void *ptr, char x)
{
__asm__ __volatile__("xchgb %0,%1"
:"=r" ((char) x)
:"m" (*(volatile char *)ptr), "0" (x)
:"memory");
return x;
}
int sys_futex(void *addr1, int op, int val1, struct timespec *timeout, void *addr2, int val3)
{
return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3);
}
typedef union mutex mutex;
union mutex
{
unsigned u;
struct
{
unsigned char locked;
unsigned char contended;
} b;
};
int mutex_init(mutex *m, const pthread_mutexattr_t *a)
{
(void) a;
m->u = 0;
return 0;
}
int mutex_lock(mutex *m)
{
int i;
/* Try to grab lock */
for (i = 0; i < 100; i++)
{
if (!xchg_8(&m->b.locked, 1)) return 0;
cpu_relax();
}
/* Have to sleep */
while (xchg_32(&m->u, 257) & 1)
{
sys_futex(m, FUTEX_WAIT_PRIVATE, 257, NULL, NULL, 0);
}
return 0;
}
int mutex_unlock(mutex *m)
{
int i;
/* Locked and not contended */
if ((m->u == 1) && (cmpxchg(&m->u, 1, 0) == 1)) return 0;
/* Unlock */
m->b.locked = 0;
barrier();
/* Spin and hope someone takes the lock */
for (i = 0; i < 200; i++)
{
if (m->b.locked) return 0;
cpu_relax();
}
/* We need to wake someone up */
m->b.contended = 0;
sys_futex(m, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
return 0;
}
2: The main question for me is how to implement the "join" primitive? I know it's supposed to be based on futexes too. It's a struggle for me for now to come up with something.
3: I need some way to clean up stuff (like the allocated stack) after a thread has finished. I can't really think of a good way to do this either.
Probably for these I'll need to have additional structure in user space for every thread with some information saved in it. Can someone point me in good direction for solving these issues?
4: I'll want to have a way to tell how long a thread has been running, how long it's been since it was last scheduled, and other things like that. Are there kernel calls that provide such info?
Thanks in advance!
The idea that there can exist a "multithreading library" as a third-party library separate from the rest of the standard library is an outdated and flawed notion. If you want to do this, you'll have to first drop all use of the standard library; particularly, your call to malloc is completely unsafe if you're calling clone yourself, because:
malloc will have no idea that multiple threads exist, and therefore may fail to perform proper synchronization.
Even if it knew they existed, malloc will need to access an unspecified, implementation-specific structure located at the address given by the thread pointer. As this structure is implementation-specific, you have no way of creating such a structure that will be interpreted correctly by both the current and all future versions of your system's libc.
These issues don't apply just to malloc but to most of the standard library; even async-signal-safe functions may be unsafe to use, as they might dereference the thread pointer for cancellation-related purposes, performing optimal syscall mechanisms, etc.
If you really insist on making your own threads implementation, you'll have to abstain from using glibc or any modern libc that's integrated with threads, and instead opt for something much more naive like klibc. This could be an educational experiment, but it would not be appropriate for a deployed application.
1) You are using an example from LinuxThreads. I will not rewrite good references here, but I recommend "The Linux Programming Interface" by Michael Kerrisk, chapter 28. It explains what you need in 25 pages.
2) If you set the CLONE_CHILD_CLEARTID flag, the location given by the ctid argument of clone is cleared when the child terminates. If you treat that location as a futex, you can implement the join primitive. Good luck :-) If you don't want to use futexes, also have a look at wait3 and wait4.
3) I do not know what you want to clean up, but you can use the tls argument of clone. This is a thread-local storage buffer. Once the thread has finished, you can clean up that buffer.
4) See getrusage.
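A rough sketch of the join idea from point 2, assuming glibc's clone() wrapper on Linux. The kernel writes the child's TID into child_tid at creation (CLONE_PARENT_SETTID), zeroes it and issues a futex wake when the child exits (CLONE_CHILD_CLEARTID), and the joiner sleeps on that word. All names (child_fn, sk_join, run_clone_join_demo) are hypothetical, the stack is a static buffer to sidestep the malloc concerns raised above, and the child deliberately makes no libc calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <sched.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static char child_stack[64 * 1024] __attribute__((aligned(16)));
static pid_t child_tid;   /* set by the kernel at clone, cleared at exit */
static int child_result;  /* written by the child, read after the join */

static int child_fn(void *arg)
{
    child_result = *(int *)arg * 2;   /* no libc calls in the child */
    return 0;
}

/* Blocks until the kernel has cleared *tid_word, i.e. the thread exited. */
static void sk_join(pid_t *tid_word)
{
    pid_t v;

    while ((v = __atomic_load_n(tid_word, __ATOMIC_ACQUIRE)) != 0) {
        /* Sleeps only if *tid_word still equals v; the kernel's wake on
         * exit is a non-private FUTEX_WAKE, so plain FUTEX_WAIT is used. */
        syscall(SYS_futex, tid_word, FUTEX_WAIT, v, NULL, NULL, 0);
    }
}

/* Runs child_fn(input) in a cloned thread; returns its result or -1. */
int run_clone_join_demo(int input)
{
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
                CLONE_THREAD | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID;

    if (clone(child_fn, child_stack + sizeof child_stack, flags,
              &input, &child_tid, NULL, &child_tid) == -1)
        return -1;
    sk_join(&child_tid);
    /* From here on the child no longer uses its stack, so a heap-allocated
     * stack could be released at this point. */
    return child_result;
}
```

This also hints at an answer to point 3: once the TID word reads zero, the joiner is the natural place to free per-thread resources such as the stack.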

Issue with SPI (Serial Port Comm), stuck on ioctl()

I'm trying to access a SPI sensor using the SPIDEV driver but my code gets stuck on IOCTL.
I'm running embedded Linux on the SAM9X5EK (mounting AT91SAM9G25). The device is connected to SPI0. I enabled CONFIG_SPI_SPIDEV and CONFIG_SPI_ATMEL in menuconfig and added the proper code to the BSP file:
static struct spi_board_info spidev_board_info[] = {
{
.modalias = "spidev",
.max_speed_hz = 1000000,
.bus_num = 0,
.chip_select = 0,
.mode = SPI_MODE_3,
},
...
};
spi_register_board_info(spidev_board_info, ARRAY_SIZE(spidev_board_info));
1 MHz is the maximum accepted by the sensor. I tried 500 kHz, but I get an error during Linux boot (too slow, apparently). .bus_num and .chip_select should be correct (I also tried all other combinations). SPI_MODE_3 comes from the datasheet.
I get no error while booting and devices appear correctly as /dev/spidevX.X. I manage to open the file and obtain a valid file descriptor. I'm now trying to access the device with the following code (inspired by examples I found online).
#define MY_SPIDEV_DELAY_USECS 100
// #define MY_SPIDEV_SPEED_HZ 1000000
#define MY_SPIDEV_BITS_PER_WORD 8
int spidevReadRegister(int fd,
unsigned int num_out_bytes,
unsigned char *out_buffer,
unsigned int num_in_bytes,
unsigned char *in_buffer)
{
struct spi_ioc_transfer mesg[2] = { {0}, };
uint8_t num_tr = 0;
int ret;
// Write data
mesg[0].tx_buf = (unsigned long)out_buffer;
mesg[0].rx_buf = (unsigned long)NULL;
mesg[0].len = num_out_bytes;
// mesg[0].delay_usecs = MY_SPIDEV_DELAY_USECS,
// mesg[0].speed_hz = MY_SPIDEV_SPEED_HZ;
mesg[0].bits_per_word = MY_SPIDEV_BITS_PER_WORD;
mesg[0].cs_change = 0;
num_tr++;
// Read data
mesg[1].tx_buf = (unsigned long)NULL;
mesg[1].rx_buf = (unsigned long)in_buffer;
mesg[1].len = num_in_bytes;
// mesg[1].delay_usecs = MY_SPIDEV_DELAY_USECS,
// mesg[1].speed_hz = MY_SPIDEV_SPEED_HZ;
mesg[1].bits_per_word = MY_SPIDEV_BITS_PER_WORD;
mesg[1].cs_change = 1;
num_tr++;
// Do the actual transmission
if(num_tr > 0)
{
ret = ioctl(fd, SPI_IOC_MESSAGE(num_tr), mesg);
if(ret == -1)
{
printf("Error: %d\n", errno);
return -1;
}
}
return 0;
}
Then I'm using this function:
#define OPTICAL_SENSOR_ADDR "/dev/spidev0.0"
...
int fd;
fd = open(OPTICAL_SENSOR_ADDR, O_RDWR);
if (fd < 0) { /* open() returns -1 on failure; 0 is a valid descriptor */
printf("Device not found\n");
exit(1);
}
uint8_t buffer1[1] = {0x3a};
uint8_t buffer2[1] = {0};
spidevReadRegister(fd, 1, buffer1, 1, buffer2);
When I run it, the code gets stuck on the ioctl() call!
I did this way because, in order to read a register on the sensor, I need to send a byte with its address in it and then get the answer back without changing CS (however, when I tried using write() and read() functions, while learning, I got the same result, stuck on them).
I'm aware that specifying .speed_hz causes a ENOPROTOOPT error on Atmel (I checked spidev.c) so I commented that part.
Why does it get stuck? I thought it might be that the device node is created but doesn't actually "feel" any hardware. As I wasn't sure whether hardware SPI0 corresponds to bus_num 0 or 1, I tried both, but still no success (by the way, which one is it?).
UPDATE: I managed to have the SPI working! Half of it.. MOSI is transmitting the right data, but CLK doesn't start... any idea?
When I'm working with SPI I always use an oscilloscope to see the output of the I/Os. If you have a 4-channel scope you can easily debug the issue and find out whether you're accessing the right I/Os, using the right speed, etc. I usually compare the signal I get to the datasheet diagram.
I think there are several issues here. First of all, SPI is bidirectional, so whenever you send something over the bus you also receive something. Therefore you always have to provide a valid buffer for both rx_buf and tx_buf.
Second, all members of the struct spi_ioc_transfer have to be initialized with a valid value. Otherwise they just point to some memory address and the underlying process accesses arbitrary data, leading to undefined behavior.
Third, why do you use a for loop with ioctl? You already tell ioctl that you have an array of spi_ioc_transfer structs, so all defined transactions will be performed with one ioctl call.
Fourth ioctl needs a pointer to your struct array. So ioctl should look like this:
ret = ioctl(fd, SPI_IOC_MESSAGE(num_tr), &mesg);
You see there is room for improvement in your code.
This is how I do it in a c++ library for the raspberry pi. The whole library will soon be on github. I'll update my answer when it is done.
void SPIBus::spiReadWrite(std::vector<std::vector<uint8_t> > &data, uint32_t speed,
uint16_t delay, uint8_t bitsPerWord, uint8_t cs_change)
{
struct spi_ioc_transfer transfer[data.size()];
int i = 0;
for (std::vector<uint8_t> &d : data)
{
//see <linux/spi/spidev.h> for details!
transfer[i].tx_buf = reinterpret_cast<__u64>(d.data());
transfer[i].rx_buf = reinterpret_cast<__u64>(d.data());
transfer[i].len = d.size(); //number of bytes in vector
transfer[i].speed_hz = speed;
transfer[i].delay_usecs = delay;
transfer[i].bits_per_word = bitsPerWord;
transfer[i].cs_change = cs_change;
i++;
}
int status = ioctl(this->fileDescriptor, SPI_IOC_MESSAGE(data.size()), &transfer);
if (status < 0)
{
std::string errMessage(strerror(errno));
throw std::runtime_error("Failed to do full duplex read/write operation "
"on SPI Bus " + this->deviceNode + ". Error message: " +
errMessage);
}
}
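For comparison, a plain C sketch of the "initialize every member" advice, assuming the standard <linux/spi/spidev.h> header. The function names are hypothetical. Zeroing the array first means that fields such as speed_hz and bits_per_word fall back to the device defaults instead of carrying stack garbage; in this sketch the unused buffer of each transfer is left as 0, which, as far as I understand the spidev interface, is accepted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

/* Fills two chained transfers: send one register address, read one byte. */
void build_register_read(struct spi_ioc_transfer xfer[2],
                         const uint8_t *reg, uint8_t *value)
{
    /* Zero everything first so no member is left uninitialised. */
    memset(xfer, 0, 2 * sizeof(xfer[0]));

    xfer[0].tx_buf = (uintptr_t)reg;    /* address byte goes out */
    xfer[0].len = 1;

    xfer[1].rx_buf = (uintptr_t)value;  /* answer byte comes in */
    xfer[1].len = 1;
    /* cs_change stays 0, so chip select is held between the transfers. */
}

/* Performs both transfers in one ioctl; returns 0 on success, -1 on error. */
int spidev_read_register(int fd, uint8_t reg, uint8_t *value)
{
    struct spi_ioc_transfer xfer[2];

    build_register_read(xfer, &reg, value);
    return ioctl(fd, SPI_IOC_MESSAGE(2), xfer) < 0 ? -1 : 0;
}
```

With the descriptor from the question, a call would look like spidev_read_register(fd, 0x3a, &result), with errno holding the reason if it returns -1.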

Run-time mocking in C?

This has been pending for a long time in my list now. In brief - I need to run mocked_dummy() in the place of dummy() ON RUN-TIME, without modifying factorial(). I do not care on the entry point of the software. I can add up any number of additional functions (but cannot modify code within /*---- do not modify ----*/).
Why do I need this?
To do unit tests of some legacy C modules. I know there are a lot of tools available around, but if run-time mocking is possible I can change my UT approach (add reusable components) make my life easier :).
Platform / Environment?
Linux, ARM, gcc.
Approach that I'm trying with?
I know GDB uses trap/illegal instructions for adding up breakpoints (gdb internals).
Make the code self modifiable.
Replace dummy() code segment with illegal instruction, and return as immediate next instruction.
Control transfers to trap handler.
Trap handler is a reusable function that reads from a unix domain socket.
Address of mocked_dummy() function is passed (read from map file).
Mock function executes.
There are problems going ahead from here. I also found the approach is tedious and requires good amount of coding, some in assembly too.
I also found that under gcc each function call can be hooked / instrumented, but that is not very useful either, since the function that is supposed to be mocked will still get executed.
Is there any other approach that I could use?
#include <stdio.h>
#include <stdlib.h>
void mocked_dummy(void)
{
printf("__%s__()\n",__func__);
}
/*---- do not modify ----*/
void dummy(void)
{
printf("__%s__()\n",__func__);
}
int factorial(int num)
{
int fact = 1;
printf("__%s__()\n",__func__);
while (num > 1)
{
fact *= num;
num--;
}
dummy();
return fact;
}
/*---- do not modify ----*/
int main(int argc, char * argv[])
{
int (*fp)(int) = atoi(argv[1]);
printf("fp = %x\n",fp);
printf("factorial of 5 is = %d\n",fp(5));
printf("factorial of 5 is = %d\n",factorial(5));
return 1;
}
test-dept is a relatively recent C unit testing framework that allows you to do runtime stubbing of functions. I found it very easy to use - here's an example from their docs:
void test_stringify_cannot_malloc_returns_sane_result() {
replace_function(&malloc, &always_failing_malloc);
char *h = stringify('h');
assert_string_equals("cannot_stringify", h);
}
Although the downloads section is a little out of date, it seems fairly actively developed - the author fixed an issue I had very promptly. You can get the latest version (which I've been using without issues) with:
svn checkout http://test-dept.googlecode.com/svn/trunk/ test-dept-read-only
the version there was last updated in Oct 2011.
However, since the stubbing is achieved using assembler, it may need some effort to get it to support ARM.
This is a question I've been trying to answer myself. I also have the requirement that the mocking method/tools be written in the same language as my application. Unfortunately this cannot be done in C in a portable way, so I've resorted to what you might call a trampoline or detour. This falls under the "Make the code self modifiable." approach you mentioned above. This is where we change the actual bytes of a function at runtime to jump to our mock function.
#include <stdio.h>
#include <stdlib.h>
// Additional headers
#include <stdint.h> // for uint32_t
#include <sys/mman.h> // for mprotect
#include <errno.h> // for errno
void mocked_dummy(void)
{
printf("__%s__()\n",__func__);
}
/*---- do not modify ----*/
void dummy(void)
{
printf("__%s__()\n",__func__);
}
int factorial(int num)
{
int fact = 1;
printf("__%s__()\n",__func__);
while (num > 1)
{
fact *= num;
num--;
}
dummy();
return fact;
}
/*---- do not modify ----*/
typedef void (*dummy_fun)(void);
void set_run_mock()
{
dummy_fun run_ptr, mock_ptr;
uint32_t off;
unsigned char * ptr, * pg;
run_ptr = dummy;
mock_ptr = mocked_dummy;
if (run_ptr > mock_ptr) {
off = run_ptr - mock_ptr;
off = -off - 5;
}
else {
off = mock_ptr - run_ptr - 5;
}
ptr = (unsigned char *)run_ptr;
pg = (unsigned char *)(ptr - ((size_t)ptr % 4096));
if (mprotect(pg, 5, PROT_READ | PROT_WRITE | PROT_EXEC)) {
perror("Couldn't mprotect");
exit(errno);
}
ptr[0] = 0xE9; //x86 JMP rel32
ptr[1] = off & 0x000000FF;
ptr[2] = (off & 0x0000FF00) >> 8;
ptr[3] = (off & 0x00FF0000) >> 16;
ptr[4] = (off & 0xFF000000) >> 24;
}
int main(int argc, char * argv[])
{
// Run for realz
factorial(5);
// Set jmp
set_run_mock();
// Run the mock dummy
factorial(5);
return 0;
}
Portability explanation...
mprotect() - This changes the memory page access permissions so that we can actually write to memory that holds the function code. This isn't very portable, and in a WINAPI env, you may need to use VirtualProtect() instead.
The memory parameter for mprotect is aligned to the previous 4k page, this also can change from system to system, 4k is appropriate for vanilla linux kernel.
The method that we use to jmp to the mock function is to actually put down our own opcodes, this is probably the biggest issue with portability because the opcode I've used will only work on a little endian x86 (most desktops). So this would need to be updated for each arch you plan to run on (which could be semi-easy to deal with in CPP macros.)
The function itself has to be at least five bytes long. This is usually the case, because almost every function has at least 5 bytes in its prologue and epilogue.
Potential Improvements...
The set_run_mock() call could easily be set up to accept parameters for reuse. Also, you could save the five overwritten bytes from the original function to restore them later in the code if you desire.
I'm unable to test, but as far as I know ARM works similarly: the unconditional branch instruction (B) has 0xEA as its most significant byte, and the remaining 24 bits hold a signed word offset relative to PC + 8 rather than an absolute address.
Chenz
An approach that I have used in the past that has worked well is the following.
For each C module, publish an 'interface' that other modules can use. These interfaces are structs that contain function pointers.
struct Module1
{
int (*getTemperature)(void);
int (*setKp)(int Kp);
};
During initialization, each module initializes these function pointers with its implementation functions.
When you write the module tests, you can dynamically change these function pointers to point to their mock implementations and, after testing, restore the original implementations.
Example:
void mocked_dummy(void)
{
printf("__%s__()\n",__func__);
}
/*---- do not modify ----*/
void dummyFn(void)
{
printf("__%s__()\n",__func__);
}
static void (*dummy)(void) = dummyFn;
int factorial(int num)
{
int fact = 1;
printf("__%s__()\n",__func__);
while (num > 1)
{
fact *= num;
num--;
}
dummy();
return fact;
}
/*---- do not modify ----*/
int main(int argc, char * argv[])
{
void (*oldDummy)(void) = dummy; /* function pointer, not a plain void * */
/* with the original dummy function */
printf("factorial of 5 is = %d\n",factorial(5));
/* with the mocked dummy */
oldDummy = dummy; /* save the old dummy */
dummy = mocked_dummy; /* put in the mocked dummy */
printf("factorial of 5 is = %d\n",factorial(5));
dummy = oldDummy; /* restore the old dummy */
return 1;
}
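The example above swaps a single function pointer. A sketch of the same idea applied to the struct Module1 interface might look like the following; the implementations, the temperature values and the isOverheated() function are hypothetical and only serve to show the swap and restore.

```c
#include <assert.h>

struct Module1 {
    int (*getTemperature)(void);
    int (*setKp)(int Kp);
};

static int kp;

/* "Real" implementations; the fixed reading stands in for hardware access. */
static int realGetTemperature(void) { return 21; }
static int realSetKp(int Kp) { kp = Kp; return 0; }

/* Published interface, initialised with the real implementations. */
struct Module1 module1 = { realGetTemperature, realSetKp };

/* Code under test: it only reaches module 1 through the interface. */
int isOverheated(void)
{
    return module1.getTemperature() > 80;
}

/* Mock that forces the overheated branch. */
static int mockGetTemperature(void) { return 95; }

/* Returns 1 if the mock took effect and the original was restored. */
int run_mock_demo(void)
{
    struct Module1 saved = module1;     /* save the real interface */
    int before, during, after;

    before = isOverheated();            /* real reading: not overheated */
    module1.getTemperature = mockGetTemperature;
    during = isOverheated();            /* mocked reading: overheated */
    module1 = saved;                    /* restore after the test */
    after = isOverheated();

    return !before && during && !after;
}
```

Saving and restoring the whole struct keeps each test independent of the others, at the cost of one extra indirection per call in production code.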
You can replace every function by the use of LD_PRELOAD. You have to create a shared library, which gets loaded by LD_PRELOAD. This is a standard function used to turn programs without support for SOCKS into SOCKS aware programs. Here is a tutorial which explains it.

Resources