Linux Kernel - How to match a jprobe to kretprobe?

I am writing a kernel module to monitor a few syscalls, and I want to return the function arguments to user-land (via a netlink socket) if the call was successful.
jprobe.kp.symbol_name = "rename";
jprobe.entry = _rename_handler;
kretprobe.kp.symbol_name = "rename";
kretprobe.handler = _rename_ret_handler;
static rename_obj_t _g_cur_rename = NULL;
static void _rename_handler(const char __user *oldpath, const char __user *newpath)
{
    _g_cur_rename = create_rename(oldpath, newpath);
    jprobe_return();
}
/* kretprobe handlers return int */
static int _rename_ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
    /* Send only if successful */
    if (regs_return_value(regs) == 0) {
        add_send_queue(_g_cur_rename);
    }
    return 0;
}
I worry that another rename syscall may preempt[1] the current one between the jprobe and the kretprobe, in which case I would send incorrect return codes and arguments, e.g. with this interleaving:
jprobe:    rename(a, b)
jprobe:    rename(c, d)
kretprobe
kretprobe
Edit: This article[2] states that interrupts are disabled during a kprobe handler. But does that mean that interrupts are disabled throughout the whole chain (jprobe -> kprobe -> kretprobe), or just for that single kprobe?
[1] https://unix.stackexchange.com/questions/186355/few-questions-about-system-calls-and-kernel-modules-kernel-services-in-parallel
[2] https://lwn.net/Articles/132196/

Interrupts are disabled for each jprobe call, not for the entire sequence.
How many calls are you expecting in the time it will take the application to process them? There are different approaches depending on how fast you expect the calls to come in. The simplest method, if you only expect a few hundred calls before you can process them and you can dedicate static memory to the purpose, is to implement a static array of rename_obj_t objects and use an atomic counter to point to the next entry (modulo the size of your array). Note that atomic_add() itself returns nothing; use a value-returning variant such as atomic_add_return() or atomic_inc_return() (from <linux/atomic.h>, or the asm includes on older kernels) so that each caller gets its own index.
This way you return a unique static reference each time, as long as the counter doesn't wrap around the array before you have processed the returned values. Value-returning atomic operations such as atomic_add_return() are fully ordered (they imply full memory barriers), so you don't have to worry about memory-ordering problems; the plain atomic_add() gives no such ordering guarantee.
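A userland sketch of the slot-claiming idea, assuming C11 <stdatomic.h> as a stand-in for the kernel API (inside a module the analogous call would be atomic_inc_return() on an atomic_t). The record type rename_rec_t and the names RING_SIZE, claim_slot and record_rename are hypothetical. Because the fetch-and-add hands out each index exactly once, two interleaved callers never share a slot; nothing here detects a slot being reused before the consumer has read it, which is the wrap-around caveat mentioned above.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type standing in for rename_obj_t. */
typedef struct {
    char oldpath[64];
    char newpath[64];
} rename_rec_t;

/* Power of two, so the modulo stays consistent when the unsigned counter overflows. */
#define RING_SIZE 256u

static rename_rec_t ring[RING_SIZE];
static atomic_uint next_slot; /* static storage: starts at zero */

/* Claim a unique slot: atomic_fetch_add returns the previous value, so concurrent
 * callers always receive different indices (until the ring wraps around). */
static rename_rec_t *claim_slot(void)
{
    unsigned int idx = atomic_fetch_add(&next_slot, 1u);
    return &ring[idx % RING_SIZE];
}

/* Fill a freshly claimed slot with the call's arguments and return it. */
static rename_rec_t *record_rename(const char *oldp, const char *newp)
{
    rename_rec_t *r = claim_slot();
    snprintf(r->oldpath, sizeof r->oldpath, "%s", oldp);
    snprintf(r->newpath, sizeof r->newpath, "%s", newp);
    return r;
}
```

Each probe invocation then works on its own record instead of a single shared global such as _g_cur_rename.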

Shared pointers and queues in FreeRTOS

A C++ wrapper around a FreeRTOS queue can be simplified into something like this:
#include "FreeRTOS.h"
#include "queue.h"
template<typename T>
class Queue
{
public:
    bool push(const T& item)
    {
        return xQueueSendToBack(handle, &item, 0) == pdTRUE;
    }
    bool pop(T& target)
    {
        return xQueueReceive(handle, &target, 0) == pdTRUE;
    }
private:
    QueueHandle_t handle;
};
The documentation of xQueueSendToBack states:
The item is queued by copy, not by reference.
Unfortunately, it really is by copy, because it all ends in a memcpy, which makes sense since it is a C API. While this works well for plain old data, more complex items such as the following event message cause serious problems.
class ConnectionStatusEvent
{
public:
ConnectionStatusEvent() = default;
ConnectionStatusEvent(std::shared_ptr<ISocket> sock)
: sock(sock)
{
}
const std::shared_ptr<ISocket>& get_socket() const
{
return sock;
}
private:
const std::shared_ptr<ISocket> sock;
bool connected;
};
The problem is the std::shared_ptr, which does not survive a memcpy at all: its copy constructor/assignment operator isn't called when it is copied onto the queue, so its reference count is never incremented, resulting in premature deletion of the held object when the event message, and thus the shared_ptr, goes out of scope.
I could solve this by using dynamically allocated T instances and changing the queues to contain only pointers to the instances, but I'd rather not, since this will run on an embedded system and I very much want to keep memory static at run time.
My current plan is to change the queue to contain pointers to a memory area held locally in the wrapper class, in which I can implement full C++ object copying. However, since I'd also need to protect that memory area against access from multiple threads, this essentially defeats the already thread-safe implementation of the FreeRTOS queues (which are surely more efficient than any implementation I could write myself), so I might as well skip them entirely.
Finally, the question:
Before I implement my own queue, are there any tricks I can use to make the FreeRTOS queues function with C++ object instances, in particular std::shared_ptr?
The issue is what happens to the original once you put the pointer into the queue.
Copying seems trivial but is not optimal.
To get around this issue I use a mailbox instead of a queue:
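// placement new (needs <new>): construct T inside the pool block so members such as std::shared_ptr are properly initialized
T* data = new (osMailAlloc(m_mail, osWaitForever)) T();
...
osMailPut(m_mail, data);
Here you allocate the memory block explicitly from the mailbox to begin with, and then just put the pointer into the mailbox.
And to retrieve:
osEvent ev = osMailGet(m_mail, osWaitForever);
T* p = static_cast<T*>(ev.value.p); // valid when ev.status == osEventMail
...
p->~T(); // destroy the object (releasing any shared_ptr it holds) before returning its block
osStatus freeStatus = osMailFree(m_mail, p);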
All of this can be neatly wrapped in C++ template methods.
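To make the mailbox idea concrete without an RTOS, here is a minimal single-threaded sketch in C of the same pattern: messages live in a statically sized pool, and only pointers travel through the FIFO, so the messages themselves are never memcpy'd. All names (event_t, mailbox_t, mb_alloc, mb_put, mb_get, mb_free) are hypothetical, and there is no locking; in CMSIS-RTOS the osMail* calls provide the synchronization.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message type. */
typedef struct {
    int connection_id;
    bool connected;
} event_t;

#define MAILBOX_SLOTS 4

typedef struct {
    event_t pool[MAILBOX_SLOTS];  /* static storage for the messages */
    bool in_use[MAILBOX_SLOTS];   /* which pool blocks are currently allocated */
    event_t *fifo[MAILBOX_SLOTS]; /* ring buffer of pointers into pool */
    size_t head, count;
} mailbox_t;

/* Take a free block from the pool (the role of osMailAlloc); NULL if exhausted. */
static event_t *mb_alloc(mailbox_t *mb)
{
    for (size_t i = 0; i < MAILBOX_SLOTS; i++) {
        if (!mb->in_use[i]) {
            mb->in_use[i] = true;
            return &mb->pool[i];
        }
    }
    return NULL;
}

/* Enqueue only the pointer (the role of osMailPut). */
static bool mb_put(mailbox_t *mb, event_t *msg)
{
    if (mb->count == MAILBOX_SLOTS)
        return false;
    mb->fifo[(mb->head + mb->count) % MAILBOX_SLOTS] = msg;
    mb->count++;
    return true;
}

/* Dequeue the oldest pointer (the role of osMailGet); NULL if empty. */
static event_t *mb_get(mailbox_t *mb)
{
    if (mb->count == 0)
        return NULL;
    event_t *msg = mb->fifo[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_SLOTS;
    mb->count--;
    return msg;
}

/* Return the block to the pool (the role of osMailFree). */
static void mb_free(mailbox_t *mb, event_t *msg)
{
    mb->in_use[msg - mb->pool] = false;
}
```

In C++ the block returned by the allocator would be constructed with placement new and destroyed explicitly before being freed, so members such as std::shared_ptr keep a correct reference count.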

PortAudio callbacks, and changing a variable elsewhere

I'm using PortAudio's callback API for designing a signal processing loopback library.
I'd like to add a branch inside the callback that depends on a flag, like so:
int pa_callback(const void *inputBuffer,
                void *outputBuffer,
                unsigned long frameCount,
                const PaStreamCallbackTimeInfo *timeInfo,
                PaStreamCallbackFlags statusFlags,
                void *userData)
{
    if (do_something_flag) {
        do_something(inputBuffer, outputBuffer, frameCount);
    } else {
        do_something_else(inputBuffer, outputBuffer, frameCount);
    }
    return paContinue;
}
Where do_something_flag is set elsewhere in my program at regular intervals.
The PortAudio callback documentation states:
Before we begin, it's important to realize that the callback is a
delicate place. This is because some systems perform the callback in a
special thread, or interrupt handler, and it is rarely treated the
same as the rest of your code. For most modern systems, you won't be
able to cause crashes by making disallowed calls in the callback, but
if you want your code to produce glitch-free audio, you will have to
make sure you avoid function calls that may take an unbounded amount
of time to execute. Exactly what these are depend on your platform but
almost certainly include the following: memory
allocation/deallocation, I/O (including file I/O as well as console
I/O, such as printf()), context switching (such as exec() or yield()),
mutex operations, or anything else that might rely on the OS. If you
think short critical sections are safe please go read about priority
inversion.
I don't care about the atomicity of the do_something_flag. That is, I don't care how many cycles it takes to get a correct value (within reason).
According to the documentation, it looks like I can't use mutexes for setting/reading that variable.
1) What are my options?
2) If I make it global and set it in another part of my program (another thread), what is the absolute worst that will happen? Again, I mean in terms of corrupting data to the point of program failure/etc.
Is there a right way to do this?
I'm not sure exactly what you're trying to do, but I'm guessing it's what your title asks about: "changing a variable elsewhere".
Take this example: you have a variable frequency that changes over time. How do you access it? The callback has a generic pointer parameter called userData, which can point to anything: a data structure, an array, etc. I don't remember exactly how often the callback gets called (it's pretty often, but I wouldn't worry about speed). The point is that userData lets you have variables that are changed in your main thread, while the audio thread accesses the same memory directly through the pointer. My knowledge of thread safety isn't the best, so sorry if that isn't the best explanation, but I can at least show you how to do it in code (below).
This is how I usually do it, but you don't have to do it this way. I set up a structure at the top of my file like so:
typedef struct {
float freq;
float vol;
}paData;
Obviously you'll initialize this somewhere in your code (probably in your main function call) and open the audio stream as such (data is of type paData):
/* Open audio stream */
err = Pa_OpenStream(&(*stream),
                    &inputParameters,
                    &outputParameters,
                    SAMPLE_RATE, bufSize, paNoFlag,
                    pa_callback, &data);
After opening it you can have your callback like this:
static int pa_callback(const void *inputBuffer,
                       void *outputBuffer,
                       unsigned long frameCount,
                       const PaStreamCallbackTimeInfo *timeInfo,
                       PaStreamCallbackFlags statusFlags,
                       void *userData)
{
    // cast data so we can use it
    paData *data = (paData *)userData;
    // what's our frequency? (printf is for illustration only; the PortAudio
    // documentation quoted in the question advises against console I/O here)
    printf("%f\n", data->freq);
    /* Do something with your code here */
    return paContinue;
}
Hope that helps.
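A minimal sketch of the userData approach with the flag from the question made a C11 atomic_bool, assuming a C11 compiler. Strictly speaking, reading a plain variable in one thread while another thread writes it is a data race (undefined behavior in C11); a relaxed atomic load or store avoids that without a mutex, and on mainstream platforms atomic_bool is lock-free (atomic_is_lock_free() can confirm this). To keep the example compilable without portaudio.h, process_block is a hypothetical stand-in for the body of the PortAudio callback, and shared_state_t, set_flag and the two processing functions are illustrative names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state, handed to the callback through userData. */
typedef struct {
    atomic_bool do_something_flag; /* written by a control thread, read by the audio thread */
} shared_state_t;

/* Illustrative processing paths standing in for the question's functions. */
static void do_something(const float *in, float *out, unsigned long n)
{
    for (unsigned long i = 0; i < n; i++)
        out[i] = in[i] * 2.0f;
}

static void do_something_else(const float *in, float *out, unsigned long n)
{
    for (unsigned long i = 0; i < n; i++)
        out[i] = in[i];
}

/* Body of the audio callback: one relaxed atomic load, no locks, nothing that blocks. */
static int process_block(const float *in, float *out, unsigned long frames, void *userData)
{
    shared_state_t *st = userData;
    if (atomic_load_explicit(&st->do_something_flag, memory_order_relaxed))
        do_something(in, out, frames);
    else
        do_something_else(in, out, frames);
    return 0; /* the value of paContinue in PortAudio */
}

/* Called from any other thread to switch the branch. */
static void set_flag(shared_state_t *st, bool value)
{
    atomic_store_explicit(&st->do_something_flag, value, memory_order_relaxed);
}
```

In the real program, a shared_state_t would be passed as the last argument of Pa_OpenStream and cast back from userData inside the callback, exactly as with paData above. Relaxed ordering is enough here because only the flag itself is communicated.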

Store extra data in a c function pointer

Suppose there is a library function (which I cannot modify) that accepts a callback (function pointer) as its argument, which will be called at some point in the future. My question: is there a way to store extra data along with the function pointer, so that when the callback is called, the extra data can be retrieved? The program is in C.
For example:
// callback's type, no argument
typedef void (*callback_t)();
// the library function
void regist_callback(callback_t cb);
// store data with the function pointer
callback_t store_data(callback_t cb, int data);
// retrieve data within the callback
int retrieve_data();
void my_callback() {
int a;
a = retrieve_data();
// do something with a ...
}
int my_func(...) {
// some variables that I want to pass to my_callback
int a;
// ... regist_callback may be called multiple times
regist_callback(store_data(my_callback, a));
// ...
}
The problem is that callback_t accepts no arguments. My idea is to generate a small piece of asm code each time and pass it to regist_callback. When it is called, it finds the real callback and its data, stores the data on the stack (or in some unused register), and then jumps to the real callback, inside which the data can be retrieved.
pseudocode:
typedef struct {
// some asm code knows the following is the real callback
char trampoline_code[X];
callback_t real_callback;
int data;
} func_ptr_t;
callback_t store_data(callback_t cb, int data) {
// ... malloc a func_ptr_t
func_ptr_t * fpt = malloc(...);
// fill the trampoline_code; different machines and
// different calling conventions need different code
// ...
fpt->real_callback = cb;
fpt->data = data;
return (callback_t)fpt;
}
int retrieve_data() {
// ... some asm code to retrieve the data from the stack (or some register)
// and return it
}
Is it reasonable? Is there any previous work done for such problem?
Unfortunately, you're likely to be prohibited from executing your trampoline on more and more systems as time goes on, as executing data is a pretty common way of exploiting security vulnerabilities.
I'd start by reporting the bug to the author of the library. Everybody should know better than to offer a callback interface with no private data parameter.
Having such a limitation would make me think twice about whether or not the library is reentrant. I would suggest ensuring you can only have one call outstanding at a time, and storing the callback parameter in a global variable.
If you believe that the library is fit for use, then you could extend this by writing n different callback trampolines, each referring to their own global data, and wrap that up in some management API.
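A sketch of the n-trampolines idea, assuming a fixed upper bound on simultaneously registered callbacks (N_SLOTS = 3 here). Each trampoline is an ordinary compiled function, so no executable data is needed; bind_callback, release_callback, real_callback_t and my_callback are hypothetical names, and the slot table is not thread-safe.

```c
#include <stddef.h>

typedef void (*callback_t)(void);       /* what the library accepts */
typedef void (*real_callback_t)(int);   /* hypothetical callback that takes the extra data */

#define N_SLOTS 3

/* Per-slot "closure" state kept in globals. */
static struct {
    real_callback_t fn;
    int data;
    int busy;
} slots[N_SLOTS];

/* One fixed trampoline per slot; each forwards its own slot's data. */
static void tramp0(void) { slots[0].fn(slots[0].data); }
static void tramp1(void) { slots[1].fn(slots[1].data); }
static void tramp2(void) { slots[2].fn(slots[2].data); }

static const callback_t trampolines[N_SLOTS] = { tramp0, tramp1, tramp2 };

/* Bind fn + data to a free slot; returns a no-argument callback, or NULL if all slots are taken. */
static callback_t bind_callback(real_callback_t fn, int data)
{
    for (size_t i = 0; i < N_SLOTS; i++) {
        if (!slots[i].busy) {
            slots[i].fn = fn;
            slots[i].data = data;
            slots[i].busy = 1;
            return trampolines[i];
        }
    }
    return NULL;
}

/* Free the slot once the library will no longer invoke this callback. */
static void release_callback(callback_t cb)
{
    for (size_t i = 0; i < N_SLOTS; i++)
        if (trampolines[i] == cb)
            slots[i].busy = 0;
}

/* Example user callback: records the data it was bound with. */
static int last_seen = -1;
static void my_callback(int a) { last_seen = a; }
```

Usage would be regist_callback(bind_callback(my_callback, a)); the management API then amounts to bind_callback and release_callback.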

Problem with Array of Queues in FreeRTOS

I am building a FreeRTOS application. I created a module which registers FreeRTOS queue handles from other modules, and when an interrupt occurs in this module, it sends a message to all the registered queues. It seems I am able to send the message to the queue, but not to receive it in the other module.
Here is my code.
Remote module:
CanRxMsg RxMessage;
can_rx0_queue = xQueueCreate( 10, sizeof(CanRxMsg) ); // can_rx0_queue is globally defined
// Register my queue with can module
if (registerRxQueueWithCAN(can_rx0_queue) == -1)
{
TurnLedRed();
}
while(1)
{
if(can_rx0_queue){
while( xQueueReceive( can_rx0_queue, ( void * ) &RxMessage, portMAX_DELAY))
{
}
.....
Here is the registration module
#define MAX_NUMBER_OF_RX_QUEUES 2
//xQueueHandle rxQueueStore[MAX_NUMBER_OF_RX_QUEUES];
typedef struct QUEUE_REGISTRY_ITEM
{
// signed char *pcQueueName;
xQueueHandle xHandle;
} xQueueRegistryItem;
xQueueRegistryItem rxQueueStore[MAX_NUMBER_OF_RX_QUEUES];
int numberOfQueuesRegistered;
#define cError -1
#define cSuccess 0
void processInterrupt()
{
for(int i=0; i < numberOfQueuesRegistered; i++)
{
if(xQueueSendFromISR(rxQueueStore[i].xHandle,(void *) &RxMessage,&tmp) != pdTRUE)
TurnLedRed();
if(tmp)resched_needed = pdTRUE;
}
portEND_SWITCHING_ISR(resched_needed);
}
int registerRxQueueWithCAN(xQueueHandle myQueue)
{
if(numberOfQueuesRegistered == MAX_NUMBER_OF_RX_QUEUES)
{
// Over Flow of registerations
TurnLedRed();
return cError;
}else
{
rxQueueStore[numberOfQueuesRegistered].xHandle = myQueue;
numberOfQueuesRegistered++;
}
return cSuccess;
}
A few points:
xQueueHandle is typedef'd to "void *".
The code works if I remove the registration and pass the queue handle directly to xQueueSendFromISR, accessing it via extern.
Any advice, or is more information required?
At first glance I cannot see anything obviously wrong. The problem might be outside of the code you have shown, like how is can_rx0_queue declared, how is the interrupt entered, which port are you using, etc.
There is a FreeRTOS support forum, linked to from the FreeRTOS home page http://www.FreeRTOS.org
Regards.
I think Richard is right. The problem could be issues that are not within your code that you have posted here.
Are you calling any form of suspension on the receiving Task that is waiting on the Queue? When you invoke a vTaskSuspend() on a Task that is blocked waiting on a Queue, the Task that is suspended will be moved to the pxSuspendedTaskList and it will "forget" that it is waiting on an Event Queue because the pvContainer of xEventListItem in that Task will be set to NULL.
You might want to check if your receiving Task is ever suspended while waiting on a Queue. Hope that helped. Cheers!
Your shared memory should at least be declared volatile:
volatile xQueueRegistryItem rxQueueStore[MAX_NUMBER_OF_RX_QUEUES] ;
volatile int numberOfQueuesRegistered ;
otherwise the compiler may optimise out reads or writes to these, because it has no concept of different threads of execution (between the ISR and the main thread).
Also, I recall that some PIC C runtime start-up options do not apply zero-initialisation of static data in order to minimise start-up time; if you are using such a start-up, you should explicitly initialise numberOfQueuesRegistered. I would suggest doing so in any case.
It is not clear from your code that RxMessage in the ISR is not the same as RxMessage in the 'remote module'; they should not be shared, since that would allow the ISR to potentially modify the data while the receiving thread was processing it. If they could be shared, there would be no reason to have a queue in the first place, since shared memory and a semaphore would suffice.
As a side-note, there is never any need to cast a pointer to void*, and you should generally avoid doing so, since it will prevent the compiler from issuing an error if you were to pass something other than a pointer. The whole point of a void* is rather that it can accept any pointer type.

In a C program, is it possible to reset all global variables to default values?

I have a legacy C Linux application that I need to reuse. This application uses a lot of global variables. I want to reuse this application's main function and invoke it in a loop. I have found that when I call the main function (renamed to callableMain) in a loop, the application's behavior is not consistent, because the values of global variables set in the previous iteration affect the program flow in the new iteration.
What I would like to do is reset all the global variables to their default values before the execution of each new iteration.
For example, the original program looks like this:
OriginalMain.C
#include <stdio.h>
int global = 3; /* This is the global variable. */
void doSomething(){
global++; /* Reference to global variable in a function. */
}
void doNothing(){
/* called in the else branch below; defined so the example compiles */
}
// i want to rename this main method to callableMain() and
// invoke it in a loop
int main(void){
if(global==3) {
printf(" All Is Well \n");
doSomething() ;
}
else{
printf(" Noooo\n");
doNothing() ;
}
return 0;
}
I want to change this program as follows.
I renamed main() in the above file to callableMain(),
and my new main function is as follows:
int callableMain(void); /* the renamed main() from OriginalMain.C */
int main(){
    for(int i=0;i<20;i++){
        callableMain();
        // this is where I need to reset the values of the global variables,
        // otherwise the execution flow changes
    }
}
Is it possible to reset all the global variables to the values they had before main() was invoked?
The short answer is that there is no magic API call that would reset global variables. The global variables would have to be cached and reused.
I would invoke it as a subprocess, modifying its input and output as needed. Let the operating system do the dirty work for you.
The idea is to isolate the legacy program from your new program by relegating it to its own process. Then you have a clean separation between the two. Also, the legacy program is reset to a clean state every time you run it.
First, modify the program so that it reads the input data from a file, and writes its output in a machine-readable format to another file, with the files being given on the command line.
You can then create named pipes (using the mkfifo call) and invoke the legacy program using system, passing it the named pipes on the command line. Then you feed it its input and read back its output.
I am not an expert on these matters; there is probably a better way of doing the IPC. Others here have mentioned fork. However, the basic idea of separating out the legacy code and invoking it as a subprocess is probably the best approach here.
fork() early?
You could fork(2) at some early point when you think the globals are in a good state, and then have the child wait on a pipe or something for some work to do. This would require writing any changed state or at least the results back to the parent process but would decouple your worker from your primary control process.
In fact, it might make sense to fork() at least twice, once to set up a worker controller and save the initialized (but not too initialized :-) global state, and then have this worker controller fork() again for each loop you need run.
A simpler variation might be to just modify the code so that the process can start in a "worker mode", and then use fork() or system() to start the application from the top, but with an argument that puts it into worker mode.
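A minimal sketch of the fork-per-run approach on POSIX, assuming the legacy code can be called as callableMain() (here a tiny hypothetical stand-in that mutates the global from the question) and that its result fits in an exit status. The child's writes to globals never reach the parent, so every run starts from the same state. run_isolated is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int global = 3; /* legacy global, as in the question */

/* Hypothetical stand-in for the legacy entry point: mutates the global like doSomething(). */
static int callableMain(void)
{
    if (global == 3) {
        global++;
        return 0; /* "All Is Well" path */
    }
    return 1;     /* "Noooo" path */
}

/* Run callableMain() once in a forked child and return its exit status (-1 on error).
 * The child's changes to globals stay in the child, so the parent's state is untouched. */
static int run_isolated(void)
{
    fflush(NULL); /* flush pending output so the child does not inherit and repeat it */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int rc = callableMain();
        fflush(NULL); /* _exit() skips stdio flushing, so flush the child's output first */
        _exit(rc);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling run_isolated() in the question's for loop gives every iteration the "All Is Well" path, because the parent never runs the legacy code itself.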
There is a way to do this on certain platforms / compilers, you'd basically be performing the same initialization your compiler performs before calling main().
I have done this for a TI DSP. In that case I had the section with globals mapped to a specific region of memory, and there were linker directives available that declared variables pointing to the start and end of this section (so you can memset() the whole area to zero before starting initialization). Then, the compiler provided a list of records, each consisting of an address, a data length and the actual data to be copied to that address. So you'd just loop through the records and memcpy() the data to each target address to initialize all globals.
Very compiler specific, so hopefully the compiler you're using allows you to do something similar.
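A portable imitation of that record-driven initialization, assuming you build the table yourself; on the TI toolchain the compiler generates the records and the linker supplies the section bounds, so init_record_t, init_table, reinit_globals and the example globals here are hypothetical. The zeroing of the whole section that the answer mentions is omitted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the compiler's initialization records:
 * target address, number of bytes, and the data to copy there. */
typedef struct {
    void *dest;
    size_t len;
    const void *src;
} init_record_t;

/* Example globals and their initial images. */
static int counter = 5;
static char mode[8] = "idle";
static const int counter_init = 5;
static const char mode_init[8] = "idle";

static const init_record_t init_table[] = {
    { &counter, sizeof counter, &counter_init },
    { mode, sizeof mode, mode_init },
};

/* Re-run the start-up initialization by copying each record's data to its address. */
static void reinit_globals(void)
{
    for (size_t i = 0; i < sizeof init_table / sizeof init_table[0]; i++)
        memcpy(init_table[i].dest, init_table[i].src, init_table[i].len);
}
```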
In short, no. What I would do in this instance is create definitions, constants if you will, and then use those to reset the global variables with.
Basically
#define VARA_INIT 10
int vara = VARA_INIT;
etc... basic C right?
You can then go ahead and wrap the reinitialization in a handy function =)
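A sketch of that reset function, assuming each global's initial value is given a named constant so the definition and the reset cannot drift apart; retries, name and the *_INIT macros are hypothetical, and global is the variable from the question.

```c
#include <string.h>

/* Initial values defined once, used both for the definitions and for the reset. */
#define GLOBAL_INIT  3
#define RETRIES_INIT 10
#define NAME_INIT    "default"

int global = GLOBAL_INIT;
int retries = RETRIES_INIT;
char name[16] = NAME_INIT;

/* Put every global back to its initial value; call it before each callableMain(). */
void reset_globals(void)
{
    global = GLOBAL_INIT;
    retries = RETRIES_INIT;
    strcpy(name, NAME_INIT);
}
```

The cost is that every new global in the legacy code must also be added to reset_globals(), which is easy to forget in a large code base.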
I think you must change the way you see the problem.
Declare all the variables used by callableMain() inside callableMain()'s body, so they are not global anymore and are destroyed after the function is executed and created once again with the default values when you call callableMain() on the next iteration.
EDIT:
Ok, here's what you could do if you have the source code for callableMain(): at the beginning of the function, add a check to determine whether this is the first time the function is being called. Inside this check, copy the values of all the global variables used into a set of static variables (name them as you like). Then, at the start of every call, initialize local working copies from those static variables, and in the function's body replace all occurrences of the global variables with the local copies.
This way you preserve the initial values of all the global variables and start from them on every iteration of callableMain(). Does that make sense to you?
#include <stdbool.h>
void callableMain(void)
{
    static bool first_iter = true;
    static int init_var1;   /* initial value of global_var1, captured once */
    static float init_var2; /* initial value of global_var2, captured once */
    if (first_iter)
    {
        first_iter = false;
        init_var1 = global_var1;
        init_var2 = global_var2;
        ..
    }
    /* fresh working copies on every call, always starting from the initial values */
    int my_global_var1 = init_var1;
    float my_global_var2 = init_var2;
    // perform operations on my_global_var1 and my_global_var2
    // instead of the original global variables.
}
for (int i = 0; i < 20; i++) {
int saved_var1 = global_var1;
char saved_var2 = global_var2;
double saved_var3 = global_var3;
callableMain();
global_var1 = saved_var1;
global_var2 = saved_var2;
global_var3 = saved_var3;
}
Or maybe you can find out where the global variables start and memcpy them. But I would always cringe at a loop like this:
for (int i = 0; i < 20; i++) {
static unsigned char global_copy[SIZEOFGLOBALDATA];
memcpy(global_copy, STARTOFGLOBALDATA, SIZEOFGLOBALDATA);
callableMain();
memcpy(STARTOFGLOBALDATA, global_copy, SIZEOFGLOBALDATA);
}
If you don't want to refactor the code and encapsulate these global variables, I think the best you can do is define a reset function and then call it within the loop.
Assuming we are dealing with ELF on Linux, the following function for resetting the variables works:
// these extern variables come from glibc
// https://github.com/ysbaddaden/gc/blob/master/include/config.h
extern char __data_start[];
extern char __bss_start[];
extern char _end[];
#define DATA_START ((char *)&__data_start)
#define DATA_END ((char *)&__bss_start)
#define BSS_START ((char *)&__bss_start)
#define BSS_END ((char *)&_end)
/// first call saves globals, subsequent calls restore
void reset_static_data();
// variable for quick check
static int pepa = 42;
// writes to memory between global variables are reported as buffer overflows by asan
ATTRIBUTE_NO_SANITIZE_ADDRESS
void reset_static_data()
{
// global variable, ok to leak it
static char * x;
size_t s = BSS_END - DATA_START;
// memcpy is always sanitized, so access memory as chars in a loop
if (x == NULL) { // store current static variables
x = (char *) malloc(s);
for (size_t i = 0; i < s; i++) {
*(x+i) = *(DATA_START + i);
}
} else { // restore previously saved static variables
for (size_t i = 0; i < s; i++) {
*(DATA_START + i) = *(x+i);
}
}
// quick check, see that pepa does not grow in stderr output
fprintf(stderr, "pepa: %d\n", pepa++);
}
The general approach is based on the answer to "How to get the data and bss address space in run time (In Unix C program)"; see the linked ysbaddaden/gc GitHub repository for a macOS version of the macros.
To test the above code, call it a few times and note that the printed value of the global variable pepa stays at 42, even though every call increments it.
reset_static_data();
reset_static_data();
reset_static_data();
Saving the current state of the globals is convenient because it does not require rerunning __attribute__((constructor)) functions, which would be necessary if I set everything in .bss to zero (which is easy) and everything in .data to its initial values (which is not so easy). For example, if you load libpython3.so in your program, it performs run-time initialization that is lost by zeroing .bss, and calling into Python then crashes.
Sanitizers
Writing to areas of memory immediately before or after a static variable triggers a buffer-overflow report from AddressSanitizer. To prevent this, use the ATTRIBUTE_NO_SANITIZE_ADDRESS macro as the code above does. The macro is defined in sanitizer/asan_interface.h.
Code coverage
Code coverage counters are implemented as global variables. Therefore, resetting globals will cause coverage information to be forgotten. To solve this, always dump the coverage-to-date before restoring the globals. There does not seem to be a macro to detect whether code coverage is enabled in the compiler, so use your build system (CMake, ...) to define a suitable macro yourself, such as QD_COVERAGE below.
// The __gcov_dump function writes the coverage counters to gcda files
// and the __gcov_reset function resets them to zero.
// The interface is defined at https://github.com/gcc-mirror/gcc/blob/7501eec65c60701f72621d04eeb5342bad2fe4fb/libgcc/libgcov-interface.c
extern "C" void __gcov_reset();
extern "C" void __gcov_dump();
void flush_coverage() {
#if defined(QD_COVERAGE)
__gcov_dump();
__gcov_reset();
#endif
}
