Use of "__kprobes" and how it works?

Use of "__kprobes" and how it works? - c

While referring to memory module of Linux kernel some functions are not clear to me. One of the functions is shown below:
static inline int __kprobes notify_page_fault(struct pt_regs *regs)
{
int ret = 0;
/* kprobe_running() needs smp_processor_id() */
if (kprobes_built_in() && !user_mode_vm(regs)) {
preempt_disable();
if (kprobe_running() && kprobe_fault_handler(regs, 14))
ret = 1;
preempt_enable();
}
return ret;
}
I am confused with the "__kprobes" between return type and function name. When I looked at the initialization of "__kprobes" in compiler.h, I found below:
/*Ignore/forbid kprobes attach on very low level functions marked by
this attribute:*/
#ifdef CONFIG_KPROBES
# define __kprobes __attribute__((__section__(".kprobes.text")))
#else
# define __kprobes
#endif
Well, I know that at compile time __kprobe is going to be replaced with its defined part.
Questions:
1.) What is the significance of __attribute__((__section__(".kprobes.text")))?
and
2.) What does it do at compile time and at run time when it is used before "function_name"?
I read about kprobe and found that it has to do something about breakpoints and back trace. What I understand about kprobe is it will help debugger in creating back traces and breakpoints. Could someone please explain me in simple words how does it really works and please correct me if I am wrong.

TL;DR
__attribute__((__section__(".kprobes.text"))) will place that function in separate section which is not findable by kprobes thus preventing infinite breakpoints.
You must use it before "function_name" to place whole "function_name" symbol in separate section.
Real answer
kprobes (kernel probes) is Linux kernel mechanism for dynamic tracing. It allows you to insert breakpoint at almost any kernel function, invoke your handler and then continue executing. It works by runtime patching kernel image with so-called kernel probe/kprobe - see struct kprobe. This probe will allow you to pass control to your handler, and that handler is usually do some tracing.
So, what's going under the hood:
You create your struct kprobe by defining address at which to break and handler to pass reference.
You register your probe with register_kprobe
Kernel kprobe subsystem finds address from your probe
Then kprobe:
inserts breakpoint CPU instruction (int 3 for x86) at given address
adds some wrapper code to save context(registers, etc.)
adds even more code to help you get access to function arguments or return values.
Now when kernel execution hits that probed address:
it will fall into CPU trap
it will save context
it will pass control to your handler via notifier_call_chain
...
after all it will restore context
That's how it works. As you can see it's a really neat and dirty hack, but some kernel function is so terribly low-level that it's just pointless to probe them. notify_page_fault is one of those functions - as a part of notifier_call_chain it's used in passing control to your handler.
So if you probe at notify_page_fault you'll get infinite loop of breakpoints, which is not what you want. What you really want is to protect that kind of functions and kprobes do this by placing it in separate section .kprobes.text. This will prevent to probe at that functions because kprobe will not lookup for address in that section. And that's a job for __attribute__((__section__(".kprobes.text"))).

Related

stm32 - Interrupt handle

In external interrupt function, I want to reset by calling main function. But afterwards, if I have a new interrupt trigger, MCU thinks that It's handling in interrupt function and It doesn't call interrupt function again. What is my solution? (in my project, I'm not allowed to call soft Reset function)

Calling main() in any event is a bad idea, calling it from an interrupt handler is a really bad idea as you have discovered.
What you really need is to modify the stack and link-register so that when the interrupt context exits,, it "returns" to main(), rather than from whence it came. That is a non-trivial task, probably requiring some assembler code or compiler intrinsics.
You have to realise that the hardware will not have been restored to its reset state; you will probably need at least to disable all interrupts to prevent them occurring while the system is re-initialising.
Moreover the standard library will not be reinitialised if you jump to main(); rather than the reset vector. In particular, any currently allocated dynamic memory will instantly leak away and become unusable. In fact all of the C run-time environment initialisation will be skipped - leaving amongst for example static and global data in its last state rather than applying correct initialisation.
In short it is dangerous, error-prone, target specific, and fundamentally poor practice. Most of what you would have to do to make it work is already done in the start-up code that is executed before main() is called, so it would be far simpler to invoke that. The difference between that and forcing a true reset (via the watchdog or AICR) is that the on-chip peripheral state remains untouched (apart from any initialisation explicitly done in the start-up). In my experience, if you are using a more complex peripheral such as USB, safely restarting the system without a true reset is difficult to achieve safely (or at least it is difficult to determine how to do it safely) and hardly worth the effort.

Reset by calling main() is wrong. There is code in front of main inserted by the linker and C-runtime that you will skip by such soft-reset.
Instead, call NVIC_SystemReset() or enable the IWDG and while(1){} to reset.
The HAL should have example files for the watchdog timer.
SRAM is maintained. Any value not initialized by the linker script will still be there.

Calling Main() from any point of your code is a wrong idea if you are not resetting the stack and setting the initial values.
There is always a initialization function ( that actually calls Main()) which is inside an interrupt vector, usually this function can be triggered by calling the function NVIC_SystemReset(void) , be sure than you enable this interrupt so it can be software triggered.
As far as I know, when get inside and interrupt code, other interruptions are inhibit, I am thinking on two different options:
Enable the interruptions inside the interruption and call the function NVIC_SystemReset(void)
Modify the stack and push the direction of the function NVIC_SystemReset(void) so when you go out of the interruption it could be executed.

Safely Exiting to a Particular State in Case of Error

When writing code I often have checks to see if errors occurred. An example would be:
char *x = malloc( some_bytes );
if( x == NULL ){
fprintf( stderr, "Malloc failed.\n" );
exit(EXIT_FAILURE);
}
I've also used strerror( errno ) in the past.
I've only ever written small desktop appications where it doesn't matter if the program exit()ed in case of an error.
Now, however, I'm writing C code for an embedded system (Arduino) and I don't want the system to just exit in case of an error. I want it to go to a particular state/function where it can power down systems, send error reports and idle safely.
I could simply call an error_handler() function, but I could be deep in the stack and very low on memory, leaving error_handler() inoperable.
Instead, I'd like execution to effectively collapse the stack, free up a bunch of memory and start sorting out powering down and error reporting. There is a serious fire risk if the system doesn't power down safely.
Is there a standard way that safe error handling is implemented in low memory embedded systems?
EDIT 1:
I'll limit my use of malloc() in embedded systems. In this particular case, the errors would occur when reading a file, if the file was not of the correct format.

Maybe you're waiting for the Holy and Sacred setjmp/longjmp, the one who came to save all the memory-hungry stacks of their sins?
#include <setjmp.h>
jmp_buf jumpToMeOnAnError;
void someUpperFunctionOnTheStack() {
if(setjmp(jumpToMeOnAnError) != 0) {
// Error handling code goes here
// Return, abort(), while(1) {}, or whatever here...
}
// Do routinary stuff
}
void someLowerFunctionOnTheStack() {
if(theWorldIsOver)
longjmp(jumpToMeOnAnError, -1);
}
Edit: Prefer not to do malloc()/free()s on embedded systems, for the same reasons you said. It's simply unhandable. Unless you use a lot of return codes/setjmp()s to free the memory all the way up the stack...

If your system has a watchdog, you could use:
char *x = malloc( some_bytes );
assert(x != NULL);
The implementation of assert() could be something like:
#define assert (condition) \
if (!(condition)) while(true)
In case of a failure the watchdog would trigger, the system would make a reset. At restart the system would check the reset reason, if the reset reason was "watchdog reset", the system would goto a safe state.
update
Before entering the while loop, assert cold also output a error message, print the stack trace or save some data in non volatile memory.

Is there a standard way that safe error handling is implemented in low memory embedded systems?
Yes, there is an industry de facto way of handling it. It is all rather simple:
For every module in your program you need to have a result type, such as a custom enum, which describes every possible thing that could go wrong with the functions inside that module.
You document every function properly, stating what codes it will return upon error and what code it will return upon success.
You leave all error handling to the caller.
If the caller is another module, it too passes on the error to its own caller. Possibly renames the error into something more suitable, where applicable.
The error handling mechanism is located in main(), at the bottom of the call stack.
This works well together with classic state machines. A typical main would be:
void main (void)
{
for(;;)
{
serve_watchdog();
result = state_machine();
if(result != good)
{
error_handler(result);
}
}
}
You should not use malloc in bare bone or RTOS microcontroller applications, not so much because of safety reasons, but simple because it doesn't make any sense whatsoever to use it. Apply common sense when programming.

Use setjmp(3) to set a recovery point, and longjmp(3) to jump to it, restoring the stack to what it was at the setjmp point. It wont free malloced memory.
Generally, it is not a good idea to use malloc/free in an embedded program if it can be avoided. For example, a static array may be adequate, or even using alloca() is marginally better.

to minimize stack usage:
write the program so the calls are in parallel rather than function calls sub function that calls sub function that calls sub function.... I.E. top level function calls sub function where sub function promptly returns, with status info. top level function then calls next sub function... etc
The (bad for stack limited) nested method of program architecture:
top level function
second level function
third level function
forth level function
should be avoided in embedded systems
the preferred method of program architecture for embedded systems is:
top level function (the reset event handler)
(variations in the following depending on if 'warm' or 'cold' start)
initialize hardware
initialize peripherals
initialize communication I/O
initialize interrupts
initialize status info
enable interrupts
enter background processing
interrupt handler
re-enable the interrupt
using 'scheduler'
select a foreground function
trigger dispatch for selected foreground function
return from interrupt
background processing
(this can be, and often is implemented as a 'state' machine rather than a loop)
loop:
if status info indicates need to call second level function 1
second level function 1, which updates status info
if status info indicates need to call second level function 2
second level function 2, which updates status info
etc
end loop:
Note that, as much as possible, there is no 'third level function x'
Note that, the foreground functions must complete before they are again scheduled.
Note: there are lots of other details that I have omitted in the above, like
kicking the watchdog,
the other interrupt events,
'critical' code sections and use of mutex(),
considerations between 'soft real-time' and 'hard real-time',
context switching
continuous BIT, commanded BIT, and error handling
etc

How come a compiler cannot detect if a global variable is changed by another thread?

I was just going through concepts of the volatile keyword. I just gone through this link, this link is telling about why to use the volatile keyword in case of program using interrupt handler. They have mentioned in one example:
int etx_rcvd = FALSE;
void main()
{
...
while (!ext_rcvd)
{
// Wait
}
...
}
interrupt void rx_isr(void)
{
...
if (ETX == rx_char)
{
etx_rcvd = TRUE;
}
...
}
They are saying since compiler is not able to know ext_rcvd is getting updated in an interrupt handler. So compiler uses optimization intelligence and assumes that this variable value is always FALSE and it never enters into the while{} condition. So to prevent these situation we use volatile keyword, which stops compiler to use its own intelligence.
My question is, While compiling, how compiler is not able to know that ext_rcvd is getting updated in interrupt handler? PLease help me to find its answer, I am not getting correct answer for this.

The compiler cannot analyze all the codes or running processes that can modify the memory location of ext_rcvd.
In your example, you mentioned that ext_rcvd is being updated in an interrupt handler. That is correct. The interrupt handler is a piece of code launched by the Operating System, when the CPU receives an interrupt. That piece of code is actually the driver code. In the driver code, ext_rcvd may have another name but point to the same memory location.
So in order to know if ext_rcvd is updated somewhere else, the compiler needs to analyze the libraries and drivers' code and to figure out that they are updating the exact same memory location that you name ext_rcvd in your code. This cannot be done before execution time.
The same goes for multi-threading. The compiler cannot know a-priori if a certain thread is updating the exact memory location used by another thread. For example if another thread makes a syscall() then the compiler needs to look in the code handling the syscall().

When the CPU receives an interrupt, it stops whatever it's doing (unless it's processing a more important interrupt, in which case it will deal with this one only when the more important one is done), saves certain parameters on the stack and calls the interrupt handler. This means that certain things are not allowed in the interrupt handler itself, because the system is in an unknown state. The solution to this problem is for the interrupt handler to do what needs to be done immediately, usually read something from the hardware or send something to the hardware, and then schedule the handling of the new information at a later time (this is called the "bottom half") and return. The kernel is then guaranteed to call the bottom half as soon as possible -- and when it does, everything allowed in kernel modules will be allowed.
I think when the interrupt is called whatever is working is stopped and the variable is set to TRUE, not satisfying the while condition.
But when you use volatile keyword it makes C checks the variable value again.
Of course, I'm not 100% sure about this, I'm open for answers for change mine.

interrupt is not a C specified keyword, so whatever is discussed is not C specified behavior.
Yes the compiler could see that etx_rcvd is modified inside an interrupt routine and therefore assume etx_rcvd could change at any time outside the interrupt function and make int etx_rcvd --> volatile int etx_rcvd.
Now the question is should it do that?
IMO: No, it is not needed.
An interrupt function could modify global variables and the code flow is such that non-interrupt functions access only happens in a interrupt protected block. An optimizing compiler would be hindered by having implied volatile with int etx_rcvd. So now code needs a way to say non_volatile int etx_rcvd to prevent the volatile assumption OP seeks.
C all ready provides a method to declare variables volatile (add volatile) and non-volatile (do not add volatile). If an interrupt routine could make variables volatile without them being declared so, the code would need a new keyword to insure non-volatility.

Use printk in kernel

I am trying to implement my own new schedule(). I want to debug my code.
Can I use printk function in sched.c?
I used printk but it doesn't work. What did I miss?

Do you know how often schedule() is called? It's probably called faster than your computer can flush the print buffer to the log. I would suggest using another method of debugging. For instance running your kernel in QEMU and using remote GDB by loading the kernel.syms file as a symbol table and setting a breakpoint. Other virtualization software offers similar features. Or do it the manual way and walk through your code. Using printk in interrupt handlers is typically a bad idea (unless you're about to panic or stall).
If the error you are seeing doesn't happen often think of using BUG() or BUG_ON(cond) instead. These do conditional error messages and shouldn't happen as often as a non-conditional printk
Editing the schedule() function itself is typically a bad idea (unless you want to support multiple run queue's etc...). It's much better and easier to instead modify a scheduler class. Look at the code of the CFS scheduler to do this. If you want to accomplish something else I can give better advice.

It's not safe to call printk while holding the runqueue lock. A special function printk_sched was introduced in order to have a mechanism to use printk when holding the runqueue lock (https://lkml.org/lkml/2012/3/13/13). Unfortunatly it can just print one message within a tick (and there cannot be more than one tick when holding the run queue lock because interrupts are disabled). This is because an internal buffer is used to save the message.
You can either use lttng2 for logging to user space or patch the implementation of printk_sched to use a statically allocated pool of buffers that can be used within a tick.

Try trace_printk().
printk() has too much of an overhead and schedule() gets called again before previous printk() calls finish. This creates a live lock.
Here is a good article about it: https://lwn.net/Articles/365835/

It depends, basically it should be work fine.
try to use dmesg in shell to trace your printk if it is not there you apparently didn't invoked it.
2396 if (p->mm && printk_ratelimit()) {
2397 printk(KERN_INFO "process %d (%s) no longer affine to cpu%d\n",
2398 task_pid_nr(p), p->comm, cpu);
2399 }
2400
2401 return dest_cpu;
2402 }
there is a sections in sched.c that printk doesn't work e.g.
1660 static int double_lock_balance(struct rq *this_rq, struct rq *busiest)
1661 {
1662 if (unlikely(!irqs_disabled())) {
1663 /* printk() doesn't work good under rq->lock */
1664 raw_spin_unlock(&this_rq->lock);
1665 BUG_ON(1);
1666 }
1667
1668 return _double_lock_balance(this_rq, busiest);
1669 }
EDIT
you may try to printk once in 1000 times instead of each time.

Difference between request_irq and __interrupt

From what I read both are used to register interrupt handlers. I saw lots of request_irq calls in kernel code but not even one __interrupt call. Is __interrupt some way to register a handler from user space?

request_irq is essentially a wrapper call to request_threaded_irq, which allocates the IRQ resources and enables the IRQ. That's paraphrased from the comment block in kernel/irq/manage.c, Line #1239.
Basically, you want to use request_irq if you need to setup interrupt handling for a device of some kind. Make sure that whatever subsystem you are working in doesn't already provide a wrapper for request_irq, too. I.e., if you are working on a device driver, consider using the devm_* family of calls to auto-manage the minutiae, like freeing unused variables and such. See devm_request_threaded_irq at Line #29 in kernel/irq/devres.c for a better explanation. Its equivalent call (and the one you would most likely use) is devm_request_irq.

As far as I remember __interrupt() is used to declare a function as ISR in userspace. I am not sure where I have this from but I'll come back to you as soon as I found the spot.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight