I want to add some code to my Linux 3.10 kernel that uses the alloc_bootmem(unsigned long size) function to allocate memory.
I understand that alloc_bootmem(unsigned long size) can only be called during the boot stage, so my code has to run while the kernel is booting.
The problem is that I don't know how to make my function be called when the computer is booting.
I'm searching for a main function in the Linux kernel that runs during the boot stage and calls all the different functions that also need to run at that point. I want to add a line to this function that calls my code.
Can anyone show me this kind of function?
Is this the best way to add code that uses alloc_bootmem in the booting stage to the Linux kernel?
Thanks for the help!
The start_kernel function in init/main.c is the kernel entry point. Several functions called from there use alloc_bootmem (setup_command_line, for example).
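For example (a sketch only, against Linux 3.10, with the made-up names my_boot_init and my_boot_buffer), you could add a small __init helper to init/main.c and call it from start_kernel() after setup_command_line(), while the bootmem allocator is still live (i.e. before mm_init()):

/* init/main.c (Linux 3.10) -- sketch; names are made up */
#include <linux/bootmem.h>      /* already pulled in by init/main.c */

static void *my_boot_buffer;

static void __init my_boot_init(void)
{
        /* alloc_bootmem() panics on its own if the allocation fails */
        my_boot_buffer = alloc_bootmem(1 << 20);        /* 1 MiB */
        pr_info("my_boot_init: boot memory at %p\n", my_boot_buffer);
}

Then add a single line, my_boot_init();, to start_kernel() right after the setup_command_line(command_line) call.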
I am new to Linux kernel programming. I am developing a simple Loadable Kernel Module which needs to be informed whenever there is a change in a scheduler runqueue (say rq_rt only). So I need to send a signal or interrupt to my kernel module (say an interrupt or signal handler in my module) from the scheduler's functions (enqueue_rt, dequeue_rt, current_premept, etc.).
Can anyone suggest a method for sending such signals or interrupts?
Yes, I finally found a solution: we can make use of the kernel tracing mechanism, ftrace.
This doesn't require any kernel modification, and we can hook into kernel functions that are not protected against tracing.
More details are available here
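For instance, here is a minimal sketch of hooking one scheduler function with a custom ftrace_ops. The symbol name enqueue_task_rt is an assumption, and the callback signature shown is the one used up to roughly v5.10 (newer kernels pass a struct ftrace_regs * instead of struct pt_regs *):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/ftrace.h>
#include <linux/string.h>

/* Runs on every entry to the filtered function; keep it minimal,
 * since it executes in the scheduler path. */
static void notrace my_rt_callback(unsigned long ip, unsigned long parent_ip,
                                   struct ftrace_ops *ops, struct pt_regs *regs)
{
        /* record whatever you need here */
}

static struct ftrace_ops my_ops = {
        .func = my_rt_callback,
};

static int __init rt_trace_init(void)
{
        int ret;

        ret = ftrace_set_filter(&my_ops, (unsigned char *)"enqueue_task_rt",
                                strlen("enqueue_task_rt"), 0);
        if (ret)
                return ret;
        return register_ftrace_function(&my_ops);
}

static void __exit rt_trace_exit(void)
{
        unregister_ftrace_function(&my_ops);
}

module_init(rt_trace_init);
module_exit(rt_trace_exit);
MODULE_LICENSE("GPL");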
Alternatively, for an efficient solution, you can make use of a function pointer added to the kernel source code.
The problem here is that you have to modify the Linux source code, so be very careful when you make such modifications.
Implement a NULL function pointer in the Linux source and call through it from the kernel routine (check for NULL before calling).
Export the symbol.
Provide the address of a local function to this symbol from your loadable module.
That's it! You will get a function call from the kernel routine; a rough sketch is shown below. Also make sure that when you unload the module you set the pointer back to NULL, otherwise the kernel will crash.
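A rough sketch of that approach, with made-up names (rt_rq_notify, my_rt_handler); which scheduler routine you patch the call into is up to you:

/* --- kernel side, e.g. in kernel/sched/rt.c --- */
void (*rt_rq_notify)(int cpu) = NULL;
EXPORT_SYMBOL_GPL(rt_rq_notify);

/* inside enqueue_task_rt()/dequeue_task_rt(), add:
 *      if (rt_rq_notify)
 *              rt_rq_notify(cpu_of(rq));
 */

/* --- module side --- */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/atomic.h>

extern void (*rt_rq_notify)(int cpu);

static atomic_t rt_events = ATOMIC_INIT(0);

static void my_rt_handler(int cpu)
{
        /* runs with runqueue locks held: keep it minimal */
        atomic_inc(&rt_events);
}

static int __init notify_init(void)
{
        rt_rq_notify = my_rt_handler;
        return 0;
}

static void __exit notify_exit(void)
{
        /* reset the hook, otherwise the kernel would call unloaded code */
        rt_rq_notify = NULL;
}

module_init(notify_init);
module_exit(notify_exit);
MODULE_LICENSE("GPL");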
I am trying to create a mechanism to read performance counters for processes. I want this mechanism to be executed from within the kernel (version 4.19.2) itself.
I am able to do it from user space using the sys_perf_event_open() system call as follows.
syscall (__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
I would like to invoke this call from kernel space. I got some basic ideas from here: How do I use a Linux System call from a Linux Kernel Module
Here are the steps I took to achieve this:
To make sure that kernel virtual addresses are accepted as valid, I have used get_fs(), set_fs() and get_ds().
Since sys_perf_event_open() is declared in include/linux/syscalls.h, I have included that header in the code.
Eventually, the code for calling the system call looks something like this:
mm_segment_t fs;
fs = get_fs();
set_fs(get_ds());
long ret = sys_perf_event_open(&pe, pid, cpu, group_fd, flags);
set_fs(fs);
Even after these measures, I get an error claiming "implicit declaration of function ‘sys_perf_event_open’". Why does this pop up when the header file declaring it is already included? Does it have something to do with the way one should call system calls from within kernel code?
In general (not specific to Linux) the work done for systems calls can be split into 3 categories:
switching from user context to kernel context (and back again on the return path). This includes things like changing the processor's privilege level, messing with gs, fiddling with stacks, and doing security mitigations (e.g. for Meltdown). These things are expensive, and if you're already in the kernel they're useless and/or dangerous.
using a "function number" parameter to find the right function to call, and calling it. This typically includes some sanity checks (does the function exist?) and a table lookup, plus code to mangle input and output parameters that's needed because the calling conventions used for system calls (in user space) is not the same as the calling convention that normal C functions use. These things are expensive, and if you're already in the kernel they're useless and/or dangerous.
the final normal C function that ends up being called. This is the function that you might have (see note) been able to call directly without using any of the expensive, useless and/or dangerous system call junk.
Note: If you aren't able to call the final normal C function directly without using (any part of) the system call junk (e.g. if the final normal C function isn't exposed to other kernel code); then you must determine why. For example, maybe it's not exposed because it alters user-space state, and calling it from kernel will corrupt user-space state, so it's not exposed/exported to other kernel code so that nobody accidentally breaks everything. For another example, maybe there's no reason why it's not exposed to other kernel code and you can just modify its source code so that it is exposed/exported.
Calling system calls from inside the kernel using the sys_* interface is discouraged for the reasons that others have already mentioned. In the particular case of x86_64 (which I guess is your architecture), starting from kernel v4.17 it is a hard requirement not to use that interface, with only a few exceptions. It was possible to invoke system calls directly prior to that version (which is why there are plenty of tutorials on the web using sys_*), but now the error you are seeing pops up. The alternative proposed in the Linux documentation is to define a wrapper between the syscall and the actual syscall code, which can then be called from within the kernel like any other function:
int perf_event_open_wrapper(...) {
        // actual perf_event_open() code
}

SYSCALL_DEFINE5(perf_event_open, ...) {
        return perf_event_open_wrapper(...);
}
source: https://www.kernel.org/doc/html/v4.19/process/adding-syscalls.html#do-not-call-system-calls-in-the-kernel
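A fleshed-out sketch of that pattern, using the real parameter list of perf_event_open from kernel/events/core.c (the wrapper name is hypothetical, and the elided syscall body stays where it is):

long perf_event_open_wrapper(struct perf_event_attr __user *attr_uptr,
                             pid_t pid, int cpu, int group_fd,
                             unsigned long flags)
{
        /* ... the original body of the syscall, moved here unchanged ... */
}

SYSCALL_DEFINE5(perf_event_open,
                struct perf_event_attr __user *, attr_uptr,
                pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
{
        return perf_event_open_wrapper(attr_uptr, pid, cpu, group_fd, flags);
}

Note that the wrapper still expects a __user pointer for the attribute struct; if you want to call it with kernel memory, you would also need to change that parameter (and the copy_from_user() inside) accordingly.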
Which kernel version are we talking about?
Anyhow, you could either get the address of the sys_call_table by looking at the System.map file or, if the symbol is exported, look it up at runtime (have a look at kallsyms.h). Once you have the address of the syscall table, you may treat it as an array of void pointers (void **) and index the function you want: sys_call_table[__NR_open], for example, would be open's address, so you could store it in a function pointer and then call it.
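A rough sketch only, and only to illustrate the idea; it assumes an older kernel where kallsyms_lookup_name() is exported to modules (it was until v5.7) and where the table entries still take ordinary C arguments (pre-v4.17 on x86_64):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kallsyms.h>
#include <linux/unistd.h>

typedef long (*sys_getpid_t)(void);

static int __init table_demo_init(void)
{
        void **sys_call_table;
        sys_getpid_t my_getpid;

        sys_call_table = (void **)kallsyms_lookup_name("sys_call_table");
        if (!sys_call_table)
                return -ENOENT;

        my_getpid = (sys_getpid_t)sys_call_table[__NR_getpid];
        pr_info("getpid() via sys_call_table: %ld\n", my_getpid());
        return 0;
}

static void __exit table_demo_exit(void)
{
}

module_init(table_demo_init);
module_exit(table_demo_exit);
MODULE_LICENSE("GPL");

Syscalls that take user-space pointers (open, read, ...) are much more awkward to call this way, which is part of why the practice is discouraged.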
Edit: What are you trying to do, and why can't you do it without calling syscalls? You must understand that syscalls are the kernel's API to userland and should not really be used from inside the kernel, so such practice should be avoided.
calling system calls from kernel code
(I am mostly answering that title; to summarize: it is forbidden to even think of that.)
I don't understand your actual problem (I feel you need to explain it more in your question, which is unclear and lacks a lot of useful motivation and context). But a general piece of advice, following the Unix philosophy, is to minimize the size and attack surface of your kernel or kernel module code, and to move such code, as much as is convenient, to user land, in particular with the help of systemd, as soon as your kernel code requires some system calls. Your question is in itself a violation of most Unix and Linux cultural norms.
Have you considered using efficient kernel-to-user-land communication, in particular netlink(7) with socket(7)? Perhaps you also want some driver-specific kernel thread.
My intuition would be that (in some user-land daemon started from systemd early at boot time) AF_NETLINK with socket(2) is exactly what fits your (unexplained) needs. And eventfd(2) might also be relevant.
But just thinking of using system calls from inside the kernel triggers a huge flashing red light in my brain and I tend to believe it is a symptom of a major misunderstanding of operating system kernels in general. Please take time to read Operating Systems: Three Easy Pieces to understand OS philosophy.
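As a minimal illustration of the netlink suggestion above, here is a user-space sketch that opens an AF_NETLINK socket a kernel module could send messages to; NETLINK_USERSOCK is used here only as an example protocol number:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void)
{
        struct sockaddr_nl addr;
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_USERSOCK);

        if (fd < 0) {
                perror("socket");
                return 1;
        }

        memset(&addr, 0, sizeof(addr));
        addr.nl_family = AF_NETLINK;
        addr.nl_pid = getpid();        /* unicast address of this process */

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                perror("bind");
                return 1;
        }

        /* recvmsg()/sendmsg() with struct nlmsghdr framing would go here */
        close(fd);
        return 0;
}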
Is it possible to override one of the Linux kernel functions using LD_PRELOAD?
For instance, I want to change the cookie_hash function in net/ipv4/syncookies.c for the listening socket of my program fooserver. Can I do it using LD_PRELOAD, or do I need to recompile the kernel for that?
Are there any other options?
Thanks,
No, it is not possible to use LD_PRELOAD to replace a function in the kernel; LD_PRELOAD only affects the dynamic linking of user-space programs.
You will need to recompile the kernel.
If the function is in a kernel module, then you may be able to unload, recompile and reload the module without needing to restart the kernel.
If this is something you will be doing frequently, then you will want to use a second computer, or a virtual machine, so you won't have to keep restarting the computer you're programming on.
You can use kprobes or SystemTap to override kernel functions. It isn't necessary to recompile.
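For example, here is a minimal kprobe module that observes calls to cookie_hash (actually changing its behaviour takes more work, e.g. a kretprobe or register rewriting). Note that cookie_hash is a static function, so this only works if your kernel's kallsyms exposes the symbol:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>

static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
        pr_info("cookie_hash() called\n");
        return 0;
}

static struct kprobe kp = {
        .symbol_name = "cookie_hash",
        .pre_handler = handler_pre,
};

static int __init kp_init(void)
{
        return register_kprobe(&kp);
}

static void __exit kp_exit(void)
{
        unregister_kprobe(&kp);
}

module_init(kp_init);
module_exit(kp_exit);
MODULE_LICENSE("GPL");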
You can do something similar in the Linux kernel.
It isn't a trivial operation, but what you should do is the following:
Find the address of the function you want to replace. There are several ways to get the address; the simplest one is cat /proc/kallsyms | grep cookie_hash.
From your module, save the contents at that address: that is the original cookie_hash function.
Into that address, place the address of your own function my_cookie_hash.
At the end of your function my_cookie_hash, call the original cookie_hash.
There are many hidden traps and potential crashes, though.
But generally, this approach works.
Is there a way to find the parameters passed to the kernel (/proc/cmdline) at boot time without reading any files in /proc? I might have a process that needs to read the boot parameters before /proc is mounted.
It seems the kernel passes anything on the boot line as an argument to init, unless it contains an '=' or a '.'
Passing Linux boot opts to Init
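For example (just a sketch), a process started by the kernel as init sees those leftover parameters in its own argv[] and environment before anything is mounted:

#include <stdio.h>

int main(int argc, char *argv[], char *envp[])
{
        /* parameters without '=' end up in argv, those with '=' in envp */
        for (int i = 1; i < argc; i++)
                printf("boot arg: %s\n", argv[i]);
        for (char **e = envp; *e; e++)
                printf("boot env: %s\n", *e);

        /* ... mount /proc, then exec the real init, etc. ... */
        return 0;
}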
I'm sure there's a better way, but I do see it in dmesg
I'd like to understand your requirements better, because you are pretty much guaranteed to have /proc mounted. Mounting /proc is one of the very first things that init does.
Ubuntu with upstart: /etc/init/mountall.conf, part of the startup event
Fedora with initscripts: /etc/rc.sysinit, second thing it does (after setting the hostname)
Your code is almost certain to run after this.
Seeing as you're replacing init, take a look at how init does it. init git repo.
There seems to be a global symbol called boot_command_line.
After the boot loader hands execution over to the kernel, what happens? I know assembler, so what are the first few instructions that the kernel executes? Or is there a C function that does this? What is the startup sequence before the kernel can execute an arbitrary binary?
I'll assume that you're talking about x86 here...
It depends where you consider the boundary between "boot loader" and "kernel" to be: the start of the kernel proper is 32-bit protected mode code, but the kernel itself provides some boot code to get there from real mode.
The real mode code is in arch/x86/boot/: start_of_setup does some basic setup of the environment for C, and calls main(), which does some fairly dull stuff, ending with the actual jump to protected mode (see pmjump.S).
Where you end up now depends on whether or not the kernel is compressed. If it is, the entry point is actually a self-decompression routine. This is fairly dull stuff as well, and essentially transparent: the decompression code and compressed kernel are moved higher up in memory out of the way, then the kernel is uncompressed to the original location, and then jumped into as if it had been uncompressed all along. This code is in arch/x86/boot/compressed/ (the entry point is startup_32 in head_32.S).
The kernel really gets going properly at startup_32 in arch/x86/kernel/head_32.S. The code there ends up by calling i386_start_kernel() in arch/x86/kernel/head32.c, which finally calls the generic kernel startup code in start_kernel().
It's the asmlinkage void __init start_kernel(void) C function in init/main.c.