How to find signal handler definitions in the Linux kernel? - c

I am currently working on "Creation of a Postmortem Data Logger in Linux on Intel architecture".
It is essentially the creation of a core-dump utility.
Can anybody share details of how the handlers for the signals that produce a core dump when an application crashes (SIGSEGV, SIGABRT, SIGFPE, etc.) are implemented internally in the Linux kernel? I need to rewrite these handlers for my own user-specific needs and rebuild the kernel, so that the kernel produces a core file (upon an application crash) with user-specific contents such as registers, a stack dump, and a backtrace.
Thanks in advance to all repliers :)

You may not need to modify the kernel at all - the kernel supports invoking a userspace application when a core dump occurs. From the core(5) man page:
Since kernel 2.6.19, Linux supports an alternate syntax for the /proc/sys/kernel/core_pattern file. If the first character of this file is a pipe symbol (|), then the remainder of the line is interpreted as a program to be executed. Instead of being written to a disk file, the core dump is given as standard input to the program.
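As a rough illustration of that mechanism, a pipe handler is just an ordinary program that reads the dump from its standard input. The sketch below is a minimal, hypothetical handler (the name dump-logger and the output path are placeholders); core_pattern specifiers such as %p and %e pass the crashing process's PID and executable name as arguments:

/* dump-logger.c - hypothetical core_pattern pipe handler (sketch).
 * Register it with something like:
 *   echo '|/usr/local/bin/dump-logger %p %e' > /proc/sys/kernel/core_pattern
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *pid  = argc > 1 ? argv[1] : "unknown";
    const char *name = argc > 2 ? argv[2] : "unknown";
    char path[256];
    char buf[4096];
    ssize_t n;

    snprintf(path, sizeof(path), "/var/crash/core.%s.%s", name, pid);
    FILE *out = fopen(path, "w");
    if (!out)
        return EXIT_FAILURE;

    /* The kernel feeds the core image to our stdin; copy it out here,
     * or parse the ELF notes to log registers, a backtrace, etc. */
    while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, out);

    fclose(out);
    return EXIT_SUCCESS;
}

This keeps the register/stack/backtrace formatting entirely in user space, so no kernel rebuild is needed.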

The actual dumping code depends on the format of the dump. For the ELF format, look at fs/binfmt_elf.c; it has an elf_core_dump function. (The same applies to other formats.)
This is triggered by get_signal_to_deliver in kernel/signal.c, which calls into do_coredump in fs/exec.c, which in turn calls the format-specific handler.

LXR, the Linux Cross-Reference, is usually helpful when you want to know how something is done in the Linux kernel. It's a browsing and searching tool for the kernel sources.
Searching “core dump” returns a lot of hits, but two of the most promising-looking are in fs/exec.c and fs/proc/kcore.c (promising because the file names are fairly generic, in particular you don't want to start with architecture-specific stuff). kcore.c is actually for a kernel core dump, but the hit in fs/exec.c is in the function do_coredump, which is the main function for dumping a process's core. From there, you can both read the function to see what it does, and search to see where it's called.
Most of the code in do_coredump is about determining whether to dump core and where the dump should go. What to dump is handled near the end: binfmt->core_dump(&cprm), i.e. this is dependent on the executable format (ELF, a.out, …). So your next search is on the core_dump struct field, specifically its “usage”; then select the hit corresponding to an executable format. ELF is probably the one you want, and so you get to the elf_core_dump function.
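For orientation, this is roughly how an executable format wires in its dumper; the excerpt below follows the shape of the elf_format definition in fs/binfmt_elf.c (field names and values may differ between kernel versions):

/* Sketch of a binary-format registration, modelled on fs/binfmt_elf.c.
 * do_coredump() ends up calling the .core_dump callback of the format
 * that loaded the crashing executable. */
static struct linux_binfmt elf_format = {
        .module       = THIS_MODULE,
        .load_binary  = load_elf_binary,
        .load_shlib   = load_elf_library,
        .core_dump    = elf_core_dump,
        .min_coredump = ELF_EXEC_PAGESIZE,
};

So if you really do want to change what ends up in the dump, elf_core_dump (or a custom struct linux_binfmt of your own) is where that would plug in.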
That being said, I'm not convinced from your description of your goals that what you want is really to change the core dump format, as opposed to writing a tool that analyses existing dumps.
You may be interested in existing work on analyzing kernel crash dumps. Some of that work is relevant to process dumps as well, for example the gcore extension to include process dumps in kernel crash dumps.

Related

Calling system calls from the kernel code

I am trying to create a mechanism to read performance counters for processes. I want this mechanism to be executed from within the kernel (version 4.19.2) itself.
I am able to do it from user space using the sys_perf_event_open() system call as follows.
syscall (__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
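For reference, a fuller user-space version (following the example in the perf_event_open(2) man page; the chosen event is only illustrative) looks roughly like this:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>
#include <asm/unistd.h>

/* Thin wrapper: glibc provides no perf_event_open() function. */
static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr pe;
    long long count;

    memset(&pe, 0, sizeof(pe));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(pe);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;   /* illustrative event */
    pe.disabled = 1;
    pe.exclude_kernel = 1;
    pe.exclude_hv = 1;

    int fd = perf_event_open(&pe, 0 /* this process */, -1, -1, 0);
    if (fd == -1) {
        perror("perf_event_open");
        return 1;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    /* ... the code being measured ... */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    read(fd, &count, sizeof(count));
    printf("instructions: %lld\n", count);
    close(fd);
    return 0;
}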
I would like to invoke this call from kernel space. I got some basic idea from "How do I use a Linux System call from a Linux Kernel Module".
Here are the steps I took to achieve this:
To make sure that kernel virtual addresses are accepted where user-space pointers are expected, I have used set_fs(), get_fs() and get_ds().
Since sys_perf_event_open() is declared in include/linux/syscalls.h, I have included that header in the code.
Eventually, the code for calling the system call looks something like this:
mm_segment_t fs;
fs = get_fs();
set_fs(get_ds());
long ret = sys_perf_event_open(&pe, pid, cpu, group_fd, flags);
set_fs(fs);
Even after these measures, I get an error claiming "implicit declaration of function ‘sys_perf_event_open’". Why does this pop up when the header file declaring it is already included? Does it have something to do with the way one should call system calls from within kernel code?
In general (not specific to Linux), the work done for system calls can be split into 3 categories:
switching from user context to kernel context (and back again on the return path). This includes things like changing the processor's privilege level, messing with gs, fiddling with stacks, and doing security mitigations (e.g. for Meltdown). These things are expensive, and if you're already in the kernel they're useless and/or dangerous.
using a "function number" parameter to find the right function to call, and calling it. This typically includes some sanity checks (does the function exist?) and a table lookup, plus code to mangle input and output parameters that's needed because the calling conventions used for system calls (in user space) is not the same as the calling convention that normal C functions use. These things are expensive, and if you're already in the kernel they're useless and/or dangerous.
the final normal C function that ends up being called. This is the function that you might have (see note) been able to call directly without using any of the expensive, useless and/or dangerous system call junk.
Note: If you aren't able to call the final normal C function directly without using (any part of) the system call junk (e.g. if the final normal C function isn't exposed to other kernel code); then you must determine why. For example, maybe it's not exposed because it alters user-space state, and calling it from kernel will corrupt user-space state, so it's not exposed/exported to other kernel code so that nobody accidentally breaks everything. For another example, maybe there's no reason why it's not exposed to other kernel code and you can just modify its source code so that it is exposed/exported.
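For this particular case, the kernel also exposes an in-kernel perf interface, so there may be no need to go anywhere near the syscall path at all. A minimal sketch, assuming the perf_event_create_kernel_counter() / perf_event_read_value() / perf_event_release_kernel() API declared in include/linux/perf_event.h (check the exact signatures in your tree):

#include <linux/err.h>
#include <linux/perf_event.h>

/* Sketch: count CPU cycles on one CPU from inside the kernel,
 * using the in-kernel perf API instead of the syscall. */
static struct perf_event *cycles_event;

static int start_cycle_counter(int cpu)
{
        struct perf_event_attr attr = {
                .type   = PERF_TYPE_HARDWARE,
                .config = PERF_COUNT_HW_CPU_CYCLES,
                .size   = sizeof(attr),
        };

        /* task == NULL: a per-CPU counter; no overflow handler. */
        cycles_event = perf_event_create_kernel_counter(&attr, cpu, NULL,
                                                        NULL, NULL);
        if (IS_ERR(cycles_event))
                return PTR_ERR(cycles_event);
        return 0;
}

static u64 read_cycle_counter(void)
{
        u64 enabled, running;

        return perf_event_read_value(cycles_event, &enabled, &running);
}

static void stop_cycle_counter(void)
{
        perf_event_release_kernel(cycles_event);
}

To follow a specific task instead of a CPU, pass a struct task_struct pointer as the task argument instead of NULL.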
Calling system calls from inside the kernel using the sys_* interface is discouraged for the reasons that others have already mentioned. In the particular case of x86_64 (which I guess is your architecture), starting from kernel version v4.17 it is a hard requirement not to use that interface (with a few exceptions). It was possible to invoke system calls directly prior to this version, but now the error you are seeing pops up (which is why there are plenty of tutorials on the web using sys_*). The alternative proposed in the Linux documentation is to define a wrapper between the syscall and the actual syscall's code, which can then be called from within the kernel like any other function:
int perf_event_open_wrapper(...) {
    // actual perf_event_open() code
}

SYSCALL_DEFINE5(perf_event_open, ...) {
    return perf_event_open_wrapper(...);
}
source: https://www.kernel.org/doc/html/v4.19/process/adding-syscalls.html#do-not-call-system-calls-in-the-kernel
Which kernel version are we talking about?
Anyhow, you could either get the address of sys_call_table by looking at the System.map file, or, if the symbol is exported, look it up at runtime (have a look at kallsyms.h). Once you have the address of the syscall table, you can treat it as an array of void pointers (void **) and index into it to find the function you want; e.g. sys_call_table[__NR_open] would be open's address, so you could store it in a function pointer and then call it.
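A rough sketch of that approach is below. It is only plausible on older kernels (e.g. x86-64 before v4.17, where the table entries still took their arguments directly) and only where kallsyms_lookup_name() is available to modules; symbol export and table layout vary by version and configuration, so treat this purely as an illustration:

#include <linux/errno.h>
#include <linux/fcntl.h>
#include <linux/kallsyms.h>
#include <linux/types.h>
#include <linux/unistd.h>

/* Hypothetical sketch: resolve the syscall table and call sys_open
 * through it. Do not do this on modern kernels; see the other answer. */
typedef long (*sys_open_t)(const char __user *filename, int flags, umode_t mode);

static long open_via_table(const char __user *path)
{
        void **sys_call_table;
        sys_open_t open_fn;

        sys_call_table = (void **)kallsyms_lookup_name("sys_call_table");
        if (!sys_call_table)
                return -ENOENT;

        open_fn = (sys_open_t)sys_call_table[__NR_open];
        return open_fn(path, O_RDONLY, 0);
}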
Edit: What are you trying to do, and why can't you do it without calling syscalls? You must understand that syscalls are the kernel's API to userland and should not really be used from inside the kernel, so such a practice should be avoided.
calling system calls from kernel code
(I am mostly answering that title; to summarize: it is forbidden to even think of that.)
I don't understand your actual problem (I feel you need to explain it more in your question, which is unclear and lacks a lot of useful motivation and context). But a general piece of advice, following the Unix philosophy, is to minimize the size and vulnerability surface of your kernel or kernel-module code, and to move such code to user-land as much as is convenient, in particular with the help of systemd, as soon as your kernel code requires some system calls. Your question is by itself a violation of most Unix and Linux cultural norms.
Have you considered using efficient kernel-to-user-land communication, in particular netlink(7) with socket(7)? Perhaps you also want some driver-specific kernel thread.
My intuition would be that (in some user-land daemon started from systemd early at boot time) AF_NETLINK with socket(2) is exactly fit for your (unexplained) needs. And eventfd(2) might also be relevant.
But just thinking of using system calls from inside the kernel triggers a huge flashing red light in my brain, and I tend to believe it is a symptom of a major misunderstanding of operating-system kernels in general. Please take the time to read Operating Systems: Three Easy Pieces to understand OS philosophy.

What and where exactly is the loader?

I understand every bit of the C compilation process (how the object files are linked to create the executable). But about the loader itself (which starts the program running) I have a few doubts.
Is the loader part of the kernel?
How exactly is ./firefox or some command like that loaded? I mean, you normally type such commands into the terminal, which loads the executable, I presume. So is the loader a component of the shell?
I think I'm also confused about where the terminal/shell fits into all of this and what its role is.
The format of an executable determines how it will be loaded. For example, executables with "#!" as the first two characters are loaded by the kernel by executing the named interpreter and feeding the file to it as the first argument. If the executable is formatted as a PE, ELF, or Mach-O binary, then the kernel uses an interpreter for that format that is built into the kernel in order to find the executable code and data and then choose the next step.
In the case of a dynamically linked ELF, the next step is to execute the dynamic loader (usually ld.so) in order to find the libraries, load them, and resolve the symbols. This all happens in userspace. The kernel is more or less unaware of dynamic linking, because it all happens in userspace after the kernel has handed control to the interpreter named in the ELF file.
The corresponding system call is exec. It is part of the kernel and is in charge of tearing down the old address space of the process that makes the call and setting up a fresh one with everything needed to run the new code. This lives in the kernel because the address space is a kind of sandbox that protects processes from each other, and since it is critical it is the kernel's responsibility.
The shell is just in charge of interpreting what you type and transforming it into the proper structures (lists or arrays of C strings) to pass to some exec call (after, most of the time, spawning a new process with fork).
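A minimal sketch of that fork-then-exec pattern, which is essentially what a shell does for an external command (error handling trimmed; the command line here is just an example):

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    char *argv[] = { "./firefox", NULL };   /* example command line */

    pid_t pid = fork();                /* clone the shell process */
    if (pid == 0) {
        /* In the child, exec replaces the address space with the new
         * program; the kernel's exec path and the dynamic loader
         * (for dynamically linked ELF) take over from here. */
        execvp(argv[0], argv);
        perror("execvp");              /* only reached if exec failed */
        return 127;
    }
    waitpid(pid, NULL, 0);             /* the shell waits for the command */
    return 0;
}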

Get user stackpointer from task_struct

I have a kcore and I want to get a userspace backtrace from it, because someone in our application is making a lot of munmap calls and hanging the system (CPU soft lockup, 22 s!). I looked at some macros, but they still only give me the kernel backtrace. What I want is the userspace backtrace.
The good news is that I have a pointer to the task_struct.
task_struct->thread->sp (Kernel stack pointer)
task_struct->thread->usersp (user stack pointer) but this is junk
My question is how to get userspace backtrace from kcore or task_struct.
First of all, vmcore is an immediate full memory snapshot, so it contains all pages (including user pages). But if the user pages are swapped out, they cannot be accessed. That is why kdump (and similar tools, like your gdb Python script) focus on kernel debugging functionality only. For userspace debugging and stacktraces you have to use coredump functionality. By default, coredumps are produced when the kernel sends (for example) SIGSEGV to your app, but you can also produce them on demand by using gcore or by modifying the kernel. There is also a "userspace" way of making a process dump; see Google's coredumper project.
Also, you can try to unwind the user stacktrace directly from kcore, but this is tricky, and you will have to hope that the userspace stack is not swapped out at that moment (do you use swap?). Have a look at __save_stack_trace_user; it will give you a sense of how the userspace context is retrieved.
First of all, vmcores typically don't contain user pages. I'm unaware of any magic which would help here: you would have to inspect the VM mappings for the given task's address space and then inspect the physical pages, and I highly doubt the debugger knows how to do that.
But most importantly you likely don't have any valid reason to do it in the first place.
So, what are you trying to achieve?
=======================
Given the edit:
some one from our application is making lot of munmap and making the system hang (CPU soft lockup 22s!).
There may or may not be an actual kernel issue which must be debugged. I don't see any use for userspace stacktraces for this one though.
So, as I understand it, the presumed issue is excessive mmap + munmap calls from the application. Inspecting the backtrace of the thread reported with said lockup may or may not happen to catch the culprit. What you really want is to collect backtraces of /all/ callers and sort them by frequency. This can be done (albeit with pain) with systemtap.

Where can I find system call source code?

In Linux, where can I find the source code for all system calls, given that I have the source tree? Also, if I wanted to look up the source code and assembly for a particular system call, is there something I can type in a terminal, like my_system_call?
You'll need the Linux kernel sources in order to see the actual source of the system calls. Manual pages, if installed on your local system, only contain the documentation of the calls and not their source itself.
Unfortunately for you, system calls aren't stored in just one particular location in the whole kernel tree. This is because various system calls can refer to different parts of the system (process management, filesystem management, etc.) and therefore it would be infeasible to store them apart from the part of the tree related to that particular part of the system.
The best thing you can do is look for the SYSCALL_DEFINE[0-6] macros. They are used (obviously) to define a given block of code as a system call. For example, fs/ioctl.c has the following code:
SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
{
    /* do freaky ioctl stuff */
}
Such a definition means that the ioctl syscall is declared and takes three arguments. The number after SYSCALL_DEFINE is the number of arguments. For example, in the case of getpid(void), declared in kernel/timer.c, we have the following code:
SYSCALL_DEFINE0(getpid)
{
    return task_tgid_vnr(current);
}
Hope that clears things up a little.
From an application's point of view, a system call is an elementary and atomic operation done by the kernel.
The Assembly Howto explains what is happening, in terms of machine instruction.
Of course, the kernel is doing a lot of things when handling a syscall.
Actually, you could almost believe that the entire kernel code is devoted to handling all system calls (this is not entirely true, but almost; from the applications' point of view, the kernel is only visible through system calls). The other answer, by Daniel Kamil Kozar, explains which kernel function starts the handling of a given system call (but very often, many other parts of the kernel indirectly participate in system calls; for example, the scheduler participates indirectly in implementing fork because it manages the child process created by a successful fork syscall).
I know it's old, but I was searching for the source for _system_call() too and found this tidbit
Actual code for the system_call entry point can be found in /usr/src/linux/kernel/sys_call.S. Actual code for many of the system calls can be found in /usr/src/linux/kernel/sys.c, and the rest are found elsewhere. find is your friend.
I assume this is dated, because I don't even have that file. However, grep found ENTRY(system_call) in arch/x86/kernel/entry_64.S and seems to be the thing that calls the individual system calls. I'm not up on my intel-syntax x86 asm right now, so you'll have to look and see if this is what you wanted.

What does a core dump of a C program mean?

What does the extension of the core dump mean, and how do I read a core dump file? When I open the file in a text editor, I get garbage values.
Note: its extension is something like .2369
You can use gdb to read the core dump. The extension is the process id.
Here is a link to a thread explaining how to do this.
And here is a gdb tutorial.
A core file is the memory image of a process at the point in time when it was terminated. Termination could, for example, happen through a segmentation fault or a failed assert. To "view" a core dump you will need a debugger. It will allow you to examine the state of the process, which includes listing the stack traces of all the process's threads and printing the values of variables and registers. Note that this works "better" if you have debug information available.
Traditionally, core files are just named "core". This has the not-so-nice effect that cores overwrite each other before a developer/admin discovers them. Many modern platforms allow giving core files custom names that contain additional information. The number at the end of your core file could, for example, be the PID of the process that the core belonged to.
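For a concrete example, a deliberately crashing program like the sketch below produces exactly this kind of file. Compile it with -g so the debugger has symbols, make sure core dumps are enabled for your shell (ulimit -c unlimited), run it, and then open the resulting core together with the executable in gdb:

#include <stdio.h>

/* Dereferencing a NULL pointer raises SIGSEGV, and the kernel writes
 * a core file (e.g. "core" or "core.<pid>", depending on the system's
 * core_pattern settings) if core dumps are enabled for the process. */
int main(void)
{
    int *p = NULL;

    printf("about to crash\n");
    *p = 42;            /* SIGSEGV here; the core captures this state */
    return 0;
}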
The extension is most often the ID of the process that crashed. You need to examine the file with a debug tool.
Wikipedia can explain core dumps better than I, but
It is the dump of "core" memory; that is, the memory, registers, and other program state that the process holds when it crashes.
The value at the end of the filename must be system dependent. I normally use a debugger like GDB, in concert with my program to examine such files.

Resources