So I am trying to find out what kernel processes are calling some functions in a block driver. I thought including backtrace() in the C library would make it easy. But I am having trouble to load the backtrace.
I copied this example function to show the backtrace:
http://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/063/6391/6391l1.html
All attempts to compile have error in one place or another that a file cannot be found or that the functions are not defined.
Here is what comes closest.
In the Makefile I put the compiler directives:
-rdynamic -I/usr/include
If I leave out the second one, -I/usr/include, then the compiler reports it cannot find the required header execinfo.h.
Next, in the code where I want to do the backtrace I have copied the function from the example:
//trying to include the c backtrace capability
#include <execinfo.h>
void show_stackframe() {
void *trace[16];
char **messages = (char **)NULL;
int i, trace_size = 0;
trace_size = backtrace(trace, 16);
messages = backtrace_symbols(trace, trace_size);
printk(KERN_ERR "[bt] Execution path:\n");
for (i=0; i<trace_size; ++i)
printk(KERN_ERR "[bt] %s\n", messages[i]);
}
//backtrace function
I have put the call to this function later on, in a block driver function where the first sign of the error happens. Simply:
show_stackframe();
So when I compile it, the following errors:
user#slinux:~/2.6-32$ make -s
Invoking make againt the kernel at /lib/modules/2.6.32-5-686/build
In file included from /usr/include/features.h:346,
from /usr/include/execinfo.h:22,
from /home/linux/2.6-32/block/block26.c:49:
/usr/include/sys/cdefs.h:287:1: warning: "__always_inline" redefined
In file included from /usr/src/linux-headers-2.6.32-5-common/include/linux/compiler-gcc.h:86,
from /usr/src/linux-headers-2.6.32-5-common/include/linux/compiler.h:40,
from /usr/src/linux-headers-2.6.32-5-common/include/linux/stddef.h:4,
from /usr/src/linux-headers-2.6.32-5-common/include/linux/list.h:4,
from /usr/src/linux-headers-2.6.32-5-common/include/linux/module.h:9,
from /home/linux/2.6-32/inc/linux_ver.h:40,
from /home/linux/2.6-32/block/block26.c:32:
/usr/src/linux-headers-2.6.32-5-common/include/linux/compiler-gcc4.h:15:1: warning: this is the location of the previous definition
/home/linux/2.6-32/block/block26.c:50: warning: function declaration isn’t a prototype
WARNING: "backtrace" [/home/linux/2.6-32/ndas_block.ko] undefined!
WARNING: "backtrace_symbols" [/home/linux/2.6-32/ndas_block.ko] undefined!
Note: block26.c is the file I am hoping to get the backtrace from.
Is there an obvious reason why the backtrace and backtrace_symbols remain undefined when it is compiled into the .ko modules?
I am guessing it because I use the compiler include execinfo.h which is residing on the computer and not being loaded to the module.
It is my uneducated guess to say the least.
Can anyone offer a help to get the backtrace functions loading up in the module?
Thanks for looking at this inquiry.
I am working on debian. When I take out the function and such, the module compiles fine and almost works perfectly.
From ndasusers
To print the stack contents and a backtrace to the kernel log, use the dump_stack() function in your kernel module. It's declared in linux/kernel.h in the include folder in the kernel source directory.
If you need to save the stack trace and process its elements somehow, save_stack_trace() or dump_trace() might be also an option. These functions are declared in <linux/stacktrace.h> and <asm/stacktrace.h>, respectively.
It is not as easy to use these as dump_stack() but if you need more flexibility, they may be helpful.
Here is how save_stack_trace() can be used (replace HOW_MANY_ENTRIES_TO_STORE with the value that suits your needs, 16-32 is usually more than enough):
unsigned long stack_entries[HOW_MANY_ENTRIES_TO_STORE];
struct stack_trace trace = {
.nr_entries = 0,
.entries = &stack_entries[0],
.max_entries = HOW_MANY_ENTRIES_TO_STORE,
/* How many "lower entries" to skip. */
.skip = 0
};
save_stack_trace(&trace);
Now stack_entries array contains the appropriate call addresses. The number of elements filled is nr_entries.
One more thing to point out. If it is desirable not to output the stack entries that belong to the implementation of save_stack_trace(), dump_trace() or dump_stack() themselves (on different systems, the number of such entries may vary), the following trick can be applied if you use save_stack_trace(). You can use __builtin_return_address(0) as an "anchor" entry and process only the entries "not lower" than that.
I know this question is about Linux, but since it's the first result for "backtrace kernel", here's a few more solutions:
DragonFly BSD
It's print_backtrace(int count) from /sys/sys/systm.h. It's implemented in
/sys/kern/kern_debug.c and/or /sys/platform/pc64/x86_64/db_trace.c. It can be found by searching for panic, which is implemented in /sys/kern/kern_shutdown.c, and calls print_backtrace(6) if DDB is defined and trace_on_panic is set, which are both defaults.
FreeBSD
It's kdb_backtrace(void) from /sys/sys/kdb.h. Likewise, it's easy to find by looking into what the panic implementation calls when trace_on_panic is true.
OpenBSD
Going the panic route, it appears to be db_stack_dump(), implemented in /sys/ddb/db_output.c. The only header mention is /sys/ddb/db_output.h.
dump_stack() is function can be used to print your stack and thus can be used to backtrack . while using it be carefull that don't put it in repetitive path like loops or packet receive function it can fill your dmesg buffer can cause crash in embedded device (having less memory and cpu).
This function is declared in linux/kernel.h .
Related
From the signal.h manpage, the prototype(s) for rt_sigprocmask are as follows:
/* Prototype for the glibc wrapper function */
int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);
/* Prototype for the underlying system call */
int rt_sigprocmask(int how, const kernel_sigset_t *set,
kernel_sigset_t *oldset, size_t sigsetsize);
Seeing as kernel_sigset_t is in the prototype for rt_sigprocmask, I would presume that the definition for this type would be included in signal.h. But I am getting the error that kernel_sigset_t is undeclared when I am trying to use it in my program.
I wrote a small simple program to demonstrate the error:
#include <stdio.h>
#include <signal.h>
int main()
{
printf("%d\n", sizeof(kernel_sigset_t));
return 0;
}
Which gives this message when I compile:
>gcc -o tmp tmp.c
tmp.c: In function ‘main’:
tmp.c:5:24: error: ‘kernel_sigset_t’ undeclared (first use in this function)
5 | printf("%d\n", sizeof(kernel_sigset_t));
| ^~~~~~~~~~~~~~~
tmp.c:5:24: note: each undeclared identifier is reported only once for each function it appears in
Why is this happening?
Am I including the wrong thing, or what?
EDIT: FURTHER INFORMATION
The reason I am asking this question is because I am making a program that traces two identical programs running in parallel, and compares the arguments of every system call to check that they are equal.
In order to do this, I need to check that system call arguments that are pointers point to the same data in both of the traced programs.
So, with the rt_sigprocmask system call, I want to check that the kernel_sigset_t pointers set and oldset both point to the same data. I would do this by comparing sizeof(kernel_sigset_t) length of data at the addresses pointed to by these pointers, and see if they're the same (using process_vm_readv).
However, as kernel_sigset_t, is seemingly not defined, I don't know how to do this. As the manpage says, the kernel's sigset_t and the userspace one are different sizes: how am I supposed to know what is the correct size to compare? If I just use sigset_t, will it be correct, if the kernel one is different?
As a general rule, it's not always enough to include the header file, sometimes these definitions are buried deep within #idfef hierarchies.
For example, examining the Linux doco for the sigprocmask call, you can see the required feature test macros:
_POSIX_C_SOURCE >= 1 || _XOPEN_SOURCE || _POSIX_SOURCE
meaning that one of those has to be true in order to get sigprocmask.
However, in this particular case, the manpage shows the most likely problem. In the section detailing differences between glibc (user-space) and kernel calls, we see (emphasis added):
The kernel's definition of sigset_t differs in size from that used by the C library. In this manual page, the former is referred to as kernel_sigset_t (it is nevertheless named sigset_t in the kernel sources).
In other words, it has a different name to that shown in the documented prototype and, unfortunately, the same name as the one defined in user-space.
This is likely to lead to all sorts of issues if you mix user-space and kernel methods - my advice would be to just use the user-space ones if possible, and leave the kernel ones to, well, kernel developers :-)
From a cursory investigation, the rt variant of the system call (this is not the user-space function) was added to cater for a larger signal set - once real-time signals were added, the signal bit-mask grew beyond 32 bits so the signal set had to be expanded.
The user-space function will intelligently call the correct system call under the covers, and the older system call is still there but deprecated. That function call will also silently prevent you from fiddling with the signals used by NPTL, the native POSIX threading library.
In respect to the updates to your question, in which you state you want to track system calls to ensure the arguments passed in are the same, you don't actually need to know the size of that structure.
The way rt_sigprocmask works is that the length of the structure is actually one of the arguments, sigsetsize. So that's the size you should be using for comparison.
I have the following structure:
struct sys_config_s
{
char server_addr[256];
char listen_port[100];
char server_port[100];
char logfile[PATH_MAX];
char pidfile[PATH_MAX];
char libfile[PATH_MAX];
int debug_flag;
unsigned long connect_delay;
};
typedef struct sys_config_s sys_config_t;
I also have a function defined in a static library (let's call it A.lib):
sys_config_t* sys_get_config(void)
{
static sys_config_t config;
return &config;
}
I then have a program (let's call it B) and a dynamic library (let's call it C). Both B and C link with A.lib. At runtime B opens C via dlopen() and then gets an address to C's function func() via a call to dlsym().
void func(void)
{
sys_get_config()->connect_delay = 1000;
}
The above code is the body of C's func() function and it produces a segmentation fault when reached. The segfault only occurs while running outside of gdb.
Why does that happen?
EDIT: Making sys_config_t config a global variable doesn't help.
The solution is trivial. Somehow, by a header mismatch, the PATH_MAX constant was defined differently in B's and C's compilation units. I need to be more careful in the future. (facepalms)
There is no difference between the variable being a static-local, or a static-global variable. A static variable is STATIC, that means, it is not, on function-call demand, allocated on the stack within the current function frame, but rather it is allocated in one of the preexisting segments of the memory defined in the executable's binary headers.
That's what I'm 100% sure. The question, where in what segment they exactly placed, and whether they are properly shared - is an another problem. I've seen similar problems with sharing global/static variables between modules, but usually, the core of the problem was very specific to the exact setup..
Please take into consideration, that the code sample is small, and I worked on that platforms long time ago. What I've written above might got mis-worded or even be plainly wrong at some points!
I think, that the important thing is that you are getting that segfault in C when touching that line. Setting an integer field to a constant could not have failed, never, provided that target address is valid and not write-protected. That leaves two options:
- either your function sys_get_config() has crashed
- or it has returned an invalid pointer.
Since you say that the segfault is raised here, not in sys_get_config, the only thing left is the latter point: broken pointer.
Add to the sys_get_config some trivial printf that will dump the address-to-be-returned, then do the same in the calling function "func". Check whether it not-null, and also check if within sys_get_config it is the same as after being returned, just to be sure that calling conventions are proper, etc. A good idea for making a double/triple check is to also add inside the module "A" a copy of the function sys_get_config (with different name of course), and to check whether the addresses returned from sys_get_config and it's copy are the same. If they are not - something went very wrong during the linking
There is also a very very small chance that the module loading has been deferred, and you are trying to reference a memory of a module that was not fully initialized yet.. I worked on linux very long time ago, but I remember that dlopen has various loading options. But you wrote that you got the address by dlsym, so I suppose the module has loaded since you've got the symbol's final address..
Is it possible to avoid the entry point (main) in a C program. In the below code, is it possible to invoke the func() call without calling via main() in the below program ? If Yes, how to do it and when would it be required and why is such a provision given ?
int func(void)
{
printf("This is func \n");
return 0;
}
int main(void)
{
printf("This is main \n");
return 0;
}
If you're using gcc, I found a thread that said you can use the -e command-line parameter to specify a different entry point; so you could use func as your entry point, which would leave main unused.
Note that this doesn't actually let you call another routine instead of main. Instead, it lets you call another routine instead of _start, which is the libc startup routine -- it does some setup and then it calls main. So if you do this, you'll lose some of the initialization code that's built into your runtime library, which might include things like parsing command-line arguments. Read up on this parameter before using it.
If you're using another compiler, there may or may not be a parameter for this.
When building embedded systems firmware to run directly from ROM, I often will avoid naming the entry point main() to emphasize to a code reviewer the special nature of the code. In these cases, I am supplying a customized version of the C runtime startup module, so it is easy to replace its call to main() with another name such as BootLoader().
I (or my vendor) almost always have to customize the C runtime startup in these systems because it isn't unusual for the RAM to require initialization code for it to begin operating correctly. For instance, typical DRAM chips require a surprising amount of configuration of their controlling hardware, and often require a substantial (thousands of bus clock cycles) delay before they are useful. Until that is complete, there may not even be a place to put the call stack so the startup code may not be able to call any functions. Even if the RAM devices are operational at power on, there is almost always some amount of chip select hardware or an FPGA or two that requires initialization before it is safe to let the C runtime start its initialization.
When a program written in C loads and starts, some component is responsible for making the environment in which main() is called exist. In Unix, linux, Windows, and other interactive environments, much of that effort is a natural consequence of the OS component that loads the program. However, even in these environments there is some amount of initialization work to do before main() can be called. If the code is really C++, then there can be a substantial amount of work that includes calling the constructors for all global object instances.
The details of all of this are handled by the linker and its configuration and control files. The linker ld(1) has a very elaborate control file that tells it exactly what segments to include in the output, at what addresses, and in what order. Finding the linker control file you are implicitly using for your toolchain and reading it can be instructive, as can the reference manual for the linker itself and the ABI standard your executables must follow in order to run.
Edit: To more directly answer the question as asked in a more common context: "Can you call foo instead of main?" The answer is "Maybe, but but only by being tricky".
On Windows, an executable and a DLL are very nearly the same format of file. It is possible to write a program that loads an arbitrary DLL named at runtime, and locates an arbitrary function within it, and calls it. One such program actually ships as part of a standard Windows distribution: rundll32.exe.
Since a .EXE file can be loaded and inspected by the same APIs that handle .DLL files, in principle if the .EXE has an EXPORTS section that names the function foo, then a similar utility could be written to load and invoke it. You don't need to do anything special with main, of course, since that will be the natural entry point. Of course, the C runtime that was initialized in your utility might not be the same C runtime that was linked with your executable. (Google for "DLL Hell" for hint.) In that case, your utility might need to be smarter. For instance, it could act as a debugger, load the EXE with a break point at main, run to that break point, then change the PC to point at or into foo and continue from there.
Some kind of similar trickery might be possible on Linux since .so files are also similar in some respects to true executables. Certainly, the approach of acting like a debugger could be made to work.
A rule of thumb would be that the loader supplied by the system would always run main. With sufficient authority and competence you could theoretically write a different loader that did something else.
Rename main to be func and func to be main and call func from name.
If you have access to the source, you can do this and it's easy.
If you are using an open source compiler such as GCC or a compiler targeted at embedded systems you can modify the C runtime startup (CRT) to start at any entry point you need. In GCC this code is in crt0.s. Generally this code is partially or wholly in assembler, for most embedded systems compilers example or default start-up code will be provided.
However a simpler approach is to simply 'hide' main() in a static library that you link to your code. If that implementation of main() looks like:
int main(void)
{
func() ;
}
Then it will look to all intents and purposes as if the user entry point is func(). This is how many application frameworks with entry points other than main() work. Note that because it is in a static library, any user definition of main() will override that static library version.
The solution depends on the compiler and linker which you use. Always is that not main is the real entry point of the application. The real entry point makes some initializations and call for example main. If you write programs for Windows using Visual Studio, you can use /ENTRY switch of the linker to overwrite the default entry point mainCRTStartup and call func() instead of main():
#ifdef NDEBUG
void mainCRTStartup()
{
ExitProcess (func());
}
#endif
If is a standard practice if you write the most small application. In the case you will receive restrictions in the usage of C-Runtime functions. You should use Windows API function instead of C-Runtime function. For example instead of printf("This is func \n") you should use OutputString(TEXT("This is func \n")) where OutputString are implemented only with respect of WriteFile or WriteConsole:
static HANDLE g_hStdOutput = INVALID_HANDLE_VALUE;
static BOOL g_bConsoleOutput = TRUE;
BOOL InitializeStdOutput()
{
g_hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
if (g_hStdOutput == INVALID_HANDLE_VALUE)
return FALSE;
g_bConsoleOutput = (GetFileType (g_hStdOutput) & ~FILE_TYPE_REMOTE) != FILE_TYPE_DISK;
#ifdef UNICODE
if (!g_bConsoleOutput && GetFileSize (g_hStdOutput, NULL) == 0) {
DWORD n;
WriteFile (g_hStdOutput, "\xFF\xFE", 2, &n, NULL);
}
#endif
return TRUE;
}
void Output (LPCTSTR pszString, UINT uStringLength)
{
DWORD n;
if (g_bConsoleOutput) {
#ifdef UNICODE
WriteConsole (g_hStdOutput, pszString, uStringLength, &n, NULL);
#else
CHAR szOemString[MAX_PATH];
CharToOem (pszString, szOemString);
WriteConsole (g_hStdOutput, szOemString, uStringLength, &n, NULL);
#endif
}
else
#ifdef UNICODE
WriteFile (g_hStdOutput, pszString, uStringLength * sizeof (TCHAR), &n, NULL);
#else
{
//PSTR pszOemString = _alloca ((uStringLength + sizeof(DWORD)));
CHAR szOemString[MAX_PATH];
CharToOem (pszString, szOemString);
WriteFile (g_hStdOutput, szOemString, uStringLength, &n, NULL);
}
#endif
}
void OutputString (LPCTSTR pszString)
{
Output (pszString, lstrlen (pszString));
}
This really depends how you are invoking the binary, and is going to be reasonably platform and environment specific. The most obvious answer is to simply rename the "main" symbol to something else and call "func" "main", but I suspect that's not what you are trying to do.
This code compiles, but no surprises, it fails while linking (no main found):
Listing 1:
void main();
Link error: \mingw\lib\libmingw32.a(main.o):main.c:(.text+0x106) undefined reference to _WinMain#16'
But, the code below compiles and links fine, with a warning:
Listing 2:
void (*main)();
warning: 'main' is usually a function
Questions:
In listing 1, linker should have
complained for missing "main". Why
is it looking for _WinMain#16?
The executable generated from
listing 2 simply crashes. What is
the reason?
Thanks for your time.
True, main doesn't need to be a function. This has been exploited in some obfuscated programs that contain binary program code in an array called main.
The return type of main() must be int (not void). If the linker is looking for WinMain, it thinks that you have a GUI application.
In most C compilation systems, there is no type information associated with symbols that are linked. You could declare main as e.g.:
char main[10];
and the linker would be perfectly happy. As you noted, the program would probably crash, uless you cleverly initialized the contents of the array.
Your first example doesn't define main, it just declares it, hence the linker error.
The second example defines main, but incorrectly.
Case 1. is Windows-specific - the compiler probably generates _WinMain symbol when main is properly defined.
Case 2. - you have a pointer, but as static variable it's initialized to zero, thus the crash.
On Windows platforms the program's main unit is WinMain if you don't set the program up as a console app. The "#16" means it is expecting 16 bytes of parameters. So the linker would be quite happy with you as long as you give it a function named WinMain with 16 bytes of parameters.
If you wanted a console app, this is your indication that you messed something up.
You declared a pointer-to-function named main, and the linker warned you that this wouldn't work.
The _WinMain message has to do with how Windows programs work. Below the level of the C runtime, a Windows executable has a WinMain.
Try redefining it as int main(int argc, char *argv[])
What you have is a linker error. The linker expects to find a function with that "signature" - not void with no parameters
See http://publications.gbdirect.co.uk/c_book/chapter10/arguments_to_main.html etc
In listing 1, you are saying "There's a main() defined elsewhere in my code --- I promise!". Which is why it compiles. But you are lying there, which is why the link fails. The reason you get the missing WinMain16 error, is because the standard libraries (for Microsoft compiler) contain a definition for main(), which calls WinMain(). In a Win32 program, you'd define WinMain() and the linker would use the library version of main() to call WinMain().
In Listing 2, you have a symbol called main defined, so both the compiler & the linker are happy, but the startup code will try to call the function that's at location "main", and discover that there's really not a function there, and crash.
1.) An (compiler/platform) dependent function is called before code in main is executed and hence your behavior(_init in case of linux/glibc).
2) The code crash in 2nd case is justified as the system is unable to access the contents of the symbol main as a function which actually is a function pointer pointing to arbitrary location.
How does the compiler know the prototype of sleep function or even printf function, when I did not include any header file in the first place?
Moreover, if I specify sleep(1,1,"xyz") or any arbitrary number of arguments, the compiler still compiles it.
But the strange thing is that gcc is able to find the definition of this function at link time, I don't understand how is this possible, because actual sleep() function takes a single argument only, but our program mentioned three arguments.
/********************************/
int main()
{
short int i;
for(i = 0; i<5; i++)
{
printf("%d",i);`print("code sample");`
sleep(1);
}
return 0;
}
Lacking a more specific prototype, the compiler will assume that the function returns int and takes whatever number of arguments you provide.
Depending on the CPU architecture arguments can be passed in registers (for example, a0 through a3 on MIPS) or by pushing them onto the stack as in the original x86 calling convention. In either case, passing extra arguments is harmless. The called function won't use the registers passed in nor reference the extra arguments on the stack, but nothing bad happens.
Passing in fewer arguments is more problematic. The called function will use whatever garbage happened to be in the appropriate register or stack location, and hijinks may ensue.
In classic C, you don't need a prototype to call a function. The compiler will infer that the function returns an int and takes a unknown number of parameters. This may work on some architectures, but it will fail if the function returns something other than int, like a structure, or if there are any parameter conversions.
In your example, sleep is seen and the compiler assumes a prototype like
int sleep();
Note that the argument list is empty. In C, this is NOT the same as void. This actually means "unknown". If you were writing K&R C code, you could have unknown parameters through code like
int sleep(t)
int t;
{
/* do something with t */
}
This is all dangerous, especially on some embedded chips where the way parameters are passed for a unprototyped function differs from one with a prototype.
Note: prototypes aren't needed for linking. Usually, the linker automatically links with a C runtime library like glibc on Linux. The association between your use of sleep and the code that implements it happens at link time long after the source code has been processed.
I'd suggest that you use the feature of your compiler to require prototypes to avoid problems like this. With GCC, it's the -Wstrict-prototypes command line argument. In the CodeWarrior tools, it was the "Require Prototypes" flag in the C/C++ Compiler panel.
C will guess int for unknown types. So, it probably thinks sleep has this prototype:
int sleep(int);
As for giving multiple parameters and linking...I'm not sure. That does surprise me. If that really worked, then what happened at run-time?
This is to do with something called 'K & R C' and 'ANSI C'.
In good old K & R C, if something is not declared, it is assumed to be int.
So any thing that looks like a function call, but not declared as function
will automatically take return value of 'int' and argument types depending
on the actuall call.
However people later figured out that this can be very bad sometimes. So
several compilers added warning. C++ made this error. I think gcc has some
flag ( -ansic or -pedantic? ) , which make this condition an error.
So, In a nutshell, this is historical baggage.
Other answers cover the probable mechanics (all guesses as compiler not specified).
The issue that you have is that your compiler and linker have not been set to enable every possible error and warning. For any new project there is (virtually) no excuse for not doing so. for legacy projects more excuse - but should strive to enable as many as possible
Depends on the compiler, but with gcc (for example, since that's the one you referred to), some of the standard (both C and POSIX) functions have builtin "compiler intrinsics". This means that the compiler library shipped with your compiler (libgcc in this case) contains an implementation of the function. The compiler will allow an implicit declaration (i.e., using the function without a header), and the linker will find the implementation in the compiler library because you're probably using the compiler as a linker front-end.
Try compiling your objects with the '-c' flag (compile only, no link), and then link them directly using the linker. You will find that you get the linker errors you expect.
Alternatively, gcc supports options to disable the use of intrinsics: -fno-builtin or for granular control, -fno-builtin-function. There are further options that may be useful if you're doing something like building a homebrew kernel or some other kind of on-the-metal app.
In a non-toy example another file may include the one you missed. Reviewing the output from the pre-processor is a nice way to see what you end up with compiling.