insmod not throwing error for a positive return - kernel-module

I am writing my first kernel module, a simple Hello World module. The tldp guide I am following says that insmod will not load a module if the init_module function returns a non-zero value.
This works as expected when I return a negative number, but while experimenting I noticed that insmod loads my module even when the return value is positive.
Why is that?
For example, if I return -185, insmod immediately reports that it cannot load the module.
But when I return 185, the kernel logs a warning about a suspicious return value and still loads the module.
This is the log for "return 185":
[19398.947857] do_init_module: 'hello_1'->init suspiciously returned 185, it should follow 0/-E convention
do_init_module: loading module anyway...
[19398.947859] CPU: 0 PID: 11812 Comm: insmod Tainted: P OE 3.19.0-15-generic #15-Ubuntu
[19398.947860] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[19398.947861] ffffffffc06c3000 ffff880059347d98 ffffffff817c2205 0000000000000007
[19398.947862] ffffffffc06c3018 ffff880059347ee8 ffffffff810f9a2d ffffffff810f51d0
[19398.947864] ffff8800db64ad10 ffff880059347e40 ffffffffc06c3018 ffffffffc0391000
[19398.947865] Call Trace:
[19398.947869] [<ffffffff817c2205>] dump_stack+0x45/0x57
[19398.947872] [<ffffffff810f9a2d>] load_module+0x160d/0x1ce0
[19398.947873] [<ffffffff810f51d0>] ? store_uevent+0x40/0x40
[19398.947875] [<ffffffff810fa276>] SyS_finit_module+0x86/0xb0
[19398.947877] [<ffffffff817c934d>] system_call_fastpath+0x16/0x1b
And this is printed on the console for "return -185":
insmod: ERROR: could not insert module hello-1.ko: Unknown error 185

The init_module function should return either 0 or a negative error code. You can treat returning a positive value as undefined behaviour.
The current kernel interprets a positive value as success but prints a warning to the system log. You can read this log with dmesg.
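The three cases described above can be sketched as a small userland model. This is only an illustration of the behaviour seen in the question's log, not the kernel's actual source; the names load_result and classify_init_return are hypothetical.

```c
#include <stdio.h>

/* Possible outcomes of a module load attempt in this simplified model. */
enum load_result { LOAD_OK, LOAD_OK_WITH_WARNING, LOAD_REJECTED };

/*
 * Hypothetical userland model (not kernel source) of how the value
 * returned by a module's init function is treated.
 */
enum load_result classify_init_return(int ret)
{
    if (ret < 0) {
        /* Negative errno-style value: the module is not loaded. */
        return LOAD_REJECTED;
    }
    if (ret > 0) {
        /* Positive value: a warning is logged, but the module stays loaded. */
        fprintf(stderr,
                "init suspiciously returned %d, it should follow 0/-E convention\n",
                ret);
        return LOAD_OK_WITH_WARNING;
    }
    /* Zero: plain success. */
    return LOAD_OK;
}
```

In this model, classify_init_return(-185) is rejected, while classify_init_return(185) counts as a load with a warning, matching what the question observed.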

Related

How do I insert a phram module?

I need to directly write to and read from physical memory for my research and the only way I can think of doing it is with a kernel module. I found this generic device driver called phram that I've been trying to use to map memory and then write to and read from it but I'm having trouble mapping more than ~1.5GB. In my 16GB system, I have the OS usage constrained to 500MB at 0x0. Here are my kernel parameters: mem=500M memmap=500M#0. When I try to load the phram module with more than ~1.5GB like this:
sudo modprobe phram phram=test,500Mi,15Gi
I get this error message:
modprobe: ERROR: could not insert 'phram': Input/output error
And this in dmesg:
[ 247.303346] modprobe:1402 conflicting memory types 1f400000-3df400000 write-back<->write-combining
[ 247.303350] reserve_memtype failed [mem 0x1f400000-0x3df3fffff], track uncached-minus, req uncached-minus
[ 247.303352] ioremap reserve_memtype failed -16
[ 247.303376] phram: ioremap failed
[ 247.303393] phram: `test,500Mi,15Gi' invalid for parameter `phram'
I can't find any documentation on phram other than the source code. From what I have researched, ioremap() (the mapping call in phram) shouldn't have a maximum size, so I don't know where the issue could be. The dmesg output is a little too cryptic for me, so if somebody could at least translate it, that would also be a significant help.
Thanks!

Weird exception thrown when using simulavr with avr-gdb

I am debugging a program that I have written for the AVR architecture and compiled using avr-gcc with the -g argument.
I launch simulavr using the following command: simulavr --device atmega8 --gdbserver
Then I invoke avr-gdb and do (gdb) file main.elf as well as (gdb) target remote localhost:1212
Once debugging has started, I can successfully step through the assembly portion of my program .init et al. However, once jmp main is executed and a call to another function is made, simulavr throws the following exception: Assertion failed: (m_on_call_sp != 0x0000), function OnCall, file hwstack.cpp, line 266. Abort trap: 6
It has something to do with pushing a frame onto the stack, but I can't quite put my finger on how to fix it.
That stack value is very far from what it should be. At the start of your program, it should be near the end of RAM, not at the beginning.
It is likely a problem with simulavr not configuring RAM properly for your device. A quick look at the source code shows that the stack pointer is set to 0 if the simulator can't determine the correct value.
Did you include -mmcu=atmega8 on the command line when compiling? Try adding the -V switch to the simulavr command for more clues.

Socket Direct Protocol error: "Address family not supported by protocol"

I thought I would try out SDP on our infiniband hardware.
However, when I try to add AF_INET_SDP as the first argument to socket() I get the following error:
"Address family not supported by protocol".
Originally I had:
#define AF_INET_SDP 26
But after doing some reading, I noticed a patch applied some time back that changes this value to 27.
When set to 26 I get the error:
"Error binding socket: No such device"
Has anyone managed to get SDP working on Ubuntu 12.04? What did you do to get it up and running?
I have installed libsdp1 and libsdpa-dev.
Using the LD_PRELOAD method on iperf I also get the first error:
LD_PRELOAD=libsdp.so iperf -s
dir: /tmp/libsdp.log.1000 file: /tmp/libsdp.log.1000/log
socket failed: Address family not supported by protocol
bind failed: Bad file descriptor
Therefore I assume 27 is the correct domain number.
SDP hasn't been accepted into the mainline Linux kernel. Recent Fedora releases ship neither the kernel module nor the user-space libsdp.
If you still want to experiment, Matt is right: the module in question is 'ib_sdp'.
Try modprobe ib_sdp and run your example again.

How to force to exit if init_module() failed?

I am working on a simple kernel module that takes arguments from the command line. I want to check those arguments before the module is loaded.
I check the arguments and return 1 to indicate that the init_module function failed, so that the kernel module won't be loaded if the arguments are not valid.
The problem is that the module is still loaded even when it fails the argument check (it takes the first if branch). When I typed sudo rmmod -f kernel_name, it complained that the module is busy. How do I make it load the module only if it passes the argument check?
int init_module(void)
{
    /* check the arguments here; "failed" stands for the result of that check */
    if (failed) {
        /* arguments are not valid: return 1 to indicate that init_module failed */
        return 1;
    } else {
        /* register the hook function here */
        return 0;
    }
}

void cleanup_module(void)
{
    /* unregister the hook here */
}
I assume you are working on a Linux kernel module.
A positive return value can still be interpreted as success. The common practice is to return a negative error code on failure, -EINVAL in your case.
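A minimal sketch of that convention, written as plain userland C so it can be compiled outside the kernel. The parameter interval_ms and the functions validate_params and init_sketch are hypothetical stand-ins for the asker's argument check and init_module.

```c
#include <errno.h>

/* Returns 0 if the (hypothetical) parameter is usable, or a negative errno value otherwise. */
int validate_params(int interval_ms)
{
    if (interval_ms <= 0)
        return -EINVAL;   /* negative error code: the load should be refused */
    return 0;
}

/* Shape of an init function that follows the 0/-E convention. */
int init_sketch(int interval_ms)
{
    int err = validate_params(interval_ms);

    if (err)
        return err;       /* propagate the negative code unchanged */

    /* the hook would be registered here */
    return 0;
}
```

Returning the negative code straight from the check means a failed validation never reaches the registration step, so there is nothing to undo.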

How to make good use of stack trace (from kernel or core dump)?

If you are lucky when your kernel module crashes, you get an oops with a log containing a lot of information, such as the values in the registers. One such piece of information is the stack trace. (The same is true for core dumps, but I originally asked this about kernel modules.) Take this example:
[<f97ade02>] ? skink_free_devices+0x32/0xb0 [skin_kernel]
[<f97aba45>] ? cleanup_module+0x1e5/0x550 [skin_kernel]
[<c017d0e7>] ? __stop_machine+0x57/0x70
[<c016dec0>] ? __try_stop_module+0x0/0x30
[<c016f069>] ? sys_delete_module+0x149/0x210
[<c0102f24>] ? sysenter_do_call+0x12/0x16
My guess is that the +<number1>/<number2> has something to do with the offset from function in which the error has occurred. That is, by inspecting this number, perhaps looking at the assembly output I should be able to find out the line (better yet, instruction) in which this error has occurred. Is that correct?
My question is, what are these two numbers exactly? How do you make use of them?
skink_free_devices+0x32/0xb0
This means the offending instruction is 0x32 bytes from the start of the function skink_free_devices() which is 0xB0 bytes long in total.
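As an illustration of that reading, here is a small userland sketch that splits such an entry into its three parts and recovers the function's start address from the absolute address printed in the trace. The struct and function names are hypothetical and only serve this example.

```c
#include <stdio.h>

/* One decoded "symbol+0xOFFSET/0xSIZE" stack-trace entry. */
struct trace_entry {
    char symbol[64];
    unsigned long offset; /* bytes from the start of the function */
    unsigned long size;   /* total length of the function in bytes */
};

/*
 * Parses text such as "skink_free_devices+0x32/0xb0".
 * Returns 0 on success, -1 if the text does not have that shape.
 */
int parse_trace_entry(const char *text, struct trace_entry *out)
{
    /* %63[^+] reads the symbol name up to the '+'; %lx accepts hex with a 0x prefix. */
    if (sscanf(text, "%63[^+]+%lx/%lx", out->symbol, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}

/* Start address of the function, given the absolute address printed in the trace. */
unsigned long function_start(unsigned long trace_addr, const struct trace_entry *e)
{
    return trace_addr - e->offset;
}
```

For the first line of the trace in the question, the printed address f97ade02 minus the offset 0x32 puts the start of skink_free_devices at 0xf97addd0.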
If you compile your kernel with -g enabled, you can use addr2line or good old gdb to find the source line that an address inside a function corresponds to.
Something like this:
$ addr2line -e ./vmlinux 0xc01cf0d1
/mnt/linux-2.5.26/include/asm/bitops.h:244
or
$ gdb ./vmlinux
...
(gdb) l *0xc01cf0d1
0xc01cf0d1 is in read_chan (include/asm/bitops.h:244).
(...)
244 return ((1UL << (nr & 31)) & (((const volatile unsigned int *) addr)[nr >> 5])) != 0;
(...)
So just give the address you want to inspect to addr2line or gdb, and they will tell you the source file and line number of the offending code.
See this article for full details
EDIT: vmlinux is the uncompressed version of the kernel used for debugging and is generally found at /lib/modules/$(uname -r)/build/vmlinux, provided you have built your kernel from sources. The vmlinuz that you find in /boot is the compressed kernel and may not be that useful for debugging.
For Emacs users, here is a major mode to easily jump around within the stack trace (it uses addr2line internally).
Disclaimer: I wrote it :)
Regurgitating this answer: you need to use faddr2line.
In my case I had the following truncated call trace:
[ 246.790938][ T35] Call trace:
[ 246.794075][ T35] __switch_to+0x10c/0x180
[ 246.798348][ T35] __schedule+0x278/0x6e0
[ 246.802531][ T35] schedule+0x44/0xd0
[ 246.806368][ T35] rpm_resume+0xf4/0x628
[ 246.810463][ T35] __pm_runtime_resume+0x94/0xc0
[ 246.815257][ T35] macb_open+0x30/0x2b8
[ 246.819265][ T35] __dev_open+0x10c/0x188
and ran the following in the mainline linux kernel:
./scripts/faddr2line vmlinux macb_open+0x30/0x2b8
giving the output
macb_open+0x30/0x2b8:
pm_runtime_get_sync at include/linux/pm_runtime.h:386
(inlined by) macb_open at drivers/net/ethernet/cadence/macb_main.c:2726

Resources