Porting a module to a newer Linux kernel: Cannot allocate memory

I have a quite big driver module that I am trying to compile for a recent Linux kernel (3.4.4). I can successfully compile and insmod the same module with a 2.6.27.25 kernel.
The GCC versions are also different: 4.7.0 vs 4.3.0. Note that this module is quite complicated and I cannot simply go through all the code and all the makefiles.
When inserting the module I get a "Cannot allocate memory" error with the following traces:
vmap allocation for size 30248960 failed: use vmalloc=<size> to increase size.
vmalloc: allocation failure: 30243566 bytes
insmod: page allocation failure: order:0, mode:0xd2
Pid: 5840, comm: insmod Tainted: G O 3.4.4-5.fc17.i686 #1
Call Trace:
[<c092702a>] ? printk+0x2d/0x2f
[<c04eff8d>] warn_alloc_failed+0xad/0xf0
[<c05178d9>] __vmalloc_node_range+0x169/0x1d0
[<c0517994>] __vmalloc_node+0x54/0x60
[<c0490825>] ? sys_init_module+0x65/0x1d80
[<c0517a60>] vmalloc+0x30/0x40
[<c0490825>] ? sys_init_module+0x65/0x1d80
[<c0490825>] sys_init_module+0x65/0x1d80
[<c050cda6>] ? handle_mm_fault+0xf6/0x1d0
[<c0932b30>] ? spurious_fault+0xae/0xae
[<c0932ce7>] ? do_page_fault+0x1b7/0x450
[<c093665f>] sysenter_do_call+0x12/0x28
-- clip --
The obvious answer seems to be that the module is allocating too much memory, however:
I have no problem with the old kernel version, whatever the size of this module
if I prune some parts of this module to get a much lower memory consumption, I still get the same error message with the new kernel
I can unload a lot of other modules, but it has no impact (and is that even relevant? is there a global limit in Linux on the total memory used by modules?)
I am therefore suspecting a problem with the new kernel not directly related to limited memory.
The new kernel is complaining about a vmalloc() of 30,000 KB, but with the old kernel, lsmod reports a module size of 4,800 KB. Should these figures be directly related? Is it possible that something went wrong during the build and that far too much RAM is being requested? When I compare the section sizes of both .ko files, I do not see big differences.
So I am trying to understand where the problem comes from. When I check the dumped stack, I am unable to find the matching piece of code. The faulty vmalloc() seems to be done by sys_init_module(), which is init_module() from kernel/module.c, but the code does not match. When I check the object code from my .ko, its init_module() code does not match either.
I am more or less blocked, as I do not know the kernel well enough, and the whole build system and module-loading machinery is quite tough to understand. The error occurs before the module is actually loaded; I suspect that some functions are missing and that insmod does not report such errors at this point.

I believe the allocation is done in layout_and_allocate(), which is called by load_module(). Both are static functions, so they may be inlined and therefore not appear on the stack.
So it is not an allocation done by your code, but an allocation done by Linux in order to load your code.
If the module was 4.8 MB under the old kernel and is 30 MB under the new one, that would explain why it fails.
So the question is why it is so large.
The size may be due to the amount of code (not likely that it has grown so much) or statically allocated data.
A likely explanation is that you have a large statically allocated array whose size is defined by a kernel constant. If that constant has grown significantly between kernel versions, your array grows with it.
A guess: an array whose size depends on NR_CPUS.
You should be able to use commands such as nm or objdump to find such an array; I'm not sure how exactly to do it, however.
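To illustrate the failure mode, a single configuration-dependent declaration is enough; the array name, element type and sizes below are invented for the example:

#include <linux/threads.h>    /* NR_CPUS comes from the kernel configuration */

/* Hypothetical per-CPU scratch area: against a .config with NR_CPUS=32 this
 * is 256 KiB of static data, but a distro kernel built with NR_CPUS=512
 * silently turns it into 4 MiB of .bss that sys_init_module() must vmalloc()
 * every time the module is loaded. */
static unsigned char scratch[NR_CPUS][8192];

For instance, nm --size-sort on the .ko prints symbols ordered by size, so an oversized array like this would show up at the end of the list.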

The problem was actually due to the debug sections in the module. The old kernel was able to ignore these sections, but the new one counts them in the total size to allocate. Oddly, when the pr_debug() traces from module.c were enabled at load time, these sections were not dumped with the others.
How to get rid of them and solve the problem:
objcopy -R .debug_aranges \
-R .debug_info \
-R .debug_abbrev \
-R .debug_line \
-R .debug_frame \
-R .debug_str \
-R .debug_loc \
-R .debug_ranges \
original.ko new.ko
It is also possible that the specific build files for this project were adding debug information "tailored" for the old kernel version, but when trying with a dummy module I found exactly the same kind of debug sections appended, so I rather suspect some policy change regarding module management in the kernel or in Fedora.
Any information regarding these changes is welcome.

Related

How does addr2line work with virtual addresses for kernel space debugging?

I was trying my hand at addr2line to convert a "pc" register value from a kernel oops (example) to a line in the kernel code. I believe that the value of the program counter represents a virtual address.
Now, this post on Stack Overflow says that we generally provide an offset to addr2line, not a virtual address, and that a virtual address can only be used when address space randomization is turned off. Does this hold true for a kernel as well? I believe it should.
This Embedded Linux Conference talk (slide 14) also makes use of the program counter value to jump to the line in code, but I believe this would only work when address space randomization is off. Otherwise, once virtual memory is initialized, the kernel may get relocated randomly, and any virtual address picked from an oops would make no sense to addr2line. This is all theory. I have two questions now:
Is my understanding correct? If not, please correct me.
How do we turn off the address space randomization for a kernel so that the location of a symbol can be predicted?
Yes, your understanding is correct.
You have multiple options:
Completely remove KASLR support by building the kernel with CONFIG_RANDOMIZE_BASE=n. A drastic solution; I wouldn't recommend it except for development purposes.
Boot the kernel with the command line argument nokaslr. See here for more info.
Manually compute the offset of the address from the start of the kernel's .text segment. Not that easy: it requires knowing the randomized base address beforehand or extrapolating it from the panic info. Since KASLR slides the whole image by a constant, subtracting the runtime address of _text (e.g. from /proc/kallsyms) and adding back its link-time address (from System.map) recovers an address addr2line can resolve; see the sketch below. Definitely doable with some grep + objdump + some more ELF tools, but pretty annoying and time consuming.
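For example, a minimal sketch of that arithmetic; all three addresses are made-up values:

#include <stdio.h>

int main(void)
{
    unsigned long link_text = 0xffffffff81000000UL; /* _text in System.map */
    unsigned long run_text  = 0xffffffff9a000000UL; /* _text in /proc/kallsyms */
    unsigned long oops_pc   = 0xffffffff9a2132f0UL; /* PC from the oops */

    /* KASLR slides the whole image by a constant, so removing the slide
     * recovers the link-time address that addr2line -e vmlinux expects. */
    unsigned long slide = run_text - link_text;
    printf("addr2line -e vmlinux %#lx\n", oops_pc - slide);
    return 0;
}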
NB: of course points 1 and 2 require that the kernel is compiled with debugging symbols for addr2line to do its job.
See also: this Linux kernel doc page.

Allocating executable memory using kmalloc

The answers to the question How to allocate an executable page in a Linux kernel module? describe how executable memory can be allocated using __vmalloc(). Is this also possible using kmalloc()? My goal is to have a physically contiguous executable memory area.
It does not have exec permissions. I tried it, and dmesg shows "kernel tried to execute NX-protected page - exploit attempt? (uid: 0)"
Then no, I'd assume you can't kmalloc executable memory. Unless I'm wrong about how it works (returning pointers into an existing mapping that uses 1 GB hugepages to cover all of physical RAM), it's just plain incompatible with the purpose and design of kmalloc.
There might be something other than vmalloc that you could use if you really need more than one physically-contiguous 4k page of executable memory, but I don't know what it is. (I'm not a kernel dev; I just know a little bit about the big picture, and lots about CPU architecture / x86.) Perhaps something like vmalloc and then changing the page tables?
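For what it's worth, here is a sketch of that page-table idea, assuming an x86 kernel that still exports set_memory_x() to modules (recent kernels deliberately do not); the helper names are invented:

#include <linux/slab.h>
#include <linux/set_memory.h>   /* set_memory_x()/set_memory_nx(); <asm/cacheflush.h> on older kernels */

/* kmalloc(PAGE_SIZE) returns one whole slab page, physically contiguous and
 * (in practice) page-aligned, so its NX bit can be cleared in the kernel's
 * direct mapping. */
static void *alloc_exec_page(void)
{
    void *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);

    if (!buf)
        return NULL;
    set_memory_x((unsigned long)buf, 1);    /* 1 page */
    return buf;
}

static void free_exec_page(void *buf)
{
    set_memory_nx((unsigned long)buf, 1);   /* restore NX before the page is reused */
    kfree(buf);
}

Flipping permissions in the direct map splits its huge-page mappings and weakens W^X, which is presumably part of why newer kernels stopped exporting these helpers to modules.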
Other answers welcome.

Why malloc() returns null, and how much needed or left for the heap on a STM32F407?

I just found out that my decoder library fails to initialize: malloc() fails to allocate memory and returns NULL to the caller.
I tried many possible scenarios, with and without casting, and referred to a lot of other threads about malloc(), but nothing worked until I changed the heap size to 0x00001400, which apparently solved the problem.
Now, the question is: how can I tell how much heap is needed, or is left, for the program? The datasheet says my MCU has "Up to 192+4 Kbytes of SRAM including 64-Kbyte of CCM (core coupled memory) data RAM". Could someone explain to me what that means? Changing the heap size to 0x00002000 (8192 bytes) leads to dozens of the following error:
Error: L6406E: No space in execution regions with .ANY selector
Isn't 8 KB of RAM a fraction of a fraction of what the device has? Why can't I grow the heap beyond 0x00001800?
The program size reported by Keil after compilation is:
Program Size: Code=103648 RO-data=45832 RW-data=580 ZI-data=129340
The error L6406E occurs because there is not enough RAM on your target as described in the linker file. There is no magic way to get more RAM; both the stack and the heap use RAM. In your case the device seems to have more than enough memory, but the linker is not aware of it.
My suggestion is to use a linker response file with the Keil µVision IDE and update the required memory sections according to your use.
The linker command (or response) file contains only linker directives. The .OBJ files and .LIB files that are to be linked are not listed in the command file. These are obtained by µVision automatically from your project file.
The best way to start using a linker command file is to have µVision create one for you automatically and then begin making the necessary changes.
To generate a Command File from µVision...
Go to the Project menu and select the Options for Target item.
Click on the L166 Misc or L51 Misc tab to open the miscellaneous linker options.
Check the use linker control file checkbox.
Click on the Create... button. This creates a linker control file.
Click on the Edit... button. This opens the linker control file for editing.
Edit the command file to include the directives you need.
When you create a linker command file, the file created includes the directives you currently have selected.
Regarding the malloc() issue you are facing:
The required heap size depends on how much memory the application uses, especially the memory obtained through dynamic allocation with malloc() and calloc().
Please note that some C library functions, such as printf(), also use dynamic memory allocation under the hood.
If you are using the Keil IDE to compile your source code, you can increase the heap size by modifying the startup file.
;******************************************************************************
;
; <o> Heap Size (in Bytes) <0x0-0xFFFFFFFF:8>
;
;******************************************************************************
Heap EQU 0x00000000
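; (0x00000000 reserves no heap at all; for example, Heap EQU 0x00001400
;  would reserve 5 KiB for malloc(), since SPACE below allocates Heap bytes)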
;******************************************************************************
;
; Allocate space for the heap.
;
;******************************************************************************
        AREA    HEAP, NOINIT, READWRITE, ALIGN=3
__heap_base
HeapMem
        SPACE   Heap
__heap_limit
;******************************************************************************
If you are using a make environment to build the application, then simply change the heap size in the linker file.
Details regarding the same can be found directly on the Keil website; please check the following links:
https://www.keil.com/pack/doc/mw/General/html/mw_using_stack_and_heap.html
http://www.keil.com/forum/11132/heap-size-bss-section/
http://www.keil.com/forum/14201/
Now, the question is, how can I tell how much heap needed, or left for the program?
That is two separate questions.
The amount of heap needed is generally non-deterministic (one reason for avoiding dynamic memory allocation in most cases in embedded systems with very limited memory): it depends entirely on the behaviour of your program, and if your program has a memory-leak bug, even knowledge of the intended behaviour won't help you.
However, any memory not allocated statically by your application can generally be given to the heap; otherwise it simply remains unused by the C runtime. In other toolchains it is common for the linker script to automatically allocate all unused memory to the heap so that it is as large as possible, but the default script and start-up code generated by Keil's ARM MDK do not do that. If you make the heap as large as possible and then modify the code, you may have to adjust the allocation each time, so during development at least it is easiest to leave a small margin for additional static data.
The datasheet says my MCU has: "Up to 192+4 Kbytes of SRAM including 64-Kbyte of CCM (core coupled memory) data RAM" Could someone explain to me what that means?
Another problem is that the ARM MDK C library's malloc() implementation requires a contiguous heap and does not support the addition of arbitrary memory blocks (as far as I have been able to determine, in any case), so the 64 KB CCM block cannot be used as heap memory unless the entire heap is allocated there. The memory is in fact segmented as follows:
SRAM1     112 KB
SRAM2      16 KB
CCM        64 KB
BKUPSRAM    4 KB
SRAM1 and SRAM2 are contiguous but on separate buses (which can be exploited to support DMA operations without introducing wait states, for example).
The CCM memory cannot be used for DMA or bit-banding, and the default ARM MDK generated linker script does not map it at all, so to utilise it you must use a custom linker script and then ensure that any DMA or bit-banded data are explicitly located in one of the other regions. If your heap need not be more than 64 KB, you could locate it there, but doing so requires modifying the start-up assembler code that allocates the heap.
The 4 KB backup SRAM is accessed as a peripheral and is mapped in the peripheral register space.
With respect to determining how much heap remains at run-time, the ARM library provides a somewhat cumbersome __heapstats() function. Unfortunately it does not simply return the available free space (it is not quite as simple as that, because heap free space is not on its own particularly useful: block fragmentation can lead to allocation failure even if cumulatively there is sufficient memory). __heapstats() requires a pointer to an fprintf()-like function that it uses to output formatted text information on the heap state. For example:
void heapinfo()
{
    typedef int (*__heapprt)(void *, char const *, ...);
    __heapstats( (__heapprt)fprintf, stdout ) ;
}
Then you might write:
mem = malloc( some_space ) ;
if( mem == NULL )
{
    heapinfo() ;
    for(;;) ;   // wait for watchdog or debugger attach
}
// memory allocated successfully
Given:
Program Size: Code=103648 RO-data=45832 RW-data=580 ZI-data=129340
You have used 129920 bytes (RW-data plus ZI-data) of the available 131072 (the 112 KB SRAM1 plus the 16 KB SRAM2), so you could in theory add 1152 bytes to the heap, but you would have to keep changing this as the amount of static data changes while you modify your code. Part of the ZI (zero-initialised) data is your heap allocation; everything else is your application stack and static data with no explicit initialiser. The full link map generated by the linker will show what is allocated statically.
It may be possible to increase heap size by reducing stack allocation. The ARM linker can generate stack usage analysis in the link map (as described here) to help "right-size" your stack. If you have excessive stack allocation, this may help. However stack-overflow errors are even more difficult to detect and debug than memory allocation failure and call by function-pointer and interrupt processing will confound such analysis, so leave a safety margin.
It would perhaps be better to use a customised linker script and modify the heap location in the start-up code to locate the heap in the otherwise unused CCM segment (and be sure you do not use dynamic memory for either DMA or bit-banding). You can then safely create a 64Kb heap assuming you locate nothing else there.

Valgrind mmap error 22

I am trying to run a program on Valgrind. But I am getting this error:
valgrind: mmap(0x67d000, 1978638336) failed in UME with error 22 (Invalid argument).
valgrind: this can be caused by executables with very large text, data or bss segments.
I am unsure what the issue is. I know that I have plenty of memory (I am running on a server with 500+ GB of RAM). Is there a way of making this work?
Edit: Here are my program and machine details:
So my machine (it is a server for research purposes) has this much RAM:
$ free -mt
               total        used        free      shared  buff/cache   available
Mem:          515995        8750      162704          29      344540      506015
Swap:         524277         762      523515
Total:       1040273        9513      686219
And the program (named Tardis) size info:
$ size tardis
   text    data        bss         dec       hex filename
 509180    2920 6273605188  6274117288 175f76ea8 tardis
Unfortunately there is no easy answer to this. The Valgrind host has to load its text somewhere (and also put its heap and stack somewhere). There will always be conflicts with some guest applications.
It would be nice if we could have an argument like --host-text-address=0x68000000. That's not possible as the link editor writes it into the binary. It isn't possible to change this with ld.so. The only way to change it is to rebuild Valgrind with a different value. The danger then is that you get new conflicts.

Debug stack overruns in kernel modules

I am working on driver code which is causing stack overrun issues and memory corruption. At present, running the module gives an "Exception stack", and the stack trace looks corrupted.
The module compiled with warnings; the warnings were silenced with the GCC option -Wframe-larger-than=len.
The issue is possibly being caused by excessive inlining, lots of function arguments, and a large number of nested functions. I need to continue testing and re-factoring the code. Is it possible to make any modifications to the kernel to increase the stack size? And how would you go about debugging such issues?
Though your module compiles (with warnings suppressed via "-Wframe-larger-than=len"), it will still cause the stack overrun and could corrupt the in-core data structures, leading the system into an inconsistent state.
On x86-64, the Linux kernel stack size was limited to 8 KiB in kernel versions before 3.18, and is 16 KiB in 3.18 and later; the stack was expanded to 16 KiB by a commit prompted by lots of issues in virtio and qemu-kvm.
Now if you want to increase the stack size to 32 KiB, you will need to recompile the kernel after making the following change in the kernel source file arch/x86/include/asm/page_64_types.h:
// for 32K stack
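// (THREAD_SIZE is PAGE_SIZE << THREAD_SIZE_ORDER, so with 4 KiB pages
//  order 2 gives 16 KiB and order 3 gives 32 KiB)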
- #define THREAD_SIZE_ORDER 2
+ #define THREAD_SIZE_ORDER 3
A commit in Linux kernel version 3.18 shows the kernel stack size already being increased to 16 K, which should be enough in most cases:
"
commit 6538b8ea886e472f4431db8ca1d60478f838d14b
Author: Minchan Kim <minchan@kernel.org>
Date: Wed May 28 15:53:59 2014 +0900
x86_64: expand kernel stack to 16K
"
Refer LWN: [RFC 2/2] x86_64: expand kernel stack to 16K
As for debugging such issues, there is no one-line answer, but here are some tips I can share. Use dump_stack() within your module to get a stack trace in the syslog, which really helps in debugging stack-related issues.
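A minimal, self-contained sketch; the module and function names are invented:

#include <linux/module.h>
#include <linux/kernel.h>

static int __init stackdbg_init(void)
{
    pr_info("stackdbg: capturing call chain at init\n");
    dump_stack();    /* prints the current call trace to the kernel log */
    return 0;
}

static void __exit stackdbg_exit(void)
{
}

module_init(stackdbg_init);
module_exit(stackdbg_exit);
MODULE_LICENSE("GPL");

In a real driver you would call dump_stack() from the deepest suspect code path rather than from init.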
Use debugfs, turn on the stack depth checking functions with:
# mount -t debugfs nodev /sys/kernel/debug
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
and regularly capture the output of the following files:
# cat /sys/kernel/debug/tracing/stack_max_size
# cat /sys/kernel/debug/tracing/stack_trace
The above files will report the highest stack usage when the module is loaded and tested.
Leave the below command running:
while true; do date; cat /sys/kernel/debug/tracing/stack_max_size;
cat /sys/kernel/debug/tracing/stack_trace; echo ======; sleep 30; done
If you see the stack_max_size value exceeding maybe ~14000 bytes (for a kernel with 16 KiB stacks), then the stack trace would be worth capturing and looking into further. You may also want to set up the crash tool to capture a vmcore file in case of panics.
