linux memory management - how to get "Random xxx offset"? - c

I am studying process memory management.
I read a post about Process address space layout.
I referenced the following URL.
In Linux, start_data, end_data, start_brk, brk, etc. are member variables of struct mm_struct.
However, I want to know how to calculate the random brk, stack, and mmap offsets.
It seems that those three values (Random xxx offset) aren't defined in struct mm_struct.
Is there any function or macro to calculate those values?
I am using linux kernel version 4.4 and x86-64 architecture.
Thank you.

The OS already implements /proc/<pid>/maps, which shows all VMAs of that process, including the stack, the heap and of course the mmap-ed ones.
If you want to see where all this information comes from, check the kernel source code; the relevant code (to look up the VMAs of a given PID) seems to be in fs/proc/task_mmu.c.
And, yes indeed, the "[heap]" is marked by this code snippet from the above source file (kernel version 3.10.24):
fs/proc/task_mmu.c:show_map_vma()
...
if (vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk) {
    name = "[heap]";
    goto done;
}
...
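The condition in that snippet is an ordinary interval-overlap test: a VMA is labelled "[heap]" when it overlaps the range from start_brk to brk. Here is a minimal userspace sketch of the same comparison. The toy_vma and toy_mm structs and the function name are hypothetical stand-ins that only carry the fields used by the check; they are not the kernel's types.

```c
#include <stdbool.h>

/* Toy stand-ins holding only the fields used by the check (hypothetical). */
struct toy_vma { unsigned long vm_start, vm_end; };
struct toy_mm  { unsigned long start_brk, brk; };

/* Same comparison as the quoted snippet: true when the VMA overlaps the heap. */
bool toy_vma_is_heap(const struct toy_vma *vma, const struct toy_mm *mm)
{
    return vma->vm_start <= mm->brk && vma->vm_end >= mm->start_brk;
}
```

A VMA that lies entirely above brk or entirely below start_brk fails one of the two comparisons and is not labelled as heap.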
One more thing: if you want to check the start and end addresses of a particular segment, look at the definition of struct mm_struct, which contains the following fields:
struct mm_struct{
......
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
......
}
start_code, end_code The start and end address of the code section;
start_data, end_data The start and end address of the data section;
start_brk, brk The start and end address of the heap;
start_stack Predictably enough, the start of the stack region;

Related

Usage of Xilinx built-in UART function #define XUartPs_IsReceiveData (BaseAddress )

So I am trying to use this built-in UART function (from the Vitis SDK from Xilinx) to determine if there is a valid byte to read over UART. I created this function to return 1 if there was a byte to read or 0 if there wasn't.
u32 UartHasMessage(void){
if(XUartPs_IsReceiveData(&XUartPs_Main)){
return 1;
}
else{
return 0;
}
}
However, even when there is a byte to read over UART, this function always returns false.
The weird behavior I am experiencing is when I step through the code using the debugger, I call UartHasMessage() to check if there is a byte to read, and it returns false, but in the next line I call a function to read a byte over UART and that contains the correct byte I sent over the host.
u32 test = UartHasMessage();
UartGetByte(&HostReply);
How come this UartHasMessage always returns false, but then in the next line I am able to read the byte correctly?
Caveat: Without some more information, this is a bit speculative and might be a comment, but it is too large for that.
The information below comes from the Xilinx documentation on various pages ...
XUartPs_RecvByte will block until a byte is ready. So, there is no need to call XUartPs_IsReceiveData directly (I think that XUartPs_RecvByte calls it internally).
A web search on XUartPs_Main came up with nothing, so we'd need to see the definition you have.
Most Xilinx documentation uses UART_BASEADDR:
#define UART_BASEADDR XPAR_XUARTPS_0_BASEADDR
I found a definition:
#define XPAR_XUARTPS_0_BASEADDR 0xE0001000
You might be better off using a more standard method, such as calling the XUartPs_LookupConfig function to get the configuration table entry which has all relevant values.
I'm guessing that you created the XUartPs_Main definition.
But, based on what you posted (needing &XUartPs_Main instead of XUartPs_Main), it is linked/loaded at the exact address of the UART register bank. Let's assume that address is (e.g.) 0x10000. So, we might have:
u32 XUartPs_Main __attribute__((at(0x10000)));
The at attribute is an extension that some build systems support (e.g. arm) that forces the variable to be loaded at a given address. So, let's assume we have that, even if the mechanism is slightly different, e.g.:
__attribute__((section(".ARM.__at_0x10000")))
The definition of XUARTPS_SR_OFFSET is:
#define XUARTPS_SR_OFFSET 0x002CU
Offsets are [typically] byte offsets.
Given:
#define XUartPs_IsReceiveData(BaseAddress) \
!((Xil_In32((BaseAddress) + XUARTPS_SR_OFFSET) & \
(u32)XUARTPS_SR_RXEMPTY) == (u32)XUARTPS_SR_RXEMPTY)
Now, if the definition of XUartPs_Main uses u32 [as above], we may have a problem because XUARTPS_SR_OFFSET will be treated as a u32 index and not a byte offset. So, it will access the wrong address.
So, try:
XUartPs_IsReceiveData((unsigned char *) &XUartPs_Main)
But, if it were me, I'd rework things to use Xilinx's standard definitions.
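To make the index-versus-byte-offset point concrete, here is a small hosted sketch (nothing Xilinx-specific) that measures how far the same offset moves a u32 pointer compared with a byte pointer. SR_OFFSET copies the XUARTPS_SR_OFFSET value quoted above; the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SR_OFFSET 0x2CU   /* same value as XUARTPS_SR_OFFSET quoted above */

/* Byte distance covered when the offset is added to a u32 pointer. */
ptrdiff_t bytes_advanced_as_u32(uint32_t *base)
{
    return (char *)(base + SR_OFFSET) - (char *)base;
}

/* Byte distance covered when the offset is added to a byte pointer. */
ptrdiff_t bytes_advanced_as_bytes(uint32_t *base)
{
    return ((unsigned char *)base + SR_OFFSET) - (unsigned char *)base;
}
```

Called on an array of at least 0x2C u32 elements, the first function reports 0x2C * 4 bytes and the second 0x2C bytes, which is why the cast to unsigned char * matters if the base really is a u32 pointer.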
UPDATE:
Hi, so XUartPs_Main is defined as static XUartPs XUartPs_Main; I use it in a variety of functions, such as a function to send bytes over UART, and I pass it by address like I did with this function. All my other functions work as expected except this one. Is it possible that it has something to do with the way the FIFO works? – 29belgrade29
No, not all the API functions are the same.
The struct definition is [I synthesized this from the API doc]:
typedef struct {
u16 DeviceId; // Unique ID of device.
u32 BaseAddress; // Base address of device (IPIF)
u32 InputClockHz;
} XUartPs;
Somewhere in your code you had to initialize this with:
XUartPs_Main = XUartPs_ConfigTable[my_device_id];
Or, with:
XUartPs_Main = *XUartPs_LookupConfig(my_device_id);
If an API function is defined as (e.g.):
void api_dosomething(XUartPs_Config *cfg,...)
Then, you call it with:
api_dosomething(&XUartPs_Main,...);
So, most functions probably take such a pointer.
But, XUartPs_IsReceiveData does not want a pointer to a XUartPs_Config struct. It wants a base address. This is:
XUartPs_Main.BaseAddress
So, you want:
XUartPs_IsReceiveData(XUartPs_Main.BaseAddress)
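Here is a hosted simulation of why the base address is what the macro needs. Everything prefixed with Sim or SIM_ is a hypothetical stand-in: sim_in32 plays the role of Xil_In32, sim_regs is a fake register bank in ordinary memory, and the RXEMPTY bit value is an assumption that should be checked against xuartps_hw.h for your BSP. It is a sketch of the calling convention, not the Xilinx driver.

```c
#include <stdint.h>

#define SIM_SR_OFFSET   0x002CU       /* status register byte offset (from above) */
#define SIM_SR_RXEMPTY  0x00000002U   /* RX FIFO empty bit (assumed value) */

/* Simulated register bank standing in for the memory-mapped UART. */
uint32_t sim_regs[32];

/* Hypothetical stand-in for Xil_In32: read a 32-bit register by address. */
uint32_t sim_in32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

/* Same shape as XUartPs_IsReceiveData: takes a base ADDRESS, not a struct. */
#define SIM_IsReceiveData(BaseAddress) \
    (!((sim_in32((BaseAddress) + SIM_SR_OFFSET) & SIM_SR_RXEMPTY) == SIM_SR_RXEMPTY))

/* Simplified driver instance, mirroring the struct sketched above. */
typedef struct {
    uint16_t  DeviceId;
    uintptr_t BaseAddress;
} SimUart;

/* Returns 1 if a byte is waiting, 0 otherwise. */
uint32_t SimUartHasMessage(const SimUart *uart)
{
    return SIM_IsReceiveData(uart->BaseAddress) ? 1U : 0U;
}
```

Passing &XUartPs_Main instead would make the macro read 0x2C bytes past the start of the driver struct in RAM rather than the UART status register, so the result would have nothing to do with the FIFO state.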

What is the memory node in kzalloc_node in the Linux kernel

I do not understand what the memory node is in the kzalloc_node function. The description says, "allocate zeroed memory from a particular memory node." But what is a memory node? I am specifically looking at a portion of the deadline I/O scheduler (shown below).
static int deadline_init_queue(struct request_queue *q, struct elevator_type *e)
{
struct deadline_data *dd;
...
dd = kzalloc_node(sizeof(*dd), GFP_KERNEL, q->node);
...
}
There's a very good description here:
https://www.kernel.org/doc/gorman/html/understand/understand009.html
...the function alloc_pages() calls numa_node_id() to return the
logical ID of the node associated with the current running CPU. This
NID is passed to _alloc_pages() which calls NODE_DATA() with the NID
as a parameter.
On UMA architectures, this will unconditionally result
in contig_page_data being returned but NUMA architectures instead set
up an array which NODE_DATA() uses NID as an offset into. In other
words, architectures are responsible for setting up a CPU ID to NUMA
memory node mapping.
This is effectively still a node-local allocation
policy as is used in 2.4 but it is a lot more clearly defined.
See also: https://en.wikipedia.org/wiki/Non-uniform_memory_access
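In short, on a NUMA machine a memory node is a bank of memory that is local to some of the CPUs, and the node argument tells the allocator which bank to prefer. The lookup described in the quote can be pictured with a toy userspace model. All names below are hypothetical, and the model only keeps a per-node counter; it does not place memory anywhere in particular.

```c
#include <stdlib.h>

#define TOY_MAX_NODES 2
#define TOY_MAX_CPUS  4

/* Toy per-node descriptor; the kernel's equivalent is pg_data_t. */
struct toy_node {
    int    nid;
    size_t bytes_allocated;
};

struct toy_node toy_node_data[TOY_MAX_NODES] = { { 0, 0 }, { 1, 0 } };

/* CPU -> node mapping, which the architecture sets up on a real system. */
int toy_cpu_to_node[TOY_MAX_CPUS] = { 0, 0, 1, 1 };

/* Mirrors the idea of NODE_DATA(nid): index the node array by node ID. */
#define TOY_NODE_DATA(nid) (&toy_node_data[(nid)])

/* Zeroed allocation "charged" to a chosen node (the toy only keeps count). */
void *toy_kzalloc_node(size_t size, int nid)
{
    void *p = calloc(1, size);
    if (p)
        TOY_NODE_DATA(nid)->bytes_allocated += size;
    return p;
}
```

In the deadline scheduler snippet, q->node plays the role of the nid argument, so the scheduler's private data is requested from the same node as the request queue it belongs to.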

Kernel sys_call_table address does not match address specified in system.map

I am trying to brush up on C so I have been playing around with the linux kernel's system call table (on 3.13.0-32-generic). I found a resource online that searches for the system call table with the following function which I load into the kernel in an LKM:
static uint64_t **aquire_sys_call_table(void)
{
uint64_t offset = PAGE_OFFSET;
uint64_t **sct;
while (offset < ULLONG_MAX) {
sct = (uint64_t **)offset;
if (sct[__NR_close] == (uint64_t *) sys_close) {
printk("\nsys_call_table found at address: 0x%p\n", sct);
return sct;
}
offset += sizeof(void *);
}
return NULL;
}
The function works. I am able to use the address it returns to manipulate the system call table. What I don't understand is why the address returned by this function doesn't match the address in /boot/System.map-(KERNEL).
Here is what the function prints:
sys_call_table found at address: 0xffff880001801400
Here is what I get when I search System.map:
$ sudo cat /boot/System.map-3.13.0-32-generic | grep sys_call_table
ffffffff81801400 R sys_call_table
ffffffff81809cc0 R ia32_sys_call_table
Why don't the two addresses match? It's my understanding that the module runs in the kernel's address space, so the address of the system call table should be the same.
The two virtual addresses have the same physical address.
From Documentation/x86/x86_64/mm.txt
<previous description obsolete, deleted>
Virtual memory map with 4 level page tables:
0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffec0000000000 - fffffc0000000000 (=44 bits) kasan shadow memory (16TB)
... unused hole ...
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0
ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
holes).
vmalloc space is lazily synchronized into the different PML4 pages of
the processes using the page fault handler, with init_level4_pgt as
reference.
Current X86-64 implementations only support 40 bits of address space,
but we support up to 46 bits. This expands into MBZ space in the page tables.
->trampoline_pgd:
We map EFI runtime services in the aforementioned PGD in the virtual
range of 64Gb (arbitrarily set, can be raised if needed)
0xffffffef00000000 - 0xffffffff00000000
-Andi Kleen, Jul 2004
We know that the virtual address range ffff880000000000-ffffc7ffffffffff is the direct mapping of all physical memory. When the kernel wants to access any physical memory, it uses the direct mapping. It's also the range your search walks through.
And ffffffff80000000-ffffffffa0000000 is the kernel text mapping. When kernel code executes, the rip register uses the kernel text mapping.
In arch/x86/include/asm/page_64.h, we can see the relation between virtual and physical addresses.
static inline unsigned long __phys_addr_nodebug(unsigned long x)
{
unsigned long y = x - __START_KERNEL_map;
/* use the carry flag to determine if x was < __START_KERNEL_map */
x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));
return x;
}
and
// arch/x86/include/asm/page_types.h
#define PAGE_OFFSET ((unsigned long)__PAGE_OFFSET)
// arch/x86/include/asm/page_64_types.h
#define __START_KERNEL_map _AC(0xffffffff80000000, UL)
#define __PAGE_OFFSET _AC(0xffff880000000000, UL)
As for the addresses mentioned in the question above:
what the function prints,
sys_call_table found at address: 0xffff880001801400
what system.map gives,
$ sudo cat /boot/System.map-3.13.0-32-generic | grep sys_call_table
ffffffff81801400 R sys_call_table
ffffffff81809cc0 R ia32_sys_call_table
Both of them resolve to the same physical address.
The virt->phys conversion is done in such a way that corresponding addresses in the 'direct' mapping region and the 'kernel text' mapping region resolve to the same physical address.
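To check this claim with numbers, the arithmetic of __phys_addr_nodebug() can be replayed in userspace. This is a sketch under one loud assumption: phys_base is taken to be 0, i.e. the kernel was not relocated. The TOY_ names are mine; the constants are the ones quoted above.

```c
#include <stdint.h>

#define TOY_START_KERNEL_MAP 0xffffffff80000000ULL  /* __START_KERNEL_map */
#define TOY_PAGE_OFFSET      0xffff880000000000ULL  /* __PAGE_OFFSET quoted above */

/* Assumption: the kernel was not relocated, so phys_base is 0. */
static const uint64_t toy_phys_base = 0;

/* Userspace copy of the arithmetic in __phys_addr_nodebug(). */
uint64_t toy_phys_addr(uint64_t x)
{
    uint64_t y = x - TOY_START_KERNEL_MAP;   /* wraps if x is below the text mapping */
    return y + ((x > y) ? toy_phys_base
                        : (TOY_START_KERNEL_MAP - TOY_PAGE_OFFSET));
}
```

Under that assumption, both toy_phys_addr(0xffffffff81801400) and toy_phys_addr(0xffff880001801400) evaluate to 0x1801400: the first takes the kernel text branch, the second wraps around and ends up as x - PAGE_OFFSET.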
Through the magic of virtual memory mapping, the address you use depends on where you are. The symbol table file System.map is there to help when attaching gdb or the crash utility to the running system. Inside the kernel, well, is inside the kernel.
You may also have a /proc/kallsyms file for even more values :)
Only root can see the addresses in the /proc/kallsyms file! It is rarely disabled, but you can enable it if it is. Note that the addresses in System.map and in kallsyms for the same sys_call can be different.
If you are using a kernel you built yourself, then System.map is preferable, but if you are using a pre-built kernel (like we mostly do), then kallsyms is the right place for you!
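Both System.map and /proc/kallsyms use the same "address type name" line format, so one small parser covers both. A minimal sketch follows; lookup_symbol is a made-up helper name, and it takes an already opened stream so it can be pointed at either file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan a stream in System.map / kallsyms format ("address type name")
 * and store the address of the first symbol called `name`.
 * Returns 1 if found, 0 otherwise.
 */
int lookup_symbol(FILE *fp, const char *name, unsigned long long *addr)
{
    char line[512];
    char sym[256];
    char type;
    unsigned long long a;

    while (fgets(line, sizeof line, fp)) {
        if (sscanf(line, "%llx %c %255s", &a, &type, sym) == 3 &&
            strcmp(sym, name) == 0) {
            *addr = a;
            return 1;
        }
    }
    return 0;
}
```

For example, open /proc/kallsyms with fopen() and look up "sys_call_table". Run without root privileges, the addresses in that file may all read as zeros, so a result of 0 does not necessarily mean the symbol is missing.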

Does Linux kernel have main function?

I am learning device driver and kernel programming. According to Jonathan Corbet's book, we do not have a main() function in device drivers.
#include <linux/init.h>
#include <linux/module.h>
static int my_init(void)
{
return 0;
}
static void my_exit(void)
{
return;
}
module_init(my_init);
module_exit(my_exit);
Here I have two questions:
Why do we not need a main() function in device drivers?
Does the kernel have a main() function?
start_kernel
On 4.2, start_kernel from init/main.c is a considerable initialization process and could be compared to a main function.
It is the first arch-independent code to run, and it sets up a large part of the kernel. So, much like main, start_kernel is preceded by some lower-level setup code (the counterpart of what the crt* objects do before a userland main), after which the "main" generic C code runs.
How start_kernel gets called in x86_64
arch/x86/kernel/vmlinux.lds.S, a linker script, sets:
ENTRY(phys_startup_64)
and
phys_startup_64 = startup_64 - LOAD_OFFSET;
and:
#define LOAD_OFFSET __START_KERNEL_map
arch/x86/include/asm/page_64_types.h defines __START_KERNEL_map as:
#define __START_KERNEL_map _AC(0xffffffff80000000, UL)
which is the kernel entry address. TODO how is that address reached exactly? I have to understand the interface Linux exposes to bootloaders.
arch/x86/kernel/vmlinux.lds.S sets the very first bootloader section as:
.text : AT(ADDR(.text) - LOAD_OFFSET) {
_text = .;
/* bootstrapping code */
HEAD_TEXT
include/asm-generic/vmlinux.lds.h defines HEAD_TEXT:
#define HEAD_TEXT *(.head.text)
arch/x86/kernel/head_64.S defines startup_64. That is the very first x86 kernel code that runs. It does a lot of low level setup, including segmentation and paging.
That is then the first thing that runs because the file starts with:
.text
__HEAD
.code64
.globl startup_64
and include/linux/init.h defines __HEAD as:
#define __HEAD .section ".head.text","ax"
so the same as the very first thing of the linker script.
At the end it calls x86_64_start_kernel a bit awkwardly with an lretq:
movq initial_code(%rip),%rax
pushq $0 # fake return address to stop unwinder
pushq $__KERNEL_CS # set correct cs
pushq %rax # target address in negative space
lretq
and:
.balign 8
GLOBAL(initial_code)
.quad x86_64_start_kernel
arch/x86/kernel/head64.c defines x86_64_start_kernel which calls x86_64_start_reservations which calls start_kernel.
arm64 entry point
The very first arm64 instruction that runs on a v5.7 uncompressed kernel is defined at https://github.com/cirosantilli/linux/blob/v5.7/arch/arm64/kernel/head.S#L72 so it is either the add x13, x18, #0x16 or the b stext, depending on CONFIG_EFI:
__HEAD
_head:
/*
* DO NOT MODIFY. Image header expected by Linux boot-loaders.
*/
#ifdef CONFIG_EFI
/*
* This add instruction has no meaningful effect except that
* its opcode forms the magic "MZ" signature required by UEFI.
*/
add x13, x18, #0x16
b stext
#else
b stext // branch to kernel start, magic
.long 0 // reserved
#endif
le64sym _kernel_offset_le // Image load offset from start of RAM, little-endian
le64sym _kernel_size_le // Effective size of kernel image, little-endian
le64sym _kernel_flags_le // Informative flags, little-endian
.quad 0 // reserved
.quad 0 // reserved
.quad 0 // reserved
.ascii ARM64_IMAGE_MAGIC // Magic number
#ifdef CONFIG_EFI
.long pe_header - _head // Offset to the PE header.
This is also the very first byte of an uncompressed kernel image.
Both of those cases jump to stext which starts the "real" action.
As mentioned in the comment, these two instructions are the start of a documented 64-byte header described at: https://github.com/cirosantilli/linux/blob/v5.7/Documentation/arm64/booting.rst#4-call-the-kernel-image
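To see the header from the outside, here is a small sketch that checks the first bytes of an uncompressed arm64 Image. The offset of 56 comes from counting the fields in the listing above: two 4-byte instructions, three 8-byte le64sym values and three 8-byte reserved quads precede the magic. The function names are hypothetical, and the magic bytes "ARM\x64" are what I understand ARM64_IMAGE_MAGIC to expand to.

```c
#include <stddef.h>
#include <string.h>

#define ARM64_HDR_SIZE     64   /* size of the documented header */
#define ARM64_MAGIC_OFFSET 56   /* offset of the magic field within it */

/* Returns 1 if buf starts with a header carrying the "ARM\x64" magic. */
int arm64_image_has_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 'A', 'R', 'M', 0x64 };
    return len >= ARM64_HDR_SIZE &&
           memcmp(buf + ARM64_MAGIC_OFFSET, magic, sizeof magic) == 0;
}

/* Returns 1 if the first instruction doubles as the "MZ" signature (CONFIG_EFI). */
int arm64_image_has_mz(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 'M' && buf[1] == 'Z';
}
```

The add x13, x18, #0x16 instruction encodes as 0x91005a4d, which is stored little-endian as the bytes 4d 5a 00 91, and 4d 5a is "MZ" in ASCII.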
arm64 first MMU enabled instruction: __primary_switched
I think it is __primary_switched in head.S:
/*
* The following fragment of code is executed with the MMU enabled.
*
* x0 = __PHYS_OFFSET
*/
__primary_switched:
At this point, the kernel appears to create page tables + maybe relocate itself such that the PC addresses match the symbols of the vmlinux ELF file. Therefore at this point you should be able to see meaningful function names in GDB without extra magic.
arm64 secondary CPU entry point
secondary_holding_pen defined at: https://github.com/cirosantilli/linux/blob/v5.7/arch/arm64/kernel/head.S#L691
Entry procedure further described at: https://github.com/cirosantilli/linux/blob/v5.7/arch/arm64/kernel/head.S#L691
Fundamentally, there is nothing special about a routine being named main(). As alluded to above, main() serves as the entry point for an executable load module. However, you can define different entry points for a load module. In fact, you can define more than one entry point, for example, refer to your favorite dll.
From the operating system's (OS) point of view, all it really needs is the address of the entry point of the code that will function as a device driver. The OS will pass control to that entry point when the device driver is required to perform I/O to the device.
A system programmer defines (each OS has its own method) the connection between a device, a load module that functions as the device's driver, and the name of the entry point in the load module.
Each OS has its own kernel (obviously) and some might start with main(), but I would be surprised to find a kernel that used main() other than a simple one, such as UNIX! By the time you are writing kernel code you have long moved past the requirement to name every module you write main().
Hope this helps?
Found this code snippet from the kernel for Unix Version 6. As you can see main() is just another program, trying to get started!
main()
{
extern schar;
register i, *p;
/*
* zero and free all of core
*/
updlock = 0;
i = *ka6 + USIZE;
UISD->r[0] = 077406;
for(;;) {
if(fuibyte(0) < 0) break;
clearsig(i);
maxmem++;
mfree(coremap, 1, i);
i++;
}
if(cputype == 70)
for(i=0; i<62; i=+2) {
UBMAP->r[i] = i<<12;
UBMAP->r[i+1] = 0;
}
// etc. etc. etc.
Several ways to look at it:
Device drivers are not programs. They are modules that are loaded into another program (the kernel). As such, they do not have a main() function.
The fact that all programs must have a main() function is only true for userspace applications. It does not apply to the kernel, nor to device drivers.
With main() you probably mean what main() is to a program, namely its "entry point".
For a module, that is init_module().
From Linux Device Drivers, 2nd Edition:
Whereas an application performs a single task from beginning to end, a module registers itself in order to serve future requests, and its "main" function terminates immediately. In other words, the task of the function init_module (the module's entry point) is to prepare for later invocation of the module's functions; it's as though the module were saying, "Here I am, and this is what I can do." The second entry point of a module, cleanup_module, gets invoked just before the module is unloaded. It should tell the kernel, "I'm not there anymore; don't ask me to do anything else."
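The "register, then return immediately" pattern from that quote can be imitated in plain userspace C. This is only an analogy: every name below is hypothetical and a single function pointer stands in for the kernel's registration interfaces.

```c
#include <stddef.h>

/* Toy "kernel" side: a slot that a module can fill with a handler. */
typedef int (*toy_handler)(int request);
toy_handler toy_registered = NULL;

/* Toy "module": the work it offers once registered. */
int toy_double(int request)
{
    return request * 2;
}

/* Entry point: register and return immediately, like init_module(). */
int toy_init_module(void)
{
    toy_registered = toy_double;
    return 0;
}

/* Exit point: withdraw the handler, like cleanup_module(). */
void toy_cleanup_module(void)
{
    toy_registered = NULL;
}

/* The "kernel" dispatches a request only if something is registered. */
int toy_dispatch(int request, int *result)
{
    if (!toy_registered)
        return -1;
    *result = toy_registered(request);
    return 0;
}
```

Nothing in the "module" runs from start to finish like a main(): toy_init_module() returns at once, and the real work happens later, whenever toy_dispatch() is called on its behalf.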
Yes, the Linux kernel has a main function; it is located in the arch/x86/boot/main.c file. But kernel execution starts from the arch/x86/boot/header.S assembly file, and the main() function is called from there by the "calll main" instruction.
Here is that main function:
void main(void)
{
/* First, copy the boot header into the "zeropage" */
copy_boot_params();
/* Initialize the early-boot console */
console_init();
if (cmdline_find_option_bool("debug"))
puts("early console in setup code.\n");
/* End of heap check */
init_heap();
/* Make sure we have all the proper CPU support */
if (validate_cpu()) {
puts("Unable to boot - please use a kernel appropriate "
"for your CPU.\n");
die();
}
/* Tell the BIOS what CPU mode we intend to run in. */
set_bios_mode();
/* Detect memory layout */
detect_memory();
/* Set keyboard repeat rate (why?) and query the lock flags */
keyboard_init();
/* Query Intel SpeedStep (IST) information */
query_ist();
/* Query APM information */
#if defined(CONFIG_APM) || defined(CONFIG_APM_MODULE)
query_apm_bios();
#endif
/* Query EDD information */
#if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
query_edd();
#endif
/* Set the video mode */
set_video();
/* Do the last things and invoke protected mode */
go_to_protected_mode();
}
While the function name main() is just a common convention (there is no real reason to use it in kernel mode), the Linux kernel does have a main() function for many architectures, and of course user-mode Linux has a main function.
Note that the OS runtime loads the main() function to start an app; when an operating system boots there is no runtime: the kernel is simply loaded to an address by the boot loader, which is loaded by the MBR, which is loaded by the hardware. So while a kernel may contain a function called main, it need not be the entry point.
See Also:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms633559%28v=vs.85%29.aspx
Linux kernel source:
x86: linux-3.10-rc6/arch/x86/boot/main.c
arm64: linux-3.10-rc6/arch/arm64/kernel/asm-offsets.c

How to view Linux memory map info in C?

I'm dynamically loading some Linux libraries in C.
I can get the start addresses of the libraries using dlinfo (see 1).
I can't find any information to get the size of a library, however.
The only thing that I've found is that one must read the /proc/[pid]/maps file and parse it for the relevant information (see 2).
Is there a more elegant method?
(This answer is LINUX/GLIBC specific)
According to http://s.eresi-project.org/inc/articles/elf-rtld.txt
there are link_map *map; map->l_map_start & map->l_map_end
/*
** Start and finish of memory map for this object.
** l_map_start need not be the same as l_addr.
*/
ElfW(Addr) l_map_start, l_map_end;
It is not quite exact, as explained here: http://www.cygwin.com/ml/libc-hacker/2007-06/msg00014.html
Some libraries are not contiguous in memory; the linked message has some examples. E.g. this is the internal (to rtld) function that detects whether a given address is inside a library's address space, based on link_map and working directly with the ELF segments:
/* Return non-zero if ADDR lies within one of L's segments. */
int
internal_function
_dl_addr_inside_object (struct link_map *l, const ElfW(Addr) addr)
{
int n = l->l_phnum;
const ElfW(Addr) reladdr = addr - l->l_addr;
while (--n >= 0)
if (l->l_phdr[n].p_type == PT_LOAD
&& reladdr - l->l_phdr[n].p_vaddr >= 0
&& reladdr - l->l_phdr[n].p_vaddr < l->l_phdr[n].p_memsz)
return 1;
return 0;
}
This function points to the other alternative, which is to find the program headers or section headers of the loaded ELF (there are some links to such information in link_map).
And the easiest is to use a stat syscall with map->l_name to read the file size from the disk (inexact when there is a huge bss section).
Parsing /proc/self/maps (or perhaps popen-ing a pmap command) still seems the easiest thing to me. And there is also the dladdr function (provided you have some address to start with).
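For the /proc/self/maps route, here is a minimal sketch of the parsing. maps_extent is a made-up helper name; it takes an already opened stream and a substring of the library path, and it relies only on each line starting with "start-end" in hexadecimal.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan a stream in /proc/<pid>/maps format and compute the lowest start
 * and highest end address over all lines whose text contains `needle`.
 * Returns the number of matching mappings.
 */
int maps_extent(FILE *fp, const char *needle,
                unsigned long long *lo, unsigned long long *hi)
{
    char line[1024];
    unsigned long long start, end;
    int count = 0;

    while (fgets(line, sizeof line, fp)) {
        if (!strstr(line, needle))
            continue;
        if (sscanf(line, "%llx-%llx", &start, &end) != 2)
            continue;
        if (count == 0 || start < *lo)
            *lo = start;
        if (count == 0 || end > *hi)
            *hi = end;
        count++;
    }
    return count;
}
```

Call it with fopen("/proc/self/maps", "r") and part of the library's file name. Since a library's mappings need not be contiguous, hi - lo is an upper bound on the span, not the number of bytes actually mapped.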
