I am exploring the possibility of backing the text and data segments with hugepages, following the guide at https://github.com/libhugetlbfs/libhugetlbfs/blob/master/HOWTO.
I relinked the application as suggested, adding "-B/usr/share/libhugetlbfs -Wl,--hugetlbfs-align", and started it with "hugectl --text --data --heap --bss /path/to/my/application".
But I am not sure how to verify whether the data and text segments were indeed copied to files on the hugetlbfs filesystem.
Checking /proc/{pid}/maps, it appears that hugepages are used for the heap but not for the text and data segments, since the first two address ranges are mapped to the application binary rather than to the hugetlbfs filesystem.
Is my understanding correct? Actually, I suspect that even my conclusion from /proc/{pid}/maps that hugepages are used for the heap may be wrong.
How should I verify whether the data and text segments are backed by hugepages? I know the data and text segments are copied to the hugetlbfs filesystem on success, but how do I verify that this happened?
Thanks!
Output of /proc/{pid}/maps:
00400000-00d2c000 r-xp 00000000 fd:02 46153351 /path/to/my/application
00f2b000-00fa3000 rw-p 0092b000 fd:02 46153351 /path/to/my/application
00fa3000-00fbb000 rw-p 00000000 00:00 0
02a0c000-02a2d000 rw-p 00000000 00:00 0 [heap]
40000000-80000000 rw-p 00000000 00:15 2476090 /dev/hugepages-1G/libhugetlbfs.tmp.nS7exn (deleted)
Check /proc/$pid/numa_maps.
It contains information about each memory area used by a given process, allowing, among other things, determination of which nodes were used for the pages.
For the format, see http://linux.die.net/man/5/numa_maps
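A quick check (a sketch; substitute the real PID for $pid): hugepage-backed ranges carry the huge flag in numa_maps, so if the remapping worked, the text and data address ranges should show up with huge and a hugetlbfs file= path:
$ grep huge /proc/$pid/numa_maps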
If you set the HUGETLB_DEBUG=1 environment variable, libhugetlbfs will tell you a whole lot of useful information. One of the messages is this:
INFO: Segment 2's aligned memsz is too small: 0x864 < 0xffffffffffffffff
If it is successful, it looks like this:
libhugetlbfs [zupa:154297]: INFO: Segment 0 (phdr 2): 0x400000-0x400864 (filesz=0x864) (prot = 0x5)
libhugetlbfs [zupa:154297]: INFO: Segment 1 (phdr 3): 0x600868-0x600af8 (filesz=0x27c) (prot = 0x3)
libhugetlbfs [zupa:154297]: DEBUG: Total memsz = 0xaf4, memsz of largest segment = 0x864
libhugetlbfs [zupa:154297]: INFO: libhugetlbfs version: 2.16 (modified)
libhugetlbfs [zupa:154951]: INFO: Mapped hugeseg at 0x2aaaaac00000. Copying 0x864 bytes and 0 extra bytes from 0x400000...done
libhugetlbfs [zupa:154297]: INFO: Prepare succeeded
libhugetlbfs [zupa:154952]: INFO: Mapped hugeseg at 0x2aaaaac00000. Copying 0x27c bytes and 0 extra bytes from 0x600868...done
libhugetlbfs [zupa:154297]: INFO: Prepare succeeded
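To get this output, set the variable when launching the program, e.g. with the hugectl invocation from the question:
$ HUGETLB_DEBUG=1 hugectl --text --data --heap --bss /path/to/my/application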
Related
I learned from this link Why is address 0x400000 chosen as a start of text segment in x86_64 ABI? that a 64-bit Linux process by default should start at address 0x400000, but on my Ubuntu machine I found that my bash process starts from a very high base address (0x55971cea6000).
Does anyone know why? And how does the dynamic linker choose the start address for a 64-bit process?
$ uname -r
5.15.0-25-generic
$ cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04 LTS"
...
$ file /usr/bin/bash
/usr/bin/bash: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=33a5554034feb2af38e8c75872058883b2988bc5, for GNU/Linux 3.2.0, stripped
$ ld -verbose | grep -i text-segment
PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
$ cat maps
55971ce77000-55971cea6000 r--p 00000000 08:02 153 /usr/bin/bash
55971cea6000-55971cf85000 r-xp 0002f000 08:02 153 /usr/bin/bash
55971cf85000-55971cfbf000 r--p 0010e000 08:02 153 /usr/bin/bash
55971cfc0000-55971cfc4000 r--p 00148000 08:02 153 /usr/bin/bash
55971cfc4000-55971cfcd000 rw-p 0014c000 08:02 153 /usr/bin/bash
55971cfcd000-55971cfd8000 rw-p 00000000 00:00 0
...
$ readelf -h /usr/bin/bash
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x32eb0
Start of program headers: 64 (bytes into file)
Start of section headers: 1394600 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 13
Size of section headers: 64 (bytes)
Number of section headers: 30
Section header string table index: 29
I learned from this link Why is address 0x400000 chosen as a start of text segment in x86_64 ABI?
That address is used for executables (ELF type ET_EXEC).
I only found my bash process starts from a very high base address (0x55971cea6000). Does anyone know why?
Because your bash is a (newer) position-independent executable (ELF type ET_DYN). It behaves much like a shared library and is relocated to a random address at runtime.
The 0x55971cea6000 address you found will vary from one execution to another. In contrast, ET_EXEC executables can only run correctly when loaded at their "linked at" address (typically 0x400000).
how does dynamic linker choose the start address for a 64-bit process?
The dynamic linker doesn't choose the start address of the executable -- the kernel does (by the time the dynamic linker starts running, the executable has already been mmapped into memory).
The kernel looks at the .e_type field in the ELF header and the .p_vaddr field of the first program header and goes from there. If (and only if) .e_type == ET_EXEC, the kernel maps the executable's segments at their .p_vaddr addresses. For ET_DYN, if ASLR is in effect, the kernel performs the mmaps at a random address.
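You can watch this happen. A quick sketch (hello.c stands for any small test program): build it once as a non-PIE and once as a PIE, then compare the ELF types and the runtime mappings:
$ gcc -no-pie -o hello_exec hello.c
$ gcc -pie -fPIE -o hello_pie hello.c
$ readelf -h hello_exec | grep Type   # should report EXEC (Executable file)
$ readelf -h hello_pie | grep Type    # should report DYN, like your bash
hello_exec should appear at 0x400000 in /proc/<pid>/maps on every run, while hello_pie lands at a different randomized address each time.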
I'm trying to write an executable where the .text section is located in a specific location. I wrote the following linker script:
base_address = 0x123456789AB;
SECTIONS
{
ENTRY(_start)
. = base_address;
.text : { *(.text); }
}
However, when I look at the memory mapping of the resulting executable, I see:
12345600000-12345679000 r-xp 00000000 08:01 1 /path/to/executable
It is obvious why the section would be aligned to 0x1000 (that's the page size), but where does the 0x100000 alignment come from?
The interesting part is that this only happens when I disable build IDs with -Wl,--build-id=none, or when I enable them and try to stuff the .note.gnu.build-id section right after .text.
Have a look at the generated map file. It's possible that some of your input .text sections also have their own alignment requirements.
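For example (a sketch; script.ld stands for the linker script above and start.o for whatever object you are linking), ask the linker for a map file and inspect what lands in .text:
$ gcc -nostdlib -T script.ld -Wl,--build-id=none -Wl,-Map=output.map -o executable start.o
$ grep -A10 '^\.text' output.map
The map file lists the address and size of every input section placed in .text, so an input section with an oversized alignment requirement shows up as a gap before its address.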
My problem is the same as in this question and this question.
I basically want to run httperf with 10000 connections in parallel, like this: httperf --uri / --server 192.168.1.2 --port 8080 --num-conns=500000 --rate 10000
I'm running it on Ubuntu 14.04.
First I raised the system file descriptor limit; this is what is configured on my OS now:
$ ulimit -a -S
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31348
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65530
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31348
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
$ ulimit -a -H
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31348
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65530
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 31348
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I tried to compile the HEAD version from the GitHub repository, but it seems completely unstable.
I also tried the 0.9.0 version with a modified limit (I changed /usr/include/x86_64-linux-gnu/bits/typesizes.h to lift FD_SETSIZE above 1024), as the answers to the other questions suggest. After recompiling, httperf keeps returning the same error:
*** buffer overflow detected ***: ./httperf terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x73f1f)[0x7fdca440ef1f]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7fdca44a682c]
/lib/x86_64-linux-gnu/libc.so.6(+0x10a6f0)[0x7fdca44a56f0]
/lib/x86_64-linux-gnu/libc.so.6(+0x10b777)[0x7fdca44a6777]
./httperf[0x403c69]
./httperf[0x4047e7]
./httperf[0x4088df]
./httperf[0x408d2e]
./httperf[0x4071df]
./httperf[0x40730b]
./httperf[0x406791]
./httperf[0x405e0e]
./httperf[0x409afd]
./httperf[0x406022]
./httperf[0x404c1f]
./httperf[0x4024ac]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fdca43bcec5]
./httperf[0x40358b]
======= Memory map: ========
00400000-00410000 r-xp 00000000 08:05 265276
0060f000-00610000 r--p 0000f000 08:05 265276
00610000-00611000 rw-p 00010000 08:05 265276
00611000-0068a000 rw-p 00000000 00:00 0
019da000-01c8f000 rw-p 00000000 00:00 0 [heap]
7fdca4185000-7fdca419b000 r-xp 00000000 08:06 3277773 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fdca419b000-7fdca439a000 ---p 00016000 08:06 3277773 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fdca439a000-7fdca439b000 rw-p 00015000 08:06 3277773 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fdca439b000-7fdca4556000 r-xp 00000000 08:06 3279540 /lib/x86_64-linux-gnu/libc-2.19.so
7fdca4556000-7fdca4756000 ---p 001bb000 08:06 3279540 /lib/x86_64-linux-gnu/libc-2.19.so
7fdca4756000-7fdca475a000 r--p 001bb000 08:06 3279540 /lib/x86_64-linux-gnu/libc-2.19.so
7fdca475a000-7fdca475c000 rw-p 001bf000 08:06 3279540 /lib/x86_64-linux-gnu/libc-2.19.so
7fdca475c000-7fdca4761000 rw-p 00000000 00:00 0
7fdca4761000-7fdca4866000 r-xp 00000000 08:06 3279556 /lib/x86_64-linux-gnu/libm-2.19.so
7fdca4866000-7fdca4a65000 ---p 00105000 08:06 3279556 /lib/x86_64-linux-gnu/libm-2.19.so
7fdca4a65000-7fdca4a66000 r--p 00104000 08:06 3279556 /lib/x86_64-linux-gnu/libm-2.19.so
7fdca4a66000-7fdca4a67000 rw-p 00105000 08:06 3279556 /lib/x86_64-linux-gnu/libm-2.19.so
7fdca4a67000-7fdca4a8a000 r-xp 00000000 08:06 3279536 /lib/x86_64-linux-gnu/ld-2.19.so
7fdca4c63000-7fdca4c66000 rw-p 00000000 00:00 0
7fdca4c85000-7fdca4c89000 rw-p 00000000 00:00 0
7fdca4c89000-7fdca4c8a000 r--p 00022000 08:06 3279536 /lib/x86_64-linux-gnu/ld-2.19.so
7fdca4c8a000-7fdca4c8b000 rw-p 00023000 08:06 3279536 /lib/x86_64-linux-gnu/ld-2.19.so
7fdca4c8b000-7fdca4c8c000 rw-p 00000000 00:00 0
7ffff050b000-7ffff052c000 rw-p 00000000 00:00 0 [stack]
7ffff05fe000-7ffff0600000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
I'm not that familiar with low-level syscalls such as select, but as far as I can tell httperf 0.9.0 uses select to handle socket events, and that syscall is limited by a hardcoded file descriptor limit of 1024 (FD_SETSIZE).
So, do you guys have any idea what I am doing wrong? How can I get past the 1024 limit?
You may not want to use 10K descriptors in a single process. If you decide to do it, you will probably want to split the handling up so that a single call to select() is not handling all 10K descriptors (or performance will, to use a descriptive technical term, suck). See Wikipedia on the C10K Problem or the SO c10k tag — which this question is already tagged with, so you are at least aware of the classification.
You need to look at ulimit -a -H vs ulimit -a -S to see how much of the various resources you have (or replace -a with -n to get just 'open files', a.k.a. file descriptors). If the hard limit is less than 10K, you are into kernel recompilation, or at least into finding the source of that upper limit in the configuration. If the hard limit is bigger, you can raise the soft limit with ulimit at the command line, or with the POSIX getrlimit() and setrlimit() functions and RLIMIT_NOFILE, as sketched below.
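A minimal C sketch of the getrlimit()/setrlimit() route (it raises the soft limit as far as the existing hard limit allows; going beyond the hard limit still requires root or a configuration change):
#include <stdio.h>
#include <sys/resource.h>
int main(void)
{
    struct rlimit rl;
    /* Read the current soft/hard limits on open file descriptors. */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);
    /* Raise the soft limit to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    return 0;
}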
I have an MPI application which combines both C and Fortran sources. Occasionally it crashes due to a memory-related bug, but I am having trouble finding it (it is somewhere in someone else's code, which at the moment I'm not very familiar with). I haven't yet been able to catch it with gdb, but sometimes a glibc backtrace is output, as shown below.
The bug is probably close to "(main_main_+0x3bca)[0x804d5ce]" (but with a memory error, I know this may not be the case). My question is: does anyone know how to convert +0x3bca or 0x804d5ce into a particular line of the code?
Any other suggestions on tracking down the bug would also be appreciated. I'm quite familiar with the basics of gdb.
*** glibc detected *** /home/.../src/finite_element: munmap_chunk(): invalid pointer: 0x09d83018 ***
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x73e42)[0xb7409e42]
/lib/i386-linux-gnu/libc.so.6(+0x74525)[0xb740a525]
/home/.../src/finite_element(main_main_+0x3bca)[0x804d5ce]
/home/.../src/finite_element[0x804e195]
/home/.../src/finite_element(main+0x34)[0x804e1e8]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb73af4d3]
/home/davepc/finite-element/src/finite_element[0x8049971]
======= Memory map: ========
08048000-08056000 r-xp 00000000 08:05 1346306 /home/.../src/finite_element
08056000-08057000 r--p 0000d000 08:05 1346306 /home/.../src/finite_element
08057000-08058000 rw-p 0000e000 08:05 1346306 /home/.../src/finite_element
09d1b000-09d8f000 rw-p 00000000 00:00 0 [heap]
b2999000-b699b000 rw-s 00000000 08:03 15855 /tmp/openmpi-sessions-_0/37612/1/shared_mem_pool.babel
b699b000-b6b1d000 rw-p 00000000 00:00 0
b6b31000-b6b3d000 r-xp 00000000 08:03 407798 /usr/lib/openmpi/lib/openmpi/mca_osc_rdma.so
b6b3d000-b6b3e000 r--p 0000b000 08:03 407798 /usr/lib/openmpi/lib/openmpi/mca_osc_rdma.so
b6b3e000-b6b3f000 rw-p 0000c000 08:03 407798 /usr/lib/openmpi/lib/openmpi/mca_osc_rdma.so
<snip>
Thank you...
If you are in gdb and you have debugging symbols, it is quite easy. Use list.
(gdb) list *0x804d5ce
This should give you the line of code, and show you the source if it is able to find the source file.
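If you only want the file and line, gdb's info line command prints that without the surrounding source:
(gdb) info line *0x804d5ce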
Without gdb, you could try addr2line:
$ addr2line -e finite_element 0x804d5ce
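If the binary was compiled with debugging info (-g), addr2line can additionally print the enclosing function name (-f) and demangle C++ symbols (-C):
$ addr2line -f -C -e finite_element 0x804d5ce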
When reading /proc/$PID/maps you get the mapped memory regions.
Is there a way to dump one of these regions?
$ cat /proc/18448/maps
...[snip]...
0059e000-005b1000 r-xp 00000000 08:11 40 /usr/local/lib/libgstlightning.so.0.0.0
005b1000-005b2000 r--p 00012000 08:11 40 /usr/local/lib/libgstlightning.so.0.0.0
005b2000-005b3000 rw-p 00013000 08:11 40 /usr/local/lib/libgstlightning.so.0.0.0
...[snip]...
Thanks
Nah! Call ptrace() with PTRACE_ATTACH. Then open /proc/<pid>/mem, seek to the region's start offset, and read the length of the region as given in /proc/<pid>/maps.
Here's a program I wrote that does it in C. Here's a module I wrote that does it in Python (and the ptrace binding). And to finish, a program that dumps all regions of a process to files.
Enjoy!
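In case those links rot, here is a minimal C sketch of the approach (the PID 18448 and the 0x0059e000-0x005b1000 range from the question are hard-coded for illustration; a real tool would parse /proc/<pid>/maps first):
#include <fcntl.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void)
{
    pid_t pid = 18448;          /* target PID, from the question */
    off_t start = 0x0059e000;   /* region start, from maps */
    off_t end = 0x005b1000;     /* region end, from maps */
    char path[64], buf[4096];
    /* Stop the target so its memory can be read via /proc/<pid>/mem. */
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
        perror("ptrace");
        return 1;
    }
    waitpid(pid, NULL, 0);
    snprintf(path, sizeof path, "/proc/%d/mem", (int)pid);
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    /* Seek to the region's start and copy it out to stdout. */
    lseek(fd, start, SEEK_SET);
    for (off_t off = start; off < end; off += (off_t)sizeof buf) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0)
            break;
        fwrite(buf, 1, (size_t)n, stdout);
    }
    close(fd);
    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    return 0;
}
Redirect stdout to keep the dump, e.g. ./memdump > region.bin (memdump being whatever you name the compiled sketch).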
You can attach gdb to the process, then dump a memory region of X words starting at location L with: x/Xw L.
Attaching gdb when you start your process is simple: gdb ./executable, then run. If you need to attach to a running process, start gdb, then attach pid, where pid is the process ID you care about.
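gdb can also write a region straight to a file with its dump command (using the libgstlightning.so range from the question):
(gdb) dump binary memory region.bin 0x0059e000 0x005b1000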
Using dd(1) (with bs=1, skip is the byte offset of the region's start and count is its length, i.e. end minus start):
sudo dd if=/dev/mem bs=1 skip=$(( 16#0059e000 )) \
count=$(( 16#005b1000 - 16#0059e000 )) | hexdump -C