Perf stack backtraces on armv7l

I'm trying to use perf to get information about stack backtraces in my system.
I compile an application where main calls f, f calls g1, g1 calls g3, g3 calls g4, g4 calls g2.
I expect my backtraces to be something like
g2
g4
g3
g1
f
main
But instead, I have cropped backtraces in perf script, like
a.out 2869 [000] 19414.348571: 225426 cycles:ppp:
        7ac f (/opt/usr/home/owner/a.out)
        beb3dd2c [unknown] ([unknown])

a.out 2869 [000] 19414.348754: 235721 cycles:ppp:
        72c g1 (/opt/usr/home/owner/a.out)
        beb3dd24 [unknown] ([unknown])

a.out 2869 [000] 19414.348937: 246486 cycles:ppp:
        670 g3 (/opt/usr/home/owner/a.out)
        beb3dd14 [unknown] ([unknown])

a.out 2869 [000] 19414.349121: 232929 cycles:ppp:
        60c g4 (/opt/usr/home/owner/a.out)
        beb3dd04 [unknown] ([unknown])
How can I get more info about my backtraces?
Compile: arm-linux-gnueabi-gcc -O0 -g3 -marm -fno-omit-frame-pointer -funwind-tables main.c
Perf record: perf record -g -a
Perf script: perf script
Target is running on Linux 3.10.65.
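For reference, the test program is just a chain of trivial calls; a minimal sketch (only the call order comes from the description above, the function bodies are assumptions):

/* main.c - hypothetical sketch of the call chain described above */
#include <stdio.h>

static volatile unsigned long counter;

static void g2(void) { unsigned long i; for (i = 0; i < 400000000UL; i++) counter++; }
static void g4(void) { g2(); }
static void g3(void) { g4(); }
static void g1(void) { g3(); }
static void f(void)  { g1(); }

int main(void)
{
    f();                              /* main -> f -> g1 -> g3 -> g4 -> g2 */
    printf("counter = %lu\n", counter);
    return 0;
}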

There were problems with the stack unwinding implementation in perf for ARM: for a while it simply wasn't implemented.
Try a recent kernel and/or a recent version of perf (a new perf tool will work on an old kernel, but part of the backtrace collection happens in the kernel).
https://wiki.linaro.org/LEG/Engineering/TOOLS/perf-callstack-unwinding (also mentioned there: How does linux's perf utility understand stack traces?)
Linux perf has architecture-specific support code. x86 has some DWARF stack frame unwinding support whilst arm and arm64 do not. It should be implemented on ARM32/64.
The work for ARMv7 is done under the LEG-760 blueprint. ... The expected result is the backtrace of the user and kernel call chain in the perf output statistics.
Support was committed after September 2013: http://www.spinics.net/lists/kernel/msg1608919.html
whereas 3.10 is from June 2013.
Try kernel and perf from 3.11, or any newer kernel version.
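On a kernel and perf new enough to support it, you can also request DWARF-based unwinding explicitly instead of relying on frame pointers; a sketch, reusing the a.out from the question:

$ perf record --call-graph dwarf ./a.out
$ perf report --stdio

(On older perf versions the same thing was spelled perf record -g dwarf.)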

Related

How can I check the integrity of an extracted zImage?

$ binwalk -e linux_image.img
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
0 0x0 Android bootimg, kernel size: 6897653 bytes, kernel addr: 0x81C08000, ramdisk size: 5959520 bytes, ramdisk addr: 0x81C08000, product name: ""
2048 0x800 Linux kernel ARM boot executable zImage (little-endian)
18479 0x482F gzip compressed data, maximum compression, from Unix, last modified: 1970-01-01 00:00:00 (null date)
6761720 0x672CF8 device tree image (dtb)
6883304 0x6907E8 Unix path: /dev/block/platform/soc/7824900.sdhci/by-name/vendor
6899712 0x694800 gzip compressed data, maximum compression, has original file name: "rootfs.cpio", from Unix, last modified: 2019-04-06 00:42:26
9706949 0x941DC5 MySQL ISAM compressed data file Version 11
$ dd if=linux_image.img of=vmlinuz bs=1 skip=2048 count=6897653
$ file vmlinuz
vmlinuz: Linux kernel ARM boot executable zImage (little-endian)
$ dd if=vmlinuz bs=1 skip=$(LC_ALL=C grep -a -b -o $'\x1f\x8b\x08\x00\x00\x00\x00\x00' vmlinuz | head -n 1 | cut -d ':' -f 1) | zcat | grep -a 'Linux version'
Linux version 3.18.66 (build#test) (gcc version 4.9.3 (GCC) ) #1 SMP PREEMPT Fri Apr 1 13:16:33 PDT 2018
Running 'qemu-system-arm.exe -machine vexpress-a9 -cpu cortex-a7 -smp 4 -kernel vmlinuz' gives a blank screen.
If you pull a random Arm Linux kernel (including Android) from somewhere and try to run it on anything other than the hardware it is intended to boot on, the expected result is that it crashes very early in bootup without being able to output anything to the screen or serial port, i.e. you get a black screen and nothing happens. The most likely situation here is that your image is fine and not corrupt; it's just not built to run on the vexpress-a9 board you're running it on.
In the unlikely event that this really is a kernel built for the vexpress-a9, the next problem you have is that you haven't passed QEMU a device tree blob via the -dtb option. Modern Linux kernels don't hardcode all the information about the boards they can run on, but instead expect the bootloader (which is QEMU in this case) to pass them a data file which provides information about where all the devices are for the board. If you don't do that, then the result is the same as above: kernel crashes very early in bootup without being able to output any information, so black screen.
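For illustration, an invocation that does pass a matching device tree might look roughly like this (the vexpress-v2p-ca9.dtb name assumes you built the DTB for this board from the kernel's own sources; adjust the paths to your build):

$ qemu-system-arm -machine vexpress-a9 -smp 4 -m 512M \
    -kernel zImage -dtb vexpress-v2p-ca9.dtb \
    -append "console=ttyAMA0" -serial stdio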

Discrepancy in behavior of Linux loaders (ld-linux-x86-64) between Glibc 2.12 and Glibc 2.17

I'm trying to compile the same lib on two separate x86 machines.
Both use the same toolchain (exactly same set of files) but have different Glibc versions.
When I run the command LD_DEBUG=libs /lib64/ld-linux-x86-64.so.2 --list ./libl2ps.so, I notice the following discrepancy between the two Linux loaders:
Machine 1 (with Glibc 2.12):
19943: find library=libm.so.6 [0]; searching
19943: search path=/ebs/frperies/repo/gnb/uplane/build/prefix-root/asik-x86_64-ps_lfs-dynamic-linker-on/toolchain/sysroots/core2-64-pc-linux-gnu/usr/lib64:...:/ebs/frperies/repo/gnb/uplane/build_bbp/l2_ps/build/. (RPATH from file ./libl2ps.so)
19943: trying file=/ebs/frperies/repo/gnb/uplane/build/prefix-root/asik-x86_64-ps_lfs-dynamic-linker-on/toolchain/sysroots/core2-64-pc-linux-gnu/usr/lib64/libm.so.6
19943:
19943: find library=libgcc_s.so.1 [0]; searching
...
In this case the Linux loader selects lib libm.so.6 from the toolchain path based on RPATH of lib libl2ps.so.
Machine 2 (with Glibc 2.17):
10699: find library=libm.so.6 [0]; searching
10699: search path=/home/frperies/repo/gnb/uplane/build/prefix-root/asik-x86_64-ps_lfs-dynamic-linker-on/toolchain/sysroots/core2-64-pc-linux-gnu/usr/lib64:/home/frperies/repo/gnb/uplane/build/prefix-root/asik-x86_64-ps_lfs-dynamic-linker-on/toolchain/sysroots/core2-64-pc-linux-gnu/lib64:/home/frperies/repo/gnb/uplane/build/prefix-root/asik-x86_64-ps_lfs-dynamic-linker-on/toolchain/sysroots/core2-64-pc-linux-gnu/usr/lib:/home/frperies/repo/gnb/uplane/build_bbp/l2_ps/build/. (RPATH from file ./libl2ps.so)
10699: trying file=/home/frperies/repo/gnb/uplane/build/prefix-root/asik-x86_64-ps_lfs-dynamic-linker-on/toolchain/sysroots/core2-64-pc-linux-gnu/usr/lib64/libm.so.6
10699: trying file=/home/frperies/repo/gnb/uplane/build/prefix-root/asik-x86_64-ps_lfs-dynamic-linker-on/toolchain/sysroots/core2-64-pc-linux-gnu/lib64/libm.so.6
10699: trying file=/home/frperies/repo/gnb/uplane/build/prefix-root/asik-x86_64-ps_lfs-dynamic-linker-on/toolchain/sysroots/core2-64-pc-linux-gnu/usr/lib/libm.so.6
10699: trying file=/home/frperies/repo/gnb/uplane/build_bbp/l2_ps/build/./libm.so.6
10699: search cache=/etc/ld.so.cache
10699: trying file=/lib64/libm.so.6
As on Machine 1, the loader tries, based on the RPATH of libl2ps.so, to select libm.so.6 from the toolchain path, but it skips it for some reason and tries further paths. Finally it selects libm.so.6 from the system path /lib64/.
The RPATHs of the two libl2ps.so libraries are exactly the same. The two libm.so.6 files are also exactly the same on both machines (checked with md5sum).
I don't understand this difference in behavior between the two Linux loaders.
Do you see any reason that would explain this discrepancy?
Thank you very much for your answers.
Update:
Thank you yugr for your answer.
The output of readelf -h differs only in the fields "Entry point address" and "Start of section headers"; there are no other differences, so I don't think it will help.
Regarding using dlopen()/dlerror(), I've done a little executable with the following statement:
dlopen("/home/frperies/repo/gnb/uplane/build/prefix-root/asik-x86_64-ps_lfs-dynamic-linker-on/toolchain/sysroots/core2-64-pc-linux-gnu/usr/lib64/libm-2.28.so", RTLD_LAZY);
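(The test executable is essentially the following minimal sketch; the "C++ dlopen demo" banner lines in the output below come from extra printouts not shown here, and the sketch takes the library path as an argument rather than hard-coding the long path above.)

/* dltest.c - minimal sketch; build with: gcc -o dltest dltest.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <path-to-shared-object>\n", argv[0]);
        return 2;
    }
    void *handle = dlopen(argv[1], RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "Cannot open library: %s\n", dlerror());
        return 1;
    }
    dlclose(handle);
    return 0;
}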
On machine 1 it works as expected:
C++ dlopen demo
Opening libm-2.28.so...
Closing library...
On machine 2 it fails and dlerror() gives the following output:
Cannot open library: /home/frperies/repo/gnb/uplane/build/prefix-root/asik-x86_64-ps_lfs-dynamic-linker-on/toolchain/sysroots/core2-64-pc-linux-gnu/usr/lib64/libm-2.28.so: cannot open shared object file: No such file or directory
but the file libm-2.28.so really exists on my file system:
$ ls -l /home/frperies/repo/gnb/uplane/build/prefix-root/asik-x86_64-ps_lfs-dynamic-linker-on/toolchain/sysroots/core2-64-pc-linux-gnu/usr/lib64/libm-2.28.so
-rwxr-xr-x 1 frperies linseeusers_lte_espoo 1682944 Oct 5 13:50 /home/frperies/repo/gnb/uplane/build/prefix-root/asik-x86_64-ps_lfs-dynamic-linker-on/toolchain/sysroots/core2-64-pc-linux-gnu/usr/lib64/libm-2.28.so
This is very weird; what could lead to this situation?
Thanks
Update 2:
It is true that I haven't pointed out that machine 1 is a RHEL 6.8 distro while machine 2 is a RHEL 7.4 distro. I (naively?) didn't think this really mattered...
On machine 1:
$ cat /proc/sys/kernel/osrelease
4.4.115-1.NSN.el6.x86_64
$ uname -a
Linux sq24-3 4.4.115-1.NSN.el6.x86_64 #1 SMP Mon Feb 12 12:35:46 CET 2018 x86_64 x86_64 x86_64 GNU/Linux
$ readelf -n libl2ps.so
Notes at offset 0x00000270 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: b598468830fdf2f61eda25553b9a367c4d28cdc9
On machine 2:
$ cat /proc/sys/kernel/osrelease
3.10.0-693.el7.x86_64
$ uname -a
Linux localhost.localdomain 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
$ readelf -n libl2ps.so
Displaying notes found at file offset 0x00000270 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 5829181bc0502233748149369108915ea7b10e8f
Does it help?
Thanks
Update 3:
$ readelf -n libm.so.6
Notes at offset 0x00000238 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 0d84c7247dd76008c096719043e5592735a1c4bd
Notes at offset 0x0000025c with length 0x00000020:
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 4.4.0
So, how should I interpret this ABI version number set to 4.4.0?
Thanks
Thank you yugr and Employed Russian for your answers!!
I will give it a try by upgrading my Kernel version on Machine 2.
Thanks
Regards
The error message that you see is the infamously confusing ENOENT errno. I see two instances of it in dl-load.c:
checking OS compatibility
loading non-setuid to setuid process
I suspect the first one fails in your case, which would mean that the OS kernel is incompatible between the two machines. The ld.so manpage indeed says that
Each shared object can inform the dynamic linker of the
minimum kernel ABI version that it requires. (This
requirement is encoded in an ELF note section that is viewable
via readelf -n as a section labeled NT_GNU_ABI_TAG.) At run
time, the dynamic linker determines the ABI version of the
running kernel and will reject loading shared objects that
specify minimum ABI versions that exceed that ABI version.
NT_GNU_ABI_TAG is 4.4.0, which means that you are running a program expecting a minimum 4.4 kernel on a 3.10 kernel. Theoretically a newer Glibc should run on older kernels as well, but in your case Glibc was probably built with an explicit --enable-kernel flag, which prevents its use on kernels before 4.4 (see e.g. this explanation of --enable-kernel).
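(For illustration, a toolchain glibc configured along the following lines would stamp exactly that requirement into its libraries; the paths and version here are hypothetical:)

# hypothetical toolchain build of the glibc shipped in the sysroot
../glibc-2.28/configure --prefix=/usr --enable-kernel=4.4.0
make && make install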
As a workaround, you may try to fool Glibc by overriding kernel version on machine 2 via
export LD_ASSUME_KERNEL=4.4.0
but it may not work if libm makes 4.4-specific syscalls that are not really present on 3.10.

Why is Lua on the host system slower than in the Linux VM?

Comparing the execution time of this Lua script on a MacBook Air (Mac OS 10.9.4, i5-4250U (1.3 GHz), 8 GB RAM) with a VM (VirtualBox) running Arch Linux.
Compiling Lua 5.2.3 in an Arch Linux VirtualBox VM
First I compiled Lua myself using clang, to compare it with the Mac OS X clang binary.
using tcc, gcc and clang
$ tcc *[^ca].c lgc.c lfunc.c lua.c -lm -o luatcc
$ gcc -O3 *[^ca].c lgc.c lfunc.c lua.c -lm -o luagcc
/tmp/ccxAEYH8.o: In function `os_tmpname':
loslib.c:(.text+0x29c): warning: the use of `tmpnam' is dangerous, better use `mkstemp'
$ clang -O3 *[^ca].c lgc.c lfunc.c lua.c -lm -o luaclang
/tmp/loslib-bd4ef4.o:loslib.c:function os_tmpname: warning: the use of `tmpnam' is dangerous, better use `mkstemp'
clang version in VM
$ clang --version
clang version 3.4.2 (tags/RELEASE_34/dot2-final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
compare the file size
$ ls -lh |grep lua
-rwxr-xr-x 1 markus markus 210K 20. Aug 18:21 luaclang
-rwxr-xr-x 1 markus markus 251K 20. Aug 18:22 luagcc
-rwxr-xr-x 1 markus markus 287K 20. Aug 18:22 luatcc
VM benchmarking
clang binary ~3.1 sec
$ time ./luaclang sumdata.lua data.log
Original Size: 117261680 kb
Compressed Size: 96727557 kb
real 0m3.124s
user 0m3.100s
sys 0m0.020s
gcc binary ~3.09 sec
$ time ./luagcc sumdata.lua data.log
Original Size: 117261680 kb
Compressed Size: 96727557 kb
real 0m3.090s
user 0m3.080s
sys 0m0.007s
tcc binary ~7.0 sec - no surprise here :)
$ time ./luatcc sumdata.lua data.log
Original Size: 117261680 kb
Compressed Size: 96727557 kb
real 0m7.071s
user 0m7.053s
sys 0m0.010s
Compiling on Mac OS X
Now compiling Lua with the same clang command/options as in the VM.
$ clang -O3 *[^ca].c lgc.c lfunc.c lua.c -lm -o luaclangmac
loslib.c:108:3: warning: 'tmpnam' is deprecated: This function is provided for
compatibility reasons only. Due to security concerns inherent in the design of tmpnam(3),
it is highly recommended that you use mkstemp(3)
instead. [-Wdeprecated-declarations]
lua_tmpnam(buff, err);
^
loslib.c:57:33: note: expanded from macro 'lua_tmpnam'
#define lua_tmpnam(b,e) { e = (tmpnam(b) == NULL); }
^
/usr/include/stdio.h:274:7: note: 'tmpnam' declared here
char *tmpnam(char *);
^
1 warning generated.
clang version Mac OS X
I've tried two versions: 3.4.2 and the one provided by Xcode. Version 3.4.2 is a bit slower.
Markuss-MacBook-Air:bin markus$ ./clang --version
clang version 3.4.2 (tags/RELEASE_34/dot2-rc1)
Target: x86_64-apple-darwin13.3.0
Thread model: posix
Markuss-MacBook-Air:bin markus$ clang --version
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.3.0
Thread model: posix
file size
$ ls -lh|grep lua
-rwxr-xr-x 1 markus staff 194K 20 Aug 18:26 luaclangmac
HOST benchmarking
clang binary ~4.3 sec
$ time ./luaclangmac sumdata.lua data.log
Original Size: 117261680 kb
Compressed Size: 96727557 kb
real 0m4.338s
user 0m4.264s
sys 0m0.062s
Why?
I would have expected the host system to be a little faster than the virtualized one (or roughly the same speed), but not that the host system would be reproducibly slower.
So, any ideas or explanations?
Update 2014.10.30
Meanwhile I've installed Arch Linux natively on my MBA. The benchmarks are as fast as in the Arch Linux VM.
Can you try to run 'perf stat' instead of 'time'? It gives you much more detail, and the time measurement is more accurate, avoiding timing differences inside the VM.
Here is an example:
$ perf stat ls > /dev/null
Performance counter stats for 'ls':
23.348076 task-clock (msec) # 0.989 CPUs utilized
2 context-switches # 0.086 K/sec
0 cpu-migrations # 0.000 K/sec
93 page-faults # 0.004 M/sec
74,628,308 cycles # 3.196 GHz [65.75%]
740,755 stalled-cycles-frontend # 0.99% frontend cycles idle [48.66%]
29,200,738 stalled-cycles-backend # 39.13% backend cycles idle [60.02%]
80,592,001 instructions # 1.08 insns per cycle
# 0.36 stalled cycles per insn
17,746,633 branches # 760.090 M/sec [60.00%]
642,360 branch-misses # 3.62% of all branches [48.64%]
0.023609439 seconds time elapsed
My guess is that the HFS+ journaling feature is adding latency. This would be easy enough to test: if Time Machine is running on the MacBook Air, you could try disabling it, and disable journaling on the filesystem (obviously you should back up first). As root:
diskutil disableJournal YourDiskVolume
I'd see if that's the cause of the problem. Then I would immediately re-enable journaling.
diskutil enableJournal YourDiskVolume
OS X 10.9.2 had a journaling-related bug that would hang the filesystem... this page explores this bug further, and even though the bug (#15821723) hasn't been reported as fixed, journaling reportedly no longer crashes the disk controller.
To test the speed of Lua, instead of reading a file, hard-code some sample data into the test script and loop over the lines as many times as necessary. Like others mentioned, the filesystem effects are going to outweigh any compiler differences.

"Source file is more recent than executable" except it isn't

GDB is complaining that my source file is more recent than the executable, and it appears the debugging information is indeed related to an older version of the source file, because gdb is stopping on a blank line:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) up
#1 0x00007ffff7ba2d88 in CBKeyPairGenerate (keyPair=0x602010) at library/src/CBHDKeys.c:246
warning: Source file is more recent than executable.
246
(gdb) list
241 if (versionBytes == CB_HD_KEY_VERSION_TEST_PUBLIC
242 || versionBytes == CB_HD_KEY_VERSION_TEST_PRIVATE)
243 return CB_NETWORK_TEST;
244
245 return CB_NETWORK_UNKNOWN;
246
247 }
248
249 uint8_t * CBHDKeyGetPrivateKey(CBHDKey * key) {
250
But the executable is more recent than the source file, see here:
$ ls -l library/src/CBHDKeys.c
-rw-r--r-- 1 matt matt 9249 Apr 29 22:40 library/src/CBHDKeys.c
$ ls -l bin/noLowerAddressGenerator
-rwxr-xr-x 1 matt matt 17845 Apr 30 15:52 bin/noLowerAddressGenerator
I tried rebuilding after make clean and ccache -C, but the same problem occurs. When I updated the source file I only added whitespace, so the program logic remains the same. I feel that has something to do with it, but since I cleared the ccache and cleaned the build and bin directories with make clean, I'm not sure what is going on.
Versions:
GNU Make 3.81
gcc (Debian 4.8.2-16) 4.8.2
GNU gdb (GDB) 7.6.2 (Debian 7.6.2-1)
ccache version 3.1.9
SolydXK - SMP Debian 3.13.5-1 (2014-03-04)
Perhaps you're not using the most recent compiled version of the code, if it's in a shared library. You could use ldd noLowerAddressGenerator to see the library dependencies of your program; I don't know if it's possible from within GDB to locate the relevant library, but there ought to be a way (please comment or edit if you know how).
If this is indeed the case, you might want to set environment LD_LIBRARY_PATH in GDB prior to running the program, to place your newly-built library ahead of any installed ones. You could look into setting the RPATH ELF variable when linking, but that's likely to be less help.
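A hedged sketch of what that looks like in practice (the library directory below is a placeholder, not a path from your project):

$ ldd bin/noLowerAddressGenerator
    (check whether the library that provides CBKeyPairGenerate resolves to the copy you just rebuilt)
$ gdb bin/noLowerAddressGenerator
(gdb) set environment LD_LIBRARY_PATH /path/to/your/rebuilt/library
(gdb) run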
Another possibility is to run your debugger on a system where you know the library isn't installed. I've had good results using schroot to keep build/debug/install environments separated.

Getting user-space stack information from perf

I'm currently trying to track down some phantom I/O in a PostgreSQL build I'm testing. It's a multi-process server and it isn't simple to associate disk I/O back to a particular back-end and query.
I thought Linux's perf tool would be ideal for this, but I'm struggling to capture block I/O performance counter metrics and associate them with user-space activity.
It's easy to record block I/O requests and completions with, eg:
sudo perf record -g -T -u postgres -e 'block:block_rq_*'
and the user-space pid is recorded, but there's no kernel or user-space stack captured, or ability to snapshot bits of the user-space process's heap (say, query text) etc. So while you have the pid, you don't know what the process was doing at that point. Just perf script output like:
postgres 7462 [002] 301125.113632: block:block_rq_issue: 8,0 W 0 () 208078848 + 1024 [postgres]
If I add the -g flag to perf record it'll take snapshots of the kernel stack, but doesn't capture user-space state for perf events captured in the kernel. The user-space stack only goes up to the entry-point from userspace, like LWLockRelease, LWLockAcquire, memcpy (mmap'd IO), __GI___libc_write, etc.
So. Any tips? Being able to capture a snapshot of the user-space stack in response to kernel events would be ideal.
I'm on Fedora 19, 3.11.3-201.fc19.x86_64, Schrödinger’s Cat, with perf version 3.10.9-200.fc19.x86_64.
OK, looks like there are several parts to this:
I'm on x86_64, where most distros build with -fomit-frame-pointer by default, and perf can't follow the stack without frame pointers;
.... unless it's a newer version built with libunwind support, in which case it supports perf record -g dwarf.
See:
the patch adding libunwind support to Perf
Debian bug 725075.
linux perf: how to interpret and find hotspots
I'm on Fedora 18, but the same issue applies. So if you're profiling code you're working on (as is likely on Stack Overflow), rebuild with -fno-omit-frame-pointer and -ggdb.
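For a PostgreSQL build like the one in the question, that might look roughly like this (standard autoconf usage; the exact optimization flags are up to you):

$ ./configure CFLAGS="-O2 -fno-omit-frame-pointer -ggdb"
$ make && make install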
I landed up rebuilding perf because I wanted to be able to compare to the stock RPMs:
sudo yum build-dep perf
sudo yum install yum-utils rpmdevtools libunwind-devel
yumdownloader --source perf or download the appropriate kernel-.....src.rpm srpm
rpmdev-setuptree
rpm -Uvh kernel-*.src.rpm
cd $HOME/rpmbuild/SPECS
rpmbuild -bp --target=$(uname -m) kernel.spec
At this point you can just build a new perf if you want:
cd $HOME/rpmbuild/BUILD/kernel-*/linux-*/tools/perf
make
... which I did and tested that the updated perf does in fact capture a useful stack if built with libunwind available.
You can also build a new rpm:
edit kernel.spec, uncomment the line %define buildid ..., change buildid to something like .perfunwind. Note it's %define not % define.
In the same spec file, find:
%global perf_make \
make %{?_smp_mflags} -C tools/perf -s V=1 WERROR=0 NO_LIBUNWIND=1 HAVE_CPLUS_DEMANGLE=1 NO_GTK2=1 NO_LIBNUMA=1 NO_STRLCPY=1 prefix=%{_prefix}
and delete NO_LIBUNWIND=1
rpmbuild -bb --without up --without mp --without pae --without debug --without doc --without headers --without debuginfo --without bootwrapper --without with_vdso_install --with perf kernel.spec to produce new perf RPMs without building the whole kernel. Or if you want, omit the --without for the kernel flavour you want, in which case you'll also want to build headers, debuginfo, etc.
sudo rpm -Uvh $HOME/rpmbuild/RPMS/x86_64/perf-*.fc19.x86_64.rpm
See the Fedora project guide on building a custom kernel.
I've reported the issue to Fedora; they shouldn't be using NO_LIBUNWIND=1. See bug 1025603.
Once you have a rebuilt perf you can use perf record -g dwarf to get full stacks.
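For the original problem, that means something along the lines of the following sketch, reusing the event filter from the question (on newer perf the option is spelled --call-graph dwarf):

$ sudo perf record -g dwarf -T -u postgres -e 'block:block_rq_*'
$ sudo perf script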
