unwind_frame causes a kernel paging error

Background
I found a strange kernel Oops, Googled a lot, and found nothing.
Background:
The kernel version is 3.0.8.
There are two processes, say p1 and p2.
p2 has lots of threads (about 30).
p1 continuously calls system(pidof("name of p1")).
The kernel may Oops after running for a few days. The primary cause I found is that unwind_frame gets a strange frame->fp (0xFFFFFFFF) from get_wchan.
When executing this line
frame->fp = *(unsigned long *)(fp - 12);
The CPU then tries to access 0xFFFFFFF3 (0xFFFFFFFF - 12), which causes a paging error.
My question is:
How on earth does the fp register saved before the context switch become 0xFFFFFFFF?
Here is the CPU information:
# cat /proc/cpuinfo
Processor : ARMv7 Processor rev 0 (v7l)
processor : 0
BogoMIPS : 1849.75
processor : 1
BogoMIPS : 1856.30
Features : swp half thumb fastmult vfp edsp vfpv3 vfpv3d16
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc09
CPU revision : 0
Here is the Oops and pt registers:
[734212.113136] Unable to handle kernel paging request at virtual address fffffff3
[734212.113154] pgd = 826f0000
[734212.113175] [fffffff3] *pgd=8cdfe821, *pte=00000000, *ppte=00000000
[734212.113199] Internal error: Oops: 17 [#1] SMP
--------------cut--------------
[734212.113464] CPU: 1 Tainted: P (3.0.8 #2)
[734212.113523] PC is at unwind_frame+0x48/0x68
[734212.113538] LR is at get_wchan+0x8c/0x298
[734212.113557] pc : [<8003d120>] lr : [<8003a660>] psr: a0000013
[734212.113561] sp : 845d1cc8 ip : 00000003 fp : 845d1cd4
[734212.113583] r10: 00000001 r9 : 00000000 r8 : 80493c34
[734212.113597] r7 : 00000000 r6 : 00000000 r5 : 83354960 r4 : 845d1cd8
[734212.113613] r3 : 845d1cd8 r2 : ffffffff r1 : 80490000 r0 : 8049003f
[734212.113632] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[734212.113651] Control: 10c53c7d Table: 826f004a DAC: 00000015
Here is the callstack:
[734212.117027] Backtrace:
[734212.117052] [<8003d0d8>] (unwind_frame+0x0/0x68) from [<8003a660>] (get_wchan+0x8c/0x298)
[734212.117079] [<8003a5d4>] (get_wchan+0x0/0x298) from [<8011f700>] (do_task_stat+0x548/0x5ec)
[734212.117099] r4:00000000
[734212.117118] [<8011f1b8>] (do_task_stat+0x0/0x5ec) from [<8011f7c0>] (proc_tgid_stat+0x1c/0x24)
[734212.117158] [<8011f7a4>] (proc_tgid_stat+0x0/0x24) from [<8011b7f0>] (proc_single_show+0x54/0x98)
[734212.117196] [<8011b79c>] (proc_single_show+0x0/0x98) from [<800e9024>] (seq_read+0x1b4/0x4e4)
[734212.117215] r8:845d1f08 r7:845d1f70 r6:00000001 r5:8ca89d20 r4:866ea540
[734212.117237] r3:00000000
[734212.117264] [<800e8e70>] (seq_read+0x0/0x4e4) from [<800c8c54>] (vfs_read+0xb4/0x19c)
[734212.117289] [<800c8ba0>] (vfs_read+0x0/0x19c) from [<800c8e18>] (sys_read+0x44/0x74)
[734212.117307] r8:00000000 r7:00000003 r6:000003ff r5:7ea00818 r4:8ca89d20
[734212.117340] [<800c8dd4>] (sys_read+0x0/0x74) from [<800393c0>] (ret_fast_syscall+0x0/0x30)
[734212.117358] r9:845d0000 r8:80039568 r6:7ea00c90 r5:0000000e r4:7ea00818
[734212.117388] Code: e3c10d7f e3c0103f e151000c 9afffff6 (e512100c)

This bug was fixed by Konstantin Khlebnikov; details can be found in the git commit log.
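The arithmetic behind the faulting address can be checked in isolation. The sketch below models a 32-bit ARM unsigned long with uint32_t and compares two ways of writing an upper-bound check on fp. The helper names, both check forms, and the sample low/high values are assumptions made for illustration; they are not quoted from the kernel source or from the fix.

```c
#include <stdint.h>

/* uint32_t stands in for a 32-bit ARM unsigned long. */

/* Address read by: frame->fp = *(unsigned long *)(fp - 12); */
static uint32_t saved_fp_slot(uint32_t fp)
{
    return fp - 12u;
}

/* Hypothetical check that adds to fp first: fp + 4 can wrap around zero. */
static int fp_accepted_wrapping(uint32_t fp, uint32_t low, uint32_t high)
{
    return !(fp < low + 12u || (uint32_t)(fp + 4u) >= high);
}

/* Hypothetical check that subtracts from high instead, so a huge fp
 * cannot slip through (assumes high >= 4). */
static int fp_accepted_safe(uint32_t fp, uint32_t low, uint32_t high)
{
    return !(fp < low + 12u || fp > high - 4u);
}
```

With fp = 0xFFFFFFFF, low = 0x845d1cd8 and high = 0x845d2000 (illustrative values), saved_fp_slot gives 0xFFFFFFF3, the address in the Oops. The first check accepts the frame because fp + 4 wraps to 3, which is also the value of ip in the register dump; the second check rejects it.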

Related

Allwinner a64 - switch from aarch32 to aarch64 by warm reset

I want to deploy simple bare-metal software on the Pine64 board, which hosts an Allwinner A64 SoC. The configuration is as follows: when powered on, boot0 starts u-boot, which loads my hello.bin to RAM (0x40000000) and starts executing it. The problem is that it runs in the aarch32 execution state and I want aarch64.
I found a way to do it in this patch. There is also some background on the wiki.
I have copied the code, and objdump -d hello.o returns results identical to those in the link:
Disassembly of section .text:
00000000 <_reset>:
0: e59f0024 ldr r0, [pc, #36] ; 2c <_reset+0x2c>
4: e59f1024 ldr r1, [pc, #36] ; 30 <_reset+0x30>
8: e5801000 str r1, [r0]
c: f57ff04f dsb sy
10: f57ff06f isb sy
14: ee1c0f50 mrc 15, 0, r0, cr12, cr0, {2}
18: e3800003 orr r0, r0, #3
1c: ee0c0f50 mcr 15, 0, r0, cr12, cr0, {2}
20: f57ff06f isb sy
24: e320f003 wfi
28: eafffffe b 28 <_reset+0x28>
2c: 017000a0 .word 0x017000a0
30: 40008000 .word 0x40008000
It is supposed to perform a warm reset and start executing at 0x40008000 in the aarch64 execution state. But when I run it, I get an undefined instruction error, and the board restarts in the same state from 0x0.
## Starting application at 0x40000000 ...
undefined instruction
pc : [<40000018>] lr : [<7ff1d054>]
sp : 76eb8a90 ip : 00000030 fp : 7ff1d00c
r10: 00000002 r9 : 76ed0ea0 r8 : 7ffb5340
r7 : 77f1bd78 r6 : 40000000 r5 : 00000002 r4 : 77f1bd7c
r3 : 40000000 r2 : 77f1bd7c r1 : 40008000 r0 : 017000a0
Flags: nZCv IRQs on FIQs off Mode SVC_32
Resetting CPU ...
Why is that?
EDIT:
The first problem was noticed by @Frant below: the binary should be linked with a different .text section address, that is, starting from 0x40000000 instead of 0x0.
It also couldn't work when loaded by u-boot, which runs in EL2. Writing to RMR requires EL3. This is possible with the FEL method.
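For reference, the orr r0, r0, #3 step in the code above sets the two low bits of the RMR value before it is written back: AA64 (bit 0) and RR (bit 1). A minimal C sketch of that bit manipulation follows. The macro and function names are made up for illustration, and the register access itself (the mrc/mcr pair, executed at a sufficient exception level) is not something this snippet performs.

```c
#include <stdint.h>

#define RMR_AA64 (1u << 0)  /* bit 0: come out of the warm reset in AArch64 */
#define RMR_RR   (1u << 1)  /* bit 1: request the warm reset */

/* Returns the value to write back to RMR, given the value just read. */
static uint32_t rmr_request_aarch64_reset(uint32_t rmr)
{
    return rmr | RMR_AA64 | RMR_RR;
}
```

Starting from an RMR value of 0, the function returns 3, which matches the immediate used by the orr instruction.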
NOTE:
After facing this problem I asked around for help, and apparently I was using an old way of flashing the board. Pine64 support has since improved a lot, and it is now possible to boot it in two more convenient ways:
* mainline u-boot with atf, that will directly generate a binary one can flash to SD card, and drops you in EL2,
* using the sunxi-fel tool, as described below, which is very convenient if one does not want to re-flash the SD card all the time, and drops you in EL3 (WARNING: the sunxi wiki is a bit misleading on the sunxi-fel command arguments; the ones below worked for me).
My answer is an attempt to answer the following question: does the aarch32 state-switching code you are using work? The good news is that the code you are using works fine. The bad news is that something else may not work properly in your environment. This would not surprise me much given the terrible state of all Allwinner out-of-the-box BSPs.
Since I did not know which exact versions of boot0 and u-boot you were using, I tested your code using Andre Przywara's FEL-capable SPL binaries for A64/H5 (see the FEL Booting section of the A64 entry for more details) and sunxi-fel. This removes the boot0 and u-boot you are using as potential culprits.
The Minimal, Complete, and Verifiable example I built for testing your code requires:
* removing the SD card from the Pine64, so that it will enter FEL mode at power-up,
* a male-A to male-A USB 2.0 cable for connecting your PC to the upper USB host receptacle of the Pine64,
* a bash script, build.sh, for building sunxi-tools and retrieving the FEL-capable SPL binaries,
* rmr_switch.S, a version of rmr_switch.S minus comments, plus a symbol to be pre-processed for setting the start address without having to modify the file all the time,
* rmr_switch2.S, a version of the rmr_switch.S mentioned above, but using r0 and r1 the way they are used in the patch you were referencing,
* uart-aarch32.s, an aarch32 program displaying *** Hello from aarch32! *** on UART0,
* uart-aarch64.s, an aarch64 program displaying *** Hello from aarch64! *** on UART0.
Here is the content for each of the required files:
build.sh:
#!/bin/bash
# usage:
# CROSS_COMPILE_AARCH64=/opt/linaro/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-elf/bin/aarch64-elf- CROSS_COMPILE_AARCH32=/opt/linaro/gcc-linaro-7.2.1-2017.11-x86_64_arm-eabi/bin/arm-eabi- ./build.sh
clear
CROSS_COMPILE_AARCH64=${CROSS_COMPILE_AARCH64:-/opt/linaro/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-elf/bin/aarch64-elf-}
CROSS_COMPILE_AARCH32=${CROSS_COMPILE_AARCH32:-/opt/linaro/gcc-linaro-7.2.1-2017.11-x86_64_arm-eabi/bin/arm-eabi-}
SOC=${SOC:-a64}
#AARCH32_START_ADDRESS=0x42000000
#AARCH64_START_ADDRESS=0x42010000
AARCH32_START_ADDRESS=0x40000000
AARCH64_START_ADDRESS=0x40008000
SUNXI_FEL=sunxi-tools/sunxi-fel
install_sunxi_tools()
{
if [ ! -f ${SUNXI_FEL} ]
then
git clone --branch v1.4.2 https://github.com/linux-sunxi/sunxi-tools
pushd sunxi-tools
make
popd
fi
}
retrieve_spl_aarch32()
{
if [ ! -f sunxi-a64-spl32-ddr3.bin ]
then
wget https://github.com/apritzel/pine64/raw/master/binaries/sunxi-a64-spl32-ddr3.bin
fi
if [ ! -f sunxi-h5-spl32-ddr3.bin ]
then
wget https://github.com/apritzel/pine64/raw/master/binaries/sunxi-h5-spl32-ddr3.bin
fi
}
test_aarch32()
{
# testing aarch32 program
PROGRAM=uart-aarch32.s
BASE=${PROGRAM%%.*}
${CROSS_COMPILE_AARCH32}gcc -O0 -nostdlib -nostartfiles -e ${AARCH32_START_ADDRESS} -Wl,-Ttext=${AARCH32_START_ADDRESS} -o ${BASE}.elf ${BASE}.s
${CROSS_COMPILE_AARCH32}objcopy --remove-section .note.gnu.build-id ${BASE}.elf
${CROSS_COMPILE_AARCH32}objcopy --remove-section .ARM.attributes ${BASE}.elf
${CROSS_COMPILE_AARCH32}objdump -D ${BASE}.elf > ${BASE}.lst
${CROSS_COMPILE_AARCH32}objcopy -O binary ${BASE}.elf ${BASE}.bin
${CROSS_COMPILE_AARCH32}objcopy ${BASE}.elf -O srec ${BASE}.srec
echo "------------------ test uart-aarch32 -----------------------------"
echo sudo ${SUNXI_FEL} spl sunxi-${SOC}-spl32-ddr3.bin
echo sudo ${SUNXI_FEL} write ${AARCH32_START_ADDRESS} uart-aarch32.bin
echo sudo ${SUNXI_FEL} exe ${AARCH32_START_ADDRESS}
echo "------------------------------------------------------------------"
}
test_aarch64()
{
# testing aarch64 program
PROGRAM=uart-aarch64.s
BASE=${PROGRAM%%.*}
${CROSS_COMPILE_AARCH64}gcc -O0 -nostdlib -nostartfiles -e ${AARCH64_START_ADDRESS} -Wl,-Ttext=${AARCH64_START_ADDRESS} -o ${BASE}.elf ${BASE}.s
${CROSS_COMPILE_AARCH64}objcopy --remove-section .note.gnu.build-id ${BASE}.elf
${CROSS_COMPILE_AARCH64}objcopy --remove-section .ARM.attributes ${BASE}.elf
${CROSS_COMPILE_AARCH64}objdump -D ${BASE}.elf > ${BASE}.lst
${CROSS_COMPILE_AARCH64}objcopy -O binary ${BASE}.elf ${BASE}.bin
${CROSS_COMPILE_AARCH64}objcopy ${BASE}.elf -O srec ${BASE}.srec
echo "------------------ test uart-aarch64 -----------------------------"
echo sudo ${SUNXI_FEL} spl sunxi-${SOC}-spl32-ddr3.bin
echo sudo ${SUNXI_FEL} write ${AARCH64_START_ADDRESS} uart-aarch64.bin
echo sudo ${SUNXI_FEL} reset64 ${AARCH64_START_ADDRESS}
echo "------------------------------------------------------------------"
}
test_rmr_switch()
{
# compiling rmr_switch.s
PROGRAM=rmr_switch.s
BASE=${PROGRAM%%.*}
rm -f ${BASE}.s
${CROSS_COMPILE_AARCH64}cpp -DAARCH64_START_ADDRESS=${AARCH64_START_ADDRESS} ${BASE}.S > ${BASE}.s
${CROSS_COMPILE_AARCH32}gcc -O0 -nostdlib -nostartfiles -e ${AARCH32_START_ADDRESS} -Wl,-Ttext=${AARCH32_START_ADDRESS} -o ${BASE}.elf ${BASE}.s
${CROSS_COMPILE_AARCH32}objcopy --remove-section .note.gnu.build-id ${BASE}.elf
${CROSS_COMPILE_AARCH32}objcopy --remove-section .ARM.attributes ${BASE}.elf
${CROSS_COMPILE_AARCH32}objdump -D ${BASE}.elf > ${BASE}.lst
${CROSS_COMPILE_AARCH32}objcopy -O binary ${BASE}.elf ${BASE}.bin
${CROSS_COMPILE_AARCH32}objcopy ${BASE}.elf -O srec ${BASE}.srec
echo "------------------ test rmr_switch uart-aarch64 ------------------"
echo sudo ${SUNXI_FEL} spl sunxi-${SOC}-spl32-ddr3.bin
echo sudo ${SUNXI_FEL} write ${AARCH32_START_ADDRESS} rmr_switch.bin
echo sudo ${SUNXI_FEL} write ${AARCH64_START_ADDRESS} uart-aarch64.bin
echo sudo ${SUNXI_FEL} exe ${AARCH32_START_ADDRESS}
echo "------------------------------------------------------------------"
}
test_rmr_switch2()
{
# compiling rmr_switch2.s
PROGRAM=rmr_switch2.s
BASE=${PROGRAM%%.*}
rm -f ${BASE}.s
${CROSS_COMPILE_AARCH64}cpp -DAARCH64_START_ADDRESS=${AARCH64_START_ADDRESS} ${BASE}.S > ${BASE}.s
${CROSS_COMPILE_AARCH32}gcc -O0 -nostdlib -nostartfiles -e ${AARCH32_START_ADDRESS} -Wl,-Ttext=${AARCH32_START_ADDRESS} -o ${BASE}.elf ${BASE}.s
${CROSS_COMPILE_AARCH32}objcopy --remove-section .note.gnu.build-id ${BASE}.elf
${CROSS_COMPILE_AARCH32}objcopy --remove-section .ARM.attributes ${BASE}.elf
${CROSS_COMPILE_AARCH32}objdump -D ${BASE}.elf > ${BASE}.lst
${CROSS_COMPILE_AARCH32}objcopy -O binary ${BASE}.elf ${BASE}.bin
${CROSS_COMPILE_AARCH32}objcopy ${BASE}.elf -O srec ${BASE}.srec
echo "------------------ test rmr_switch2 uart-aarch64 -----------------"
echo sudo ${SUNXI_FEL} spl sunxi-${SOC}-spl32-ddr3.bin
echo sudo ${SUNXI_FEL} write ${AARCH32_START_ADDRESS} rmr_switch2.bin
echo sudo ${SUNXI_FEL} write ${AARCH64_START_ADDRESS} uart-aarch64.bin
echo sudo ${SUNXI_FEL} exe ${AARCH32_START_ADDRESS}
echo "------------------------------------------------------------------"
}
# prerequisites
install_sunxi_tools
retrieve_spl_aarch32
# test
test_aarch32
test_aarch64
test_rmr_switch
test_rmr_switch2
rmr_switch.S:
.text
ldr r1, =0x017000a0 # MMIO mapped RVBAR[0] register
ldr r0, =AARCH64_START_ADDRESS # start address, to be replaced
str r0, [r1]
dsb sy
isb sy
mrc 15, 0, r0, cr12, cr0, 2 # read RMR register
orr r0, r0, #3 # request reset in AArch64
mcr 15, 0, r0, cr12, cr0, 2 # write RMR register
isb sy
1: wfi
b 1b
rmr_switch2.S:
.text
ldr r0, =0x017000a0 # MMIO mapped RVBAR[0] register
ldr r1, =AARCH64_START_ADDRESS # start address, to be replaced
str r1, [r0]
dsb sy
isb sy
mrc 15, 0, r0, cr12, cr0, 2 # read RMR register
orr r0, r0, #3 # request reset in AArch64
mcr 15, 0, r0, cr12, cr0, 2 # write RMR register
isb sy
1: wfi
b 1b
uart-aarch32.s:
.code 32
.text
ldr r1,=0x01C28000
ldr r2,=message
loop: ldrb r0, [r2]
add r2, r2, #1
cmp r0, #0
beq completed
strb r0, [r1]
b loop
completed: b .
.data
message:
.asciz "*** Hello from aarch32! ***"
.end
uart-aarch64.s:
.text
ldr x1,=0x01C28000
ldr x2,=message
loop: ldrb w0, [x2]
add x2, x2, #1
cmp w0, #0
beq completed
strb w0, [x1]
b loop
completed: b .
.data
message:
.asciz "*** Hello from aarch64! ***"
.end
Once all the files are in the same directory, the test procedure would be:
Execute build.sh. You can specify the SoC you are using, A64 (default) or H5, and the aarch32/aarch64 toolchains on the command line:
CROSS_COMPILE_AARCH64=/opt/linaro/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-elf/bin/aarch64-elf- CROSS_COMPILE_AARCH32=/opt/linaro/gcc-linaro-7.2.1-2017.11-x86_64_arm-eabi/bin/arm-eabi- ./build.sh
The output should look like this (I removed harmless warnings):
------------------ test uart-aarch32 -----------------------------
sudo sunxi-tools/sunxi-fel spl sunxi-a64-spl32-ddr3.bin
sudo sunxi-tools/sunxi-fel write 0x40000000 uart-aarch32.bin
sudo sunxi-tools/sunxi-fel exe 0x40000000
------------------ test uart-aarch64 -----------------------------
sudo sunxi-tools/sunxi-fel spl sunxi-a64-spl32-ddr3.bin
sudo sunxi-tools/sunxi-fel write 0x40008000 uart-aarch64.bin
sudo sunxi-tools/sunxi-fel reset64 0x40008000
------------------ test rmr_switch uart-aarch64 ------------------
sudo sunxi-tools/sunxi-fel spl sunxi-a64-spl32-ddr3.bin
sudo sunxi-tools/sunxi-fel write 0x40000000 rmr_switch.bin
sudo sunxi-tools/sunxi-fel write 0x40008000 uart-aarch64.bin
sudo sunxi-tools/sunxi-fel exe 0x40000000
------------------ test rmr_switch2 uart-aarch64 -----------------
sudo sunxi-tools/sunxi-fel spl sunxi-a64-spl32-ddr3.bin
sudo sunxi-tools/sunxi-fel write 0x40000000 rmr_switch2.bin
sudo sunxi-tools/sunxi-fel write 0x40008000 uart-aarch64.bin
sudo sunxi-tools/sunxi-fel exe 0x40000000
------------------------------------------------------------------
Now, before entering the sunxi-fel commands required for each of the four tests, you need to unplug the Pine64 from its power source and from any USB host receptacle it may be plugged into (USB TTL uart, male-A to male-A USB cable). Reconnect the Pine64 to its power source, then re-plug USB cables.
lsusb should now display:
Bus 001 Device 016: ID 1f3a:efe8 Onda (unverified) V972 tablet in flashing mode
Output on the serial console for the four tests should be:
test uart-aarch32 (verifying an aarch32 program runs from 0x40000000):
U-Boot SPL 2018.01-00007-gdb0ecc9b42 (Feb 23 2018 - 00:50:52)
DRAM: 512 MiB
Trying to boot from FEL
*** Hello from aarch32! ***
test uart-aarch64 (verifying an aarch64 program runs from 0x40008000):
U-Boot SPL 2018.01-00007-gdb0ecc9b42 (Feb 23 2018 - 00:50:52)
DRAM: 512 MiB
Trying to boot from FEL
*** Hello from aarch64! ***
test rmr_switch uart-aarch64 (running rmr_switch from 0x40000000, which will switch into aarch64 state and execute uart-aarch64 from 0x40008000):
U-Boot SPL 2018.01-00007-gdb0ecc9b42 (Feb 23 2018 - 00:50:52)
DRAM: 512 MiB
Trying to boot from FEL
*** Hello from aarch64! ***
test rmr_switch2 uart-aarch64 (running rmr_switch2 from 0x40000000, which will switch into aarch64 state and execute uart-aarch64 from 0x40008000):
U-Boot SPL 2018.01-00007-gdb0ecc9b42 (Feb 23 2018 - 00:50:52)
DRAM: 512 MiB
Trying to boot from FEL
*** Hello from aarch64! ***
It is worth mentioning that those tests can be performed on Windows using Linaro mingw32 toolchains, a Windows version of sunxi-fel, and Zadig.
Bottom line: the code you were using seems to work well, and the rmr_switch2.s code I assembled is (I guess) the same as the one you are using:
rmr_switch2.elf: file format elf32-littlearm
Disassembly of section .text:
40000000 <.text>:
40000000: e59f0024 ldr r0, [pc, #36] ; 4000002c <.text+0x2c>
40000004: e59f1024 ldr r1, [pc, #36] ; 40000030 <.text+0x30>
40000008: e5801000 str r1, [r0]
4000000c: f57ff04f dsb sy
40000010: f57ff06f isb sy
40000014: ee1c0f50 mrc 15, 0, r0, cr12, cr0, {2}
40000018: e3800003 orr r0, r0, #3
4000001c: ee0c0f50 mcr 15, 0, r0, cr12, cr0, {2}
40000020: f57ff06f isb sy
40000024: e320f003 wfi
40000028: eafffffd b 40000024 <.text+0x24>
4000002c: 017000a0 cmneq r0, r0, lsr #1
40000030: 40008000 andmi r8, r0, r0
The examples were also successfully tested on an H5-based OrangePI PC2. The command line for running build.sh should be:
SOC=h5 CROSS_COMPILE_AARCH64=/opt/linaro/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-elf/bin/aarch64-elf- CROSS_COMPILE_AARCH32=/opt/linaro/gcc-linaro-7.2.1-2017.11-x86_64_arm-eabi/bin/arm-eabi- ./build.sh
Output for build.sh, and therefore sunxi-fel commands to be executed, will be slightly different, since a different, H5-specific, FEL-capable SPL will have to be used.
I noticed there is a small difference between the code you are using and rmr_switch2 code, but since it comes after the state switch/after wfi, it should not matter I guess - I am assuming the code you assembled was slightly different itself:
Yours (.o):
28: eafffffe b 28 <_reset+0x28>
Mine (.elf):
40000028: eafffffd b 40000024 <.text+0x24>
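The two encodings can be decoded by hand to confirm where each branch goes. The sketch below assumes the standard A32 encoding of B (a signed 24-bit word offset in the low bits, relative to the instruction address plus 8); the function name is hypothetical.

```c
#include <stdint.h>

/* Compute the target of an A32 "B" instruction located at addr. */
static uint32_t arm_branch_target(uint32_t addr, uint32_t insn)
{
    int32_t imm24 = (int32_t)(insn & 0x00FFFFFFu);

    if (imm24 & 0x00800000)          /* sign-extend the 24-bit field */
        imm24 -= 0x01000000;

    /* The offset is in words and is relative to addr + 8. */
    return addr + 8u + (uint32_t)(imm24 * 4);
}
```

For eafffffe at 0x28 this gives 0x28 (a branch to itself), and for eafffffd at 0x40000028 it gives 0x40000024 (a branch back to the wfi), which agrees with the two disassembly lines.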
I hope this helps.

Information about Section Index field (st_shndx) in Section SHT_DYNSYM & SHT_SYMTAB

Toolchain:
Product: ARM Compiler 5.04
Component: ARM Compiler 5.04 update 1 (build 49)
From ELF Portable Formats Specification, Version 1.1,
Section Index field (st_shndx) contains:
Every symbol table entry is ‘‘defined’’ in relation to some section; this member holds the relevant section header table index.
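As a reading aid, here is a minimal C sketch of how st_shndx sits in a 32-bit symbol entry and how its reserved values are usually told apart from ordinary section indices. The type, enum, and constant names are local to this sketch (they mirror Elf32_Sym, SHN_UNDEF, SHN_ABS and SHN_COMMON from the specification), and the assumption that "Abs" in the fromelf output corresponds to SHN_ABS is mine.

```c
#include <stdint.h>

/* Mirrors the layout of Elf32_Sym from the ELF specification. */
struct sym32 {
    uint32_t      st_name;   /* offset into the string table */
    uint32_t      st_value;
    uint32_t      st_size;
    unsigned char st_info;   /* binding (high nibble) and type (low nibble) */
    unsigned char st_other;
    uint16_t      st_shndx;  /* section header table index */
};

/* Reserved section index values (same numbers as SHN_*). */
#define SKETCH_SHN_UNDEF  0x0000u
#define SKETCH_SHN_ABS    0xfff1u
#define SKETCH_SHN_COMMON 0xfff2u

enum shndx_kind { SHNDX_UNDEFINED, SHNDX_ABSOLUTE, SHNDX_COMMON, SHNDX_SECTION };

/* Classify what the st_shndx member of a symbol refers to. */
static enum shndx_kind classify_shndx(const struct sym32 *sym)
{
    switch (sym->st_shndx) {
    case SKETCH_SHN_UNDEF:  return SHNDX_UNDEFINED;
    case SKETCH_SHN_ABS:    return SHNDX_ABSOLUTE;
    case SKETCH_SHN_COMMON: return SHNDX_COMMON;
    default:                return SHNDX_SECTION;  /* a real section index */
    }
}
```

A symbol with st_shndx equal to 1 or 2 is classified as SHNDX_SECTION, meaning the value is to be looked up in the section header table of the same file.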
I am compiling a simple object (exports one global data & routine) to understand various fields of ELF
Source code (test.c):
__declspec(dllexport) int x21 = 0x100;
__declspec(dllexport) void bar21(void)
{
x21++;
}
Build script used (build.bat)
armcc -c test.c
armlink --bpabi --dll -o test.dll test.o
fromelf -cdrsy -o test.txt test.dll
My query is about usage of st_shndx field of DYNSYM section & SYMTAB section.
Output file
(Removed few sections to keep it short)
** Section #1 'ER_RO' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 24 bytes (alignment 4)
Address: 0x00008000
$a
.text
bar21
0x00008000: e59f000c .... LDR r0,[pc,#12] ; [0x8014] = 0
0x00008004: e5901000 .... LDR r1,[r0,#0]
0x00008008: e2811001 .... ADD r1,r1,#1
0x0000800c: e5801000 .... STR r1,[r0,#0]
0x00008010: e12fff1e ../. BX lr
$d
0x00008014: 00000000 .... DCD 0
** Section #2 'ER_RW' (SHT_PROGBITS) [SHF_ALLOC + SHF_WRITE]
Size : 4 bytes (alignment 4)
Address: 0x00000000
0x000000: 00 01 00 00 ....
** Section #3 '.dynstr' (SHT_STRTAB)
Size : 32 bytes
** Section #4 '.dynsym' (SHT_DYNSYM)
Size : 80 bytes (alignment 4)
String table #3 '.dynstr'
Last local symbol no. 1
Symbol table .dynsym (4 symbols, 1 local)
# Symbol Name Value Bind Sec Type Vis Size
========================================================================
1 .data 0x00000000 Lc 1 Sect De 0x4
2 shared_2.dll 0x00000000 Gb Abs Data De
3 x21 0x00000000 Gb 1 Data Pr 0x4
4 bar21 0x00008000 Gb 2 Code Pr 0x14
** Section #5 '.hash' (SHT_HASH)
Size : 40 bytes (alignment 4)
Symbol table #4 '.dynsym'
<Section Truncated>
** Section #7 '.version' (SHT_GNU_versym)
Size : 10 bytes (alignment 4)
Symbol table #4 '.dynsym'
<Section Truncated>
** Section #8 '.version_d' (SHT_GNU_verdef)
Size : 56 bytes (alignment 4)
String table #3 '.dynstr'
<Section Truncated>
** Section #9 '.dynamic' (SHT_DYNAMIC)
Size : 120 bytes (alignment 4)
String table #3 '.dynstr'
<Section Truncated>
** Section #10 '.debug_frame' (SHT_PROGBITS)
Size : 68 bytes
** Section #11 '.symtab' (SHT_SYMTAB)
Size : 176 bytes (alignment 4)
String table #12 '.strtab'
Last local symbol no. 6
Symbol table .symtab (10 symbols, 6 local)
# Symbol Name Value Bind Sec Type Vis Size
========================================================================
1 $a 0x00008000 Lc 1 -- De
2 $d 0x00008014 Lc 1 -- De
3 $d.realdata 0x00000000 Lc 2 -- De
4 shared_2.c 0x00000000 Lc Abs File De
5 .text 0x00008000 Lc 1 Sect De
6 .data 0x00000000 Lc 2 Sect De 0x4
7 BuildAttributes$$ARM_ISAv4$S$PE$A:L22$X:L11$S22$IEEE1$~IW$USESV6$~STKCKD$USESV7$~SHL$OSPACE$EBA8$STANDARDLIB$REQ8$PRES8$EABIv2
0x00000000 Gb Abs -- Hi
8 shared_2.dll 0x00000000 Gb Abs Data De
9 x21 0x00000000 Gb 2 Data Pr 0x4
10 bar21 0x00008000 Gb 1 Code Pr 0x14
Section 1 is Code area (bar21 routine is present here)
Section 2 is RW area (x21 variable is present here)
Now, the st_shndx field (section index, shown as "Sec" in the output above) differs between the DYNSYM section and the SYMTAB section for these two symbols.
For example:
x21 in DYNSYM points to section 1 (code area), while in SYMTAB it points to section 2 (RW area).
Can someone help me understand why? Or point me to a resource where I can get more information on this?
Regards,
Raju Udava

LLVM (arm-none-eabi target) is producing an ARM.exidx section for C based code(?)

Compiling a simple HelloWorld.c using Clang/LLVM (arm-none-eabi target) produces a relocation section '.rel.ARM.exidx', but using arm-gcc does not. These LLVM-produced unwind table entries are correctly tagged as cantunwind. But why are they produced at all? They are not needed and just cause bloat, since you get an entry for every C function in your AXF.
readelf exidx output from HelloWorld.o:
Relocation section '.rel.ARM.exidx' at offset 0x580 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
00000000 00000b2a R_ARM_PREL31 00000000 .text
00000008 00000b2a R_ARM_PREL31 00000000 .text
Unwind table index '.ARM.exidx' at offset 0xcc contains 2 entries:
0x0 <print_uart0>: 0x1 [cantunwind]
0x54 <c_entry>: 0x1 [cantunwind]
In testing Clang defaults: If I pass "-funwind-tables" to Clang to force unwinding for even C functions, I get what I would expect had I been writing .cpp functions and "-fno-unwind-tables" results in the same as above.
Relocation section '.rel.ARM.exidx' at offset 0x5a4 contains 4 entries:
Offset Info Type Sym.Value Sym. Name
00000000 00000b2a R_ARM_PREL31 00000000 .text
00000000 00001600 R_ARM_NONE 00000000 __aeabi_unwind_cpp_pr0
00000008 00000b2a R_ARM_PREL31 00000000 .text
00000008 00001600 R_ARM_NONE 00000000 __aeabi_unwind_cpp_pr0
Unwind table index '.ARM.exidx' at offset 0xcc contains 2 entries:
0x0 <print_uart0>: 0x8001b0b0
Compact model index: 0
0x01 vsp = vsp + 8
0xb0 finish
0xb0 finish
0x54 <c_entry>: 0x809b8480
Compact model index: 0
0x9b vsp = r11
0x84 0x80 pop {r11, r14}
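The second word of each index entry, as printed by readelf above, can be classified with a few bit tests. The sketch below follows my reading of the ARM exception handling ABI: the value 1 means cantunwind, a word with bit 31 set carries an inline compact-model entry (personality index in bits 24 to 27, followed by three unwind opcode bytes), and anything else is an offset to a separate table entry. All names are hypothetical.

```c
#include <stdint.h>

enum exidx_kind { EXIDX_KIND_CANTUNWIND, EXIDX_KIND_INLINE, EXIDX_KIND_TABLE_OFFSET };

/* Classify the second word of an .ARM.exidx entry. */
static enum exidx_kind exidx_classify(uint32_t word)
{
    if (word == 0x1u)
        return EXIDX_KIND_CANTUNWIND;
    if (word & 0x80000000u)
        return EXIDX_KIND_INLINE;
    return EXIDX_KIND_TABLE_OFFSET;
}

/* For an inline entry, copy the three opcode bytes into ops and
 * return the compact model (personality routine) index. */
static unsigned exidx_inline_unpack(uint32_t word, uint8_t ops[3])
{
    ops[0] = (uint8_t)(word >> 16);
    ops[1] = (uint8_t)(word >> 8);
    ops[2] = (uint8_t)word;
    return (word >> 24) & 0x0fu;
}
```

Applied to the values above, 0x1 is classified as cantunwind, while 0x8001b0b0 and 0x809b8480 are inline entries with index 0 and opcode bytes 01 b0 b0 and 9b 84 80, matching the readelf decoding.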
1) Is there any way to turn off the .ARM.exidx section when only using C functions, as they will always be flagged as "cantunwind"?
2) Is there any way to strip this section during linking? (gc-sections will not work, of course, since these table entries reference in-use functions.)
3) Why does arm-gcc not create this section? (Well, it does if you are using newlib, nano, etc., but I use and link no standard libraries.)
I'll answer (2), since that's what I did. Add to your linker script:
/DISCARD/ :
{
*(.ARM.exidx)
}

linux kernel module dies after 100000 interrupts

I'm working on a kernel module for the 2.6.39 kernel. (I know this is out of date, but it's what came with my evaluation board and I wanted to get this working before moving to the 3.x series.)
My module is very simple at the moment. It listens for a 200us pulse on a GPIO pin, then increments a counter which resets every 25089 iterations. (25089 is the size of a buffer to be used later.) Strangely enough, my module dies after exactly 100000 interrupts every time I use it, and I'm really at a loss. I looked at changing the kernel jiffy frequency, but that seems unrelated. I also tried using a tickless kernel, and that doesn't seem to have an effect either. I can't find much of anything on Google about this problem. Has anyone else seen this issue?
I'm building for an Atmel AT91 processor if that is important. I'll list my crash message below.
root@at91:~# irq 56: nobody cared (try booting with the "irqpoll" option)
[<c0036804>] (unwind_backtrace+0x0/0xec) from [<c006eca4>] (__report_bad_irq+0x34/0xa0)
[<c006eca4>] (__report_bad_irq+0x34/0xa0) from [<c006eed0>] (note_interrupt+0x1c0/0x22c)
[<c006eed0>] (note_interrupt+0x1c0/0x22c) from [<c006d904>] (handle_irq_event_percpu+0x168/0x19c)
[<c006d904>] (handle_irq_event_percpu+0x168/0x19c) from [<c006d960>] (handle_irq_event+0x28/0x38)
[<c006d960>] (handle_irq_event+0x28/0x38) from [<c003ace0>] (gpio_irq_handler+0x74/0x98)
[<c003ace0>] (gpio_irq_handler+0x74/0x98) from [<c002b078>] (asm_do_IRQ+0x78/0xac)
[<c002b078>] (asm_do_IRQ+0x78/0xac) from [<c00313d4>] (__irq_svc+0x34/0x60)
Exception stack(0xc04b1f70 to 0xc04b1fb8)
1f60: 00000000 0005317f 0005217f 60000013
1f80: c04b0000 c04b61cc c04b5ffc c04e1224 20000000 41069265 20025cbc 00000000
1fa0: 600000d3 c04b1fb8 c0032cc8 c0032cd4 60000013 ffffffff
[<c00313d4>] (__irq_svc+0x34/0x60) from [<c0032cd4>] (default_idle+0x38/0x40)
[<c0032cd4>] (default_idle+0x38/0x40) from [<c0032af8>] (cpu_idle+0x70/0xc8)
[<c0032af8>] (cpu_idle+0x70/0xc8) from [<c00089c0>] (start_kernel+0x284/0x2e4)
[<c00089c0>] (start_kernel+0x284/0x2e4) from [<20008038>] (0x20008038)
handlers:
[<c01eb660>] (grab_spi_data+0x0/0x6c)
Disabling IRQ #56
root@at91:~# cat /proc/interrupts
CPU0
1: 1387 AIC at91_tick, at91_rtc, ttyS0
12: 39 AIC atmel_mci.0
13: 0 AIC atmel_spi.0
14: 0 AIC atmel_spi.1
17: 1804 AIC tc_clkevt
20: 7520 AIC at_hdmac
21: 0 AIC at_hdmac
22: 1 AIC ehci_hcd:usb1, ohci_hcd:usb2
23: 0 AIC atmel_usba_udc
24: 136 AIC eth0
26: 0 AIC atmel_mci.1
56: 100000 GPIO quicklogic_ready
80: 0 GPIO atmel_usba_udc
142: 0 GPIO mmc-detect
143: 1 GPIO mmc-detect
Err: 0
My interrupt handler is called grab_spi_data, which you can see at the bottom of the backtrace, and I'm watching IRQ 56. I am really stumped.
Looks like you're not handling the IRQs.
Linux gets upset after 100,000 times - See the comment above __report_bad_irq and find this magic number.
Your interrupt handler probably never returns IRQ_HANDLED, which is what it should do after handling the interrupt.
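To see why the count stops at exactly 100000, here is a simplified user-space model of the bookkeeping described above. It is not kernel code: the types and names are stand-ins, and the thresholds (a check every 100000 interrupts, disabling the line when more than 99900 of them went unhandled) are my recollection of the kernel's spurious-interrupt logic, so treat them as assumptions.

```c
/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED return values. */
enum irq_result { IRQ_RESULT_NONE = 0, IRQ_RESULT_HANDLED = 1 };

struct irq_model {
    unsigned irq_count;       /* interrupts seen in the current window */
    unsigned irqs_unhandled;  /* of those, how many no handler claimed */
    int      disabled;        /* set once the line is given up on */
};

/* Account for one interrupt and the value its handler returned. */
static void model_note_interrupt(struct irq_model *d, enum irq_result r)
{
    if (d->disabled)
        return;
    if (r == IRQ_RESULT_NONE)
        d->irqs_unhandled++;
    d->irq_count++;
    if (d->irq_count < 100000)
        return;
    /* End of a window: decide whether the line looks stuck. */
    d->irq_count = 0;
    if (d->irqs_unhandled > 99900)
        d->disabled = 1;      /* "nobody cared", "Disabling IRQ" */
    d->irqs_unhandled = 0;
}

/* Feed n interrupts that all return r; report whether the line got disabled. */
static int model_run(enum irq_result r, unsigned n)
{
    struct irq_model d = {0, 0, 0};
    unsigned i;

    for (i = 0; i < n; i++)
        model_note_interrupt(&d, r);
    return d.disabled;
}
```

In this model, model_run(IRQ_RESULT_NONE, 100000) reports the line as disabled, while a handler that always returns the handled value never trips the check, however many interrupts arrive.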

multiplication using SSE (x*x*x)

I'm trying to optimize a cube function using SSE
long cube(long n)
{
return n*n*n;
}
I have tried this :
return (long) _mm_mul_su32(_mm_mul_su32((__m64)n,(__m64)n),(__m64)n);
And the performance was even worse (and yes I have never done anything with sse).
Is there a SSE function which could increase the performance?
Or something else?
output from cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 3070 @ 2.66GHz
stepping : 6
cpu MHz : 2660.074
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips : 5320.14
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 3070 @ 2.66GHz
stepping : 6
cpu MHz : 2660.074
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips : 5320.35
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
I think you have misunderstood when it is useful to use SSE. But I have only used SSE with floating-point types so my experience may not be applicable to this case. I hope you can still learn some bits from what I have written.
SSE provides SIMD, Single Instruction Multiple Data. It is useful when you have many values on which you want to perform the same calculation. It is a kind of small scale parallelization. So instead of doing one multiplication, you can do four at the same time. But it is only useful if you have all dependencies available.
So in your case, there is no room for parallelization. You could write a function that calculated the cube of four floats that would be faster than calling a function that calculated the cube of one number four times.
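A minimal sketch of that idea, cubing floats four at a time with SSE and falling back to scalar code for the leftover elements (or when SSE is not available at compile time). The function name is hypothetical, and unaligned loads and stores are used so the caller does not need to align the arrays.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Write in[i]^3 to out[i] for i in [0, n). */
static void cube_batch(const float *in, float *out, size_t n)
{
    size_t i = 0;

#if defined(__SSE__)
    /* Four cubes per iteration: two packed multiplies. */
    for (; i + 4 <= n; i += 4) {
        __m128 x  = _mm_loadu_ps(in + i);
        __m128 x3 = _mm_mul_ps(_mm_mul_ps(x, x), x);
        _mm_storeu_ps(out + i, x3);
    }
#endif
    /* Scalar tail (and the whole array when SSE is unavailable). */
    for (; i < n; ++i)
        out[i] = in[i] * in[i] * in[i];
}
```

Whether this beats a plain loop depends on the compiler and flags, since optimizing compilers can often vectorize the scalar version on their own; measuring both is the only way to know.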
Your code compiles to:
cube:
movl 4(%esp), %edx
movl %edx, %eax
imull %edx, %eax
imull %edx, %eax
ret
If inlined, the ret and moves will get optimized out, so you have two imul instructions. I doubt MMX or SSE could make this any faster (transferring the data into the MMX/SSE registers alone would probably be slower than the two imuls).
You have to align your variables on 16 bytes, for one. Also, in my own experience tinkering with SSE, you will get significant gains if you compute your function on a whole batch of values, say:
void cube(long* inArray, long* outArray, size_t size) {
...
}

Resources