LDR - Literal pool - ARM - c

I know how to load an immediate value using the LDR instruction in ARM.
For example:
LDR R0,=0x0804c088
This instruction loads the value 0x0804c088 into register r0. When I try to examine that address in gdb with x/x $r0, I get the message: Cannot access memory at address 0x0804c088. But that is not the address; it is the value stored in the register, and the address is a PC-relative address in the literal pool.
What mistake am I making here? Did I misunderstand something?
Moreover, how should I set up the literal pool? Can you give me an example, please?
@Carl Norum: Here is the code.
__asm__("LDR R0,=0x0804c088");
__asm__("LDR R1,[PC, #34];");
Output from gdb:
(gdb) info registers
r0 0x804c088 134529160
r1 0xf2c00300 4072669952
r2 0x0 0
r3 0x1 1
r4 0x8961 35169
r5 0x0 0
r6 0x0 0
r7 0xbe8f4b74 3197062004
r8 0x0 0
r9 0xef99 61337
r10 0xf00d 61453
r11 0x0 0
r12 0x0 0
sp 0xbe8f4b74 0xbe8f4b74
lr 0x89a7 35239
pc 0x8a62 0x8a62 <test46+34>
cpsr 0x60000030 1610612784
(gdb) x/x $r0
0x804c088: Cannot access memory at address 0x804c088
(gdb) p/x$r0
$1 = 0x804c088
(gdb) p/x $r1
$2 = 0xf2c00300
(gdb) x/x $r1
0xf2c00300: Cannot access memory at address 0xf2c00300
(gdb) x/x $r15
0x8a62 <test46+34>: 0x1022f8df

The gdb x command has an inherent dereferencing operation. If you want to print the value in r0, just use p:
p/x $r0
The form of LDR you're using isn't a real instruction - it's an assembler macro-instruction that gets converted into a pc-relative ldr instruction and a literal value someplace in memory (probably close to the location you're using it). If you want to find the address of the constant in the literal pool, you need to look at the output binary. Your source assembly code doesn't contain it.
For example, let's take this simple example program:
.globl f
f:
ldr r0,=0x12345678
And then build and disassemble it:
$ arm-none-eabi-clang -c example.s
$ arm-none-eabi-objdump -d example.o
example.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <f>:
0: e51f0004 ldr r0, [pc, #-4] ; 4 <f+0x4>
4: 12345678 .word 0x12345678
You can see the literal is right there at offset 4.
You don't need to do anything to "set up the literal pool". Any necessary literals will be set up for you by the assembler.
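The disassembly above can be checked by hand. The following is a minimal sketch in portable C, assuming ARM state (not Thumb), where reading pc yields the instruction's address plus 8. The function name ldr_literal_address is made up for illustration; it only recognises the plain "ldr Rd, [pc, #imm]" form shown above.

```c
#include <stdint.h>

/* Hypothetical helper: if `insn` is an ARM-state "LDR Rd, [pc, #+/-imm12]"
 * (word load, immediate offset, no write-back), compute the address of the
 * literal it reads and return 1. Return 0 for any other encoding.
 * `insn_addr` is the address of the instruction itself. */
int ldr_literal_address(uint32_t insn, uint32_t insn_addr, uint32_t *literal_addr)
{
    /* Fixed bits: 27..26 = 01, I = 0, P = 1, B = 0, W = 0, L = 1, Rn = pc.
     * Bit 23 (U, add or subtract the offset) is left out of the mask. */
    if ((insn & 0x0F7F0000u) != 0x051F0000u)
        return 0;

    uint32_t pc = insn_addr + 8;        /* ARM state: pc reads 8 bytes ahead */
    uint32_t imm12 = insn & 0xFFFu;     /* unsigned 12-bit offset */

    *literal_addr = (insn & (1u << 23)) ? pc + imm12 : pc - imm12;
    return 1;
}
```

For the instruction e51f0004 at offset 0, this gives 0 + 8 - 4 = 4, which is where the .word 0x12345678 sits in the listing.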

If you want to know the actual address of the literal pool at runtime, try this:
adr r12, literal_pool_label
.
.
. // your code here
.
.
.
literal_pool_label:
.ltorg
Then you can read r12, which contains the address of the literal pool at runtime.
.ltorg is a directive that forces the literal pool to be placed at that point. For short code, the pool is automatically placed at the end of the code, but if the code grows beyond 4 KB, the LDR pseudo-instruction causes an assembly-time error because the pc-relative offset no longer fits in the 12-bit offset field (at most 4095 bytes).
To avoid this, you can put .ltorg in the middle of the code, at a spot where the pool cannot be executed as instructions (after an unconditional branch, for example).
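The range limit can be expressed as a small check. This is only a sketch, assuming ARM state (pc reads as the instruction address plus 8) and the 12-bit immediate offset of a word LDR; literal_reachable is a hypothetical name, not an assembler feature.

```c
#include <stdint.h>

/* Hypothetical check: can an ARM-state "ldr Rd, [pc, #offset]" located at
 * insn_addr reach a literal stored at literal_addr?
 * The offset is a 12-bit magnitude plus an add/subtract bit. */
int literal_reachable(uint32_t insn_addr, uint32_t literal_addr)
{
    int64_t pc = (int64_t)insn_addr + 8;          /* value pc reads as */
    int64_t delta = (int64_t)literal_addr - pc;   /* required offset */
    return delta >= -4095 && delta <= 4095;
}
```

A literal 4095 bytes past pc is still reachable, while one 4096 bytes past is not, which is the situation a well-placed .ltorg avoids.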

Related

Process sections: does a declaration add also something to .text? If yes, what does it add?

I have a C code like this one, that will be possibly compiled in an ELF file for ARM:
int a;
int b=1;
int foo(int x) {
int c=2;
static float d=1.5;
// .......
}
I know that all the executable code goes into the .text section, while .data , .bss and .rodata will contain the various variables/constants.
My question is: does a line like int b=1; here add also something to the .text section, or does it only tell the compiler to place a new variable initialized to 1 in .data (then probably mapped in RAM memory when deployed on the final hardware)?
Moreover, trying to decompile a similar code, I noticed that a line such as int c=2;, inside the function foo(), was adding something to the stack, but also some lines of .text where the value '2' was actually memorized there.
So, in general, does a declaration always imply also something added to .text at an assembly level? If yes, does it depends on the context (i.e. if the variable is inside a function, if it is a local global variable, ...) and what is actually added?
Thanks a lot in advance.
does a line like int b=1; here add also something to the .text section, or does it only tell the compiler to place a new variable initialized to 1 in .data (then probably mapped in RAM memory when deployed on the final hardware)?
You understand that this is likely to be implementation-specific, but most likely you will just get initialised data in the data section. Were it a constant, it might instead go into the text section.
Moreover, trying to decompile a similar code, I noticed that a line such as int c=2;, inside the function foo(), was adding something to the stack, but also some lines of .text where the value '2' was actually memorized there.
Automatic variables that are initialised, have to be initialised each time the function's scope is entered. The space for c is reserved on the stack (or in a register, depending on the ABI) but the program has to remember the constant from which it is initialised and this is best placed somewhere in the text segment, either as a constant value or as a "move immediate" instruction.
So, in general, does a declaration always imply also something added to .text at an assembly level?
No. If a static variable is initialised to zero or null or not initialised at all, it is often just enough to reserve space in bss. If a static non constant variable is initialised to a non zero value, it will just be put in the data segment.
As @goodvibration correctly stated, only global or static variables go to the segments. This is because their lifetime is the whole execution time of the program.
Local variables have a different lifetime. They exist only during the execution of the block (e.g. function) they are defined within. If a function is called, all parameters that do not fit into registers are pushed to the stack and the return address is written to the link register.* The function possibly saves the link register and other registers on the stack and reserves some space on the stack for local variables (this is the code you have observed). At the end of the function, the saved registers are popped and the stack pointer is readjusted. In this way, you get automatic cleanup of local variables.
*: Please note, that this is true for (some calling conventions of) ARM only. It's different e.g. for Intel processors.
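The lifetime difference described above can be observed directly. The sketch below is illustrative only: next_static and next_auto are made-up names, and the section comments describe the typical GNU toolchain placement, which a compiler is free to change.

```c
int a;        /* global, no initialiser: typically reserved in .bss */
int b = 1;    /* global, non-zero initialiser: typically stored in .data */

/* The static local lives for the whole program run (typically in .data),
 * so its value survives between calls. */
int next_static(void)
{
    static int n = 10;   /* initialised once, before the first call */
    return ++n;
}

/* The automatic local is re-created on every call; the constant 10 is
 * produced by code in .text each time the function is entered. */
int next_auto(void)
{
    int n = 10;          /* initialised again on each call */
    return ++n;
}
```

Calling next_static() twice returns 11 and then 12, while next_auto() returns 11 every time.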
this is one of those just try it things.
int a;
int b=1;
int foo(int x) {
int c=2;
static float d=1.5;
int e;
e=x+2;
return(e);
}
first thing without optimization.
arm-none-eabi-gcc -c so.c -o so.o
arm-none-eabi-objdump -D so.o
arm-none-eabi-ld -Ttext=0x1000 -Tdata=0x2000 so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000
arm-none-eabi-objdump -D so.elf > so.list
Don't worry about the warning; I needed to link to see that everything found a home.
Disassembly of section .text:
00001000 <foo>:
1000: e52db004 push {r11} ; (str r11, [sp, #-4]!)
1004: e28db000 add r11, sp, #0
1008: e24dd014 sub sp, sp, #20
100c: e50b0010 str r0, [r11, #-16]
1010: e3a03002 mov r3, #2
1014: e50b3008 str r3, [r11, #-8]
1018: e51b3010 ldr r3, [r11, #-16]
101c: e2833002 add r3, r3, #2
1020: e50b300c str r3, [r11, #-12]
1024: e51b300c ldr r3, [r11, #-12]
1028: e1a00003 mov r0, r3
102c: e28bd000 add sp, r11, #0
1030: e49db004 pop {r11} ; (ldr r11, [sp], #4)
1034: e12fff1e bx lr
Disassembly of section .data:
00002000 <b>:
2000: 00000001 andeq r0, r0, r1
00002004 <d.4102>:
2004: 3fc00000 svccc 0x00c00000
Disassembly of section .bss:
00002008 <a>:
2008: 00000000 andeq r0, r0, r0
as a disassembly it tries to disassemble data so ignore that (the andeq next to 0x2008 for example).
The a variable is global and uninitialized so it lands in .bss (typically...a compiler can choose to do whatever it wants so long as it implements the language correctly, doesnt have to have something called .bss for example, but gnu and many others do).
b is global and initialized so it lands in .data, had it been declared as const it might land in .rodata depending on the compiler and what it offers.
c is a local non-static variable that is initialized, because C offers recursion this needs to be on the stack (or managed with registers or other volatile resources), and initialized each run. We needed to compile without optimization to see this
1010: e3a03002 mov r3, #2
1014: e50b3008 str r3, [r11, #-8]
d is what I call a local global, it is a static local so it lives outside the function, not on the stack, alongside the globals but with local access only.
I added e to your example, this is a local not initialized, but then used. Had I not used it and not optimized there probably would have been space allocated for it but no initialization.
save x on the stack (per this calling convention x enters in r0)
100c: e50b0010 str r0, [r11, #-16]
then load x from the stack, add two, save as e on the stack. read e from
the stack and place in the return location for this calling convention which is r0.
1018: e51b3010 ldr r3, [r11, #-16]
101c: e2833002 add r3, r3, #2
1020: e50b300c str r3, [r11, #-12]
1024: e51b300c ldr r3, [r11, #-12]
1028: e1a00003 mov r0, r3
For all architectures, unoptimized this is somewhat typical, always read variables from the stack and put them back quickly. Other architectures have different calling conventions with respect to where the incoming parameters and outgoing return value live.
If I optimize (-O2 on the gcc line)
Disassembly of section .text:
00001000 <foo>:
1000: e2800002 add r0, r0, #2
1004: e12fff1e bx lr
Disassembly of section .data:
00002000 <b>:
2000: 00000001 andeq r0, r0, r1
Disassembly of section .bss:
00002004 <a>:
2004: 00000000 andeq r0, r0, r0
b is a global, so at the object level a global space has to be reserved for it, it is .data, optimization doesnt change that.
a is also global and still .bss, because at the object level it was declared such so allocated in case another object needs it. The linker doesnt remove these.
Now c and d are dead code they dont do anything they need no storage so
c is no longer allocated space on the stack nor is d allocated any .data
space.
We have plenty of registers for this architecture for this calling convention for this code, so e does not need any memory allocated on the
stack, it comes in in r0 the math can be done with r0 and then it is returned in r0.
I know I didn't tell the linker where to put .bss; given only the .data address, it placed .bss right after .data without complaint. I could have put -Tbss=0x3000 for example to give it its own space, or just used a linker script. Linker scripts can play havoc with the typical results, so beware.
Typical, but there might be a compiler with exceptions:
non-constant globals go in .data or .bss depending on whether they are initialized during the declaration or not.
If const then perhaps .rodata or .text depending (or .data or .bss would technically work)
non-static locals go in general purpose registers or on the stack as needed (if not completely optimized away).
static locals (if not optimized away) live with globals but are not globally accessible they just get allocated space in .data or .bss like the globals do.
parameters are governed completely by the calling convention used by that compiler for that target. Just because arm or mips or other may have written down a convention doesnt mean a compiler has to use it, only if they claim to support some convention or standard should they then attempt to comply. For a compiler to be useful it needs a convention and stick to it whatever it is, so that both caller and callee of a function know where to get parameters and to return a value. Architectures with enough registers will often have a convention where some few number of registers are used for the first so many parameters (not necessarily one to one) and then the stack is used for all other parameters. likewise a register may be used if possible for a return value. Some architectures due to lack of gprs or other, use the stack in both directions. or the stack in one and a register in the other. You are welcome to seek out the conventions and try to read them, but at the end of the day the compiler you are using, if not broken follows a convention and by setting up experiments like the one above you can see the convention in action.
Plus in this case optimizations.
void more_fun ( unsigned long long );
unsigned fun ( unsigned int x, unsigned long long y )
{
more_fun(y);
return(x+1);
}
If I told you that ARM conventions typically use r0-r3 for the first few parameters, you might assume that x is in r0, that r1 and r2 are used for y, and that we could have another small parameter before needing the stack. Perhaps on older ARM, but now it wants the 64-bit variable to start in an even-numbered register followed by the odd one.
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: e1a04000 mov r4, r0
8: e1a01003 mov r1, r3
c: e1a00002 mov r0, r2
10: ebfffffe bl 0 <more_fun>
14: e2840001 add r0, r4, #1
18: e8bd4010 pop {r4, lr}
1c: e12fff1e bx lr
so r0 contains x, r2/r3 contain y and r1 was passed over.
The test was crafted so that y is not dead code: by passing it to another function we can see where y was stored on the way into fun and on the way out to more_fun. It is in r2/r3 on the way in and needs to be in r0/r1 to call more_fun.
we need to preserve x for the return from fun. one might expect that x would land on the stack, which unoptimized it would, but instead save a register that the convention has stated will be preserved by functions (r4) and use r4 throughout the function or at least in this function to store x. A performance optimization, if x needed to be touched more than once memory cycles going to the stack cost more than register accesses.
then it computes the return and cleans up the stack, registers.
IMO it is important to see this: the calling convention comes into play for some variables, and others can vary based on optimization. With no optimization they are what most folks are going to state offhand: .bss, .data (.text/.rodata). With optimization it depends on whether the variable survives at all.
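The even/odd register behaviour seen in fun can be modelled in a few lines. This is a deliberately simplified sketch of the register assignment, assuming only 4-byte and 8-byte integer arguments and ignoring floating point, structures and variadic functions; assign_arg_registers is a hypothetical name, not part of any toolchain.

```c
#include <stddef.h>

/* Simplified model of core-register argument assignment on 32-bit ARM.
 * sizes[i] must be 4 or 8 (bytes). first_reg[i] receives 0..3 for the first
 * register used (r0..r3), or -1 if the argument goes on the stack. */
void assign_arg_registers(const int *sizes, size_t count, int *first_reg)
{
    int next = 0;                        /* next free core register number */

    for (size_t i = 0; i < count; i++) {
        int regs_needed = sizes[i] / 4;

        if (sizes[i] == 8 && (next & 1))
            next++;                      /* 64-bit values start at an even register */

        if (next + regs_needed <= 4) {
            first_reg[i] = next;
            next += regs_needed;
        } else {
            first_reg[i] = -1;           /* passed on the stack */
            next = 4;                    /* later arguments also use the stack */
        }
    }
}
```

For fun(unsigned int x, unsigned long long y) the model puts x in r0 and y in r2/r3, skipping r1, which matches the disassembly above.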

LDMIA instruction results in corrupt register data

I'm attempting to run a compiled program on an ARM Cortex-M3 bare metal. Before the system even reaches the application code, an odd error blows the program counter away and errors out.
Before the instruction, the registers are observed to be:
r0 0x0 0
r1 0x1 1
r2 0x0 0
r3 0x2 2
r4 0x18564 99684
r5 0x18418 99352
r6 0x0 0
r7 0x0 0
r8 0x8311 33553
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0xc84404 13124612
sp 0x7ffe0 0x7ffe0
lr 0x80df 32991
pc 0x8380 0x8380
The following instruction is executed nominally:
0x829c <__call_exitprocs+112>: ldmia.w sp!, {r4, r5, r6, r7, r8, r9, r10, r11, pc}
And the registers being read explode. It also sends the program counter way off effectively terminating the program.
...
r3 0x2 2
r4 0xffffffff 4294967295
r5 0xffffffff 4294967295
r6 0xffffffff 4294967295
r7 0xffffffff 4294967295
r8 0xffffffff 4294967295
r9 0xffffffff 4294967295
r10 0xffffffff 4294967295
r11 0x0 0
...
pc 0xfffffffe 0xfffffffe
I've read about a similar issue on Stack Overflow, but it doesn't seem to be the direct issue that I'm facing here. At a quick glance, the ATMEL documentation for this board doesn't specify a limit on the number of internal registers read at once.
Any thoughts on the problem and, if possible, a workaround in gcc to prevent it?
The instruction (and its effect) are absolutely correct. But the sp value before this instruction is absolutely wrong. Your chip has no RAM at that address. In fact, it probably has no memory at all at that address. See page 32 of the manual (with the memory map).
http://www.atmel.com/Images/Atmel-6430-32-bit-Cortex-M3-Microcontroller-SAM3U4-SAM3U2-SAM3U1_Datasheet.pdf
Your sp should be somewhere within SRAM, so above 0x20000000. The value you have, 0x7ffe0, is somewhere in the "Boot memory" region. If you want to find the problem, find out why sp has an invalid value.
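A sanity check of this kind can be written down as a small predicate. The sketch below assumes SRAM starts at 0x20000000 as stated above; the RAM size passed in is a placeholder that must be taken from the datasheet for the actual part, and pop_stays_in_ram is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical check: would popping `words` 32-bit words starting at `sp`
 * read only from the RAM region [ram_base, ram_base + ram_size)? */
int pop_stays_in_ram(uint32_t sp, uint32_t words, uint32_t ram_base, uint32_t ram_size)
{
    uint64_t start = sp;
    uint64_t end = start + (uint64_t)words * 4u;       /* one past the last byte read */
    uint64_t ram_end = (uint64_t)ram_base + ram_size;

    return start >= ram_base && end <= ram_end;
}
```

The ldmia.w in the question pops nine registers (r4 to r11 and pc), so with sp = 0x7ffe0 the check fails for any RAM region that begins at 0x20000000.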

How the dynamic linker determines which routine to call on Linux?

I have a question about dynamic linking on Linux. Consider the following disassembly of an ARM binary.
8300 <printf@plt-0x40>:
....
8320: e28fc600 add ip, pc, #0, 12
8324: e28cca08 add ip, ip, #8, 20 ; 0x8000
8328: e5bcf344 ldr pc, [ip, #836]! ; 0x344
....
83fc <main>:
...
8424: ebffffbd bl 8320 <_init+0x2c>
The main function calls printf at 8424: bl 8320. 8320 is an address in the .plt shown above. The code in .plt then calls the dynamic linker to invoke the printf routine. My question is: how can the dynamic linker tell that this is a call to printf?
TL;DR: The PLT calls the dynamic linker by passing:
the address of the GOT entry in IP (&PLTGOT[n+3]);
&PLTGOT[2] in LR.
Moreover, PLTGOT[1] identifies the shared-object/executable.
The dynamic linker uses this to find the relocation entry (plt_relocation_table[n]) and thus the symbol (printf).
Explanation of the PLT entry code
This is explained (somewhat) in section A.3 of ELF for ARM:
8320: e28fc600 add ip, pc, #0, 12
8324: e28cca08 add ip, ip, #8, 20 ; 0x8000
8328: e5bcf344 ldr pc, [ip, #836]! ; 0x344
Which are explained by:
ADD ip, pc, #-8:PC_OFFSET_27_20:__PLTGOT(X)
; R_ARM_ALU_PC_G0_NC(__PLTGOT(X))
ADD ip, ip, #-4:PC_OFFSET_19_12: __PLTGOT(X)
;R_ARM_ALU_PC_G1_NC(__PLTGOT(X))
LDR pc, [ip, #0:PC_OFFSET_11_0:__PLTGOT(X)]!
; R_ARM_LDR_PC_G2(__PLTGOT(X))
Those instructions do two things:
they compute the address of the GOT entry as an offset from PC and store it in the IP register;
they jump to this GOT entry.
The spec notes that:
The write-back on the final LDR ensures that ip contains
the address of the PLTGOT entry. This is critical to
incremental dynamic linking.
The "write-back" is the use of "!" in the last instruction: it updates the IP register with the final offset (#836). This way IP contains the address of the GOT entry at the end of the PLT entry.
The dynamic linker has the address of the GOT entry in IP:
it can find the shared-object or executable;
it can find the correct relocation entry.
This relocation entry references the symbol of target function (printf in your case):
Offset Info Type Sym. Value Sym. Name
0001066c 00000116 R_ARM_JUMP_SLOT 00000000 printf
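The three PLT instructions can be evaluated by hand to confirm that IP ends up at the relocation's offset. The following sketch assumes ARM state (pc reads as the instruction address plus 8) and models "#imm, rot" as an 8-bit value rotated right by rot bits; plt_got_slot and ror32 are illustrative names.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits. */
static uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Hypothetical model of an ARM-state PLT entry located at plt_addr:
 *   add ip, pc, #imm_a, rot_a
 *   add ip, ip, #imm_b, rot_b
 *   ldr pc, [ip, #ldr_off]!
 * Returns the value left in ip, i.e. the address of the GOT slot. */
uint32_t plt_got_slot(uint32_t plt_addr,
                      uint32_t imm_a, unsigned rot_a,
                      uint32_t imm_b, unsigned rot_b,
                      uint32_t ldr_off)
{
    uint32_t ip = plt_addr + 8;     /* pc as seen by the first add */
    ip += ror32(imm_a, rot_a);      /* first add */
    ip += ror32(imm_b, rot_b);      /* second add: #8, 20 is 0x8000 */
    ip += ldr_off;                  /* write-back of the ldr offset */
    return ip;
}
```

With the values from the listing, plt_got_slot(0x8320, 0, 12, 8, 20, 836) gives 0x8328 + 0x8000 + 0x344 = 0x1066c, the same offset that the R_ARM_JUMP_SLOT entry for printf carries.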
The Base Platform ABI for the ARM architecture notes that:
When the platform supports lazy function binding (as ARM Linux does)
this ABI requires ip to address the corresponding
PLTGOT entry at the point where the PLT calls through it.
(The PLT is required to behave as if it ended with LDR pc, [ip]).
Finding the relocation entry from the GOT
How the relocation entry is found from the GOT address is not obvious. A binary search could be used, but it would not be convenient. The GNU ld.so does it like this (glibc/sysdeps/arm/dl-trampoline.S):
dl_runtime_resolve:
cfi_adjust_cfa_offset (4)
cfi_rel_offset (lr, 0)
# we get called with
# stack[0] contains the return address from this call
# ip contains &GOT[n+3] (pointer to function)
# lr points to &GOT[2]
# Save arguments. We save r4 to realign the stack.
push {r0-r4}
cfi_adjust_cfa_offset (20)
cfi_rel_offset (r0, 0)
cfi_rel_offset (r1, 4)
cfi_rel_offset (r2, 8)
cfi_rel_offset (r3, 12)
# get pointer to linker struct
ldr r0, [lr, #-4]
# prepare to call _dl_fixup()
# change &GOT[n+3] into 8*n NOTE: reloc are 8 bytes each
sub r1, ip, lr
sub r1, r1, #4
add r1, r1, r1
[...]
The address of GOT[2] is in LR. I guess this is done by .PLT0:
00015b84 :
15b84: e52de004 push {lr} ; (str lr, [sp, #-4]!)
15b88: e59fe004 ldr lr, [pc, #4] ; 15b94
15b8c: e08fe00e add lr, pc, lr
15b90: e5bef008 ldr pc, [lr, #8]!
15b94: 0012f46c andseq pc, r2, ip, ror #8
From those two GOT addresses, the dynamic linker can find the GOT offset and the offset in the PLT relocation table.
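The arithmetic in the trampoline and in .PLT0 can be mirrored in C. This is only a sketch of the address calculations quoted above, assuming ARM state, 4-byte GOT entries and 8-byte relocation entries; both function names are made up for illustration.

```c
#include <stdint.h>

/* Mirrors "sub r1, ip, lr; sub r1, r1, #4; add r1, r1, r1":
 * ip = &GOT[n+3], lr = &GOT[2]; the result is the byte offset 8*n of the
 * matching entry in the PLT relocation table. */
uint32_t plt_reloc_offset(uint32_t ip, uint32_t lr)
{
    uint32_t r1 = ip - lr;   /* 4 * (n + 1) */
    r1 -= 4;                 /* 4 * n */
    r1 += r1;                /* 8 * n */
    return r1;
}

/* Mirrors .PLT0: "add lr, pc, lr" at add_insn_addr with the loaded literal,
 * followed by "ldr pc, [lr, #8]!", which leaves &GOT[2] in lr. */
uint32_t plt0_got2_address(uint32_t add_insn_addr, uint32_t literal)
{
    uint32_t lr = (add_insn_addr + 8) + literal;   /* GOT base */
    return lr + 8;                                 /* write-back: &GOT[2] */
}
```

For the .PLT0 listing above, the add at 15b8c with the literal 0x0012f46c gives a GOT base of 0x145000, so LR holds 0x145008. A PLT entry with index n = 5 would then arrive with IP = 0x145020 and yield a relocation offset of 40.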
From &GOT[2], the dynamic linker can find the second entry of the PLTGOT (GOT[1]), which contains the address of the linker struct (a reference used by the dynamic linker to recognise this shared-object/executable).
I don't know where this is specified: it does not seem to be part of the base ARM ABI spec.
.rela.plt (or .rel.plt) contains the relocation entry that refers to printf, which tells the dynamic linker which symbol to locate.
Check this link for details; it is easy to digest: https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html. The article also explains how variables in shared libraries are accessed first, and then functions.
The process of dynamic linking is described in great detail here.
TL;DR: at static link time, ld creates a set of tables in special sections such as .rel.dyn, .rel.plt, etc., which tell the runtime loader what to do at runtime.
You can examine these tables with nm -D, readelf -Wr, objdump -R, etc.

Debugging Hard Fault on ARM Cortex-M0+ (using CMSIS DSP library)

I'm using the CMSIS DSP library on a Cortex-M0+.
Some functions, such as sqrt and FFT, are resulting in hard faults.
The arm_sqrt_f32 function calls sqrtf:
arm_sqrt_f32(
float32_t in,
float32_t * pOut)
[...]
*pOut = sqrtf(in);
part of the generated code:
0x00003914: bl 0x49e8 <sqrtf>
0x00003918: adds r2, r0, #0
0x0000391a: ldr r3, [r7, #0]
0x0000391c: str r2, [r3, #0]
The hard fault happens on the str instruction at address 0x0000391c. When at this line, the registers are:
$r1 0x0
$r2 0x40000000
$r3 0x0
$r4 0x0
$r5 0x200017fc
$r6 0x0
$r7 0x200017e0
$r8 0xfff7ffff
$r9 0xefbffffe
$r10 0xff7fffff
$r11 0x0
$r12 0x0
the SP register is 0x200017e0, an address containing 0.
I can't figure out why I'm getting this hard fault. What should I do?
Thanks!
Let's look at exactly what your str instruction is doing by looking at this page.
Your instruction is str r2, [r3, #0], which translates to (if I'm not mistaken):
store r2 at the address in r3 offset by #0
Looking at those register values, you are trying to put 0x40000000 into location 0x0 offset by 0, so still 0x0. It is the equivalent of a segmentation fault: you are trying to access memory that is not available to you, which causes the hard fault.
Seeing how that code is generated, I'm assuming you are giving it a faulty pOut pointer.
Make sure you aren't passing an uninitialised or null pointer as pOut. Declare a variable, for example float32_t result;, and call the function as arm_sqrt_f32(foo, &result). Since pOut is a pointer argument, the function needs the address of valid, writable memory.
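The calling pattern can be illustrated without the CMSIS library. In the sketch below, sqrt_f32_checked is a hypothetical stand-in for arm_sqrt_f32, written with a few Newton iterations so that it has no library dependencies; it is not the CMSIS implementation. Incidentally, r2 = 0x40000000 is the IEEE-754 single-precision encoding of 2.0, so sqrtf itself appears to have returned a sensible result and only the destination pointer is bad.

```c
#include <stddef.h>

/* Hypothetical stand-in for arm_sqrt_f32: writes an approximation of
 * sqrt(in) to *pOut and returns 0, or returns -1 without writing anything
 * if the input is negative or the output pointer is NULL. */
int sqrt_f32_checked(float in, float *pOut)
{
    if (pOut == NULL || in < 0.0f)
        return -1;                       /* refuse to store through a bad pointer */

    float x = in > 1.0f ? in : 1.0f;     /* starting guess at or above the root */
    for (int i = 0; i < 32; i++)
        x = 0.5f * (x + in / x);         /* Newton step for x*x = in */

    *pOut = x;
    return 0;
}
```

The caller owns the storage: declare float result; and pass &result. Passing a pointer that was never pointed at valid memory leads to exactly the store to address 0 seen in the question.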
If the Cortex-M0 fault mechanism is the same as the Cortex-M3/4/7 fault mechanism, then the following page provides detailed information on how to decode the fault stack, giving you the address of the faulting instruction, as well as the register values at the time.
http://www.freertos.org/Debugging-Hard-Faults-On-Cortex-M-Microcontrollers.html

ARM - Load and Store assembly instructions

I am trying to load and store data from two different arm registers.
int testing[64*1024] __attribute__ ((aligned (8192)));
__asm__("MOV r0, %0" :: "r" (testing) : "r0");
__asm__("STR R5,[R0];");
In my initial attempt I tried to store some data pointed to by the register r0 to register r5. There are absolutely no compilation problems but the data in the register cannot be accessed.
It is the same case for Load as well.
LDR R1,[R0]
(gdb) info registers
r0 0xb6000 745472
r1 0x1 1
r2 0x0 0
r3 0xb6000 745472
r4 0x8961 35169
r5 0x0 0
r6 0x0 0
r7 0xbeba9664 3199899236
r8 0x0 0
r9 0xefb9 61369
r10 0xf02d 61485
r11 0x0 0
r12 0x0 0
sp 0xbeba9664 0xbeba9664
lr 0x89cb 35275
pc 0xeace 0xeace <test48+14>
cpsr 0x60000030 1610612784
(gdb) bt
#0 0x0000eace in test48 ()
#1 0x000089ca in main ()
(gdb) x/x $r5
0x0: Cannot access memory at address 0x0
(gdb) x/x $r0
0xb6000 <testing>: 0x00000000
Essentially I am trying to achieve some memory inline addressing using ldr and str.
I took help of this guide while I was building my example
Any idea where I am going wrong?
Your comment and your code do not match:
In my initial attempt I tried to store some data pointed to by the register r0 to register r5 [...]
__asm__("STR R5,[R0];");
The instruction you wrote stores the value of R5 into the memory location that R0 points to. The register R5 does not point to any memory location - its value is 0x00 in your example.
The __asm__ statements do not declare the R5 register used in any way, so the compiler is free to put any temporary value or variable in it. This also explains:
(gdb) x/x $r5
0x0: Cannot access memory at address 0x0
Your gdb command tries to access the memory location that R5 points to - but it does not point at any.
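One way to avoid guessing which registers the compiler is using is to let it pick them through inline-assembly operands. The sketch below assumes GCC or Clang extended asm; store_word and load_word are illustrative names. On non-ARM hosts it falls back to plain C with the same meaning, which also shows what STR and LDR do in C terms.

```c
#include <stdint.h>

/* STR value, [addr]: write the register's value to the memory addr points to. */
static inline void store_word(volatile uint32_t *addr, uint32_t value)
{
#if defined(__arm__)
    /* Operands let the compiler choose the registers; "memory" tells it
     * that memory is modified behind its back. */
    __asm__ volatile("str %1, [%0]" : : "r"(addr), "r"(value) : "memory");
#else
    *addr = value;                       /* same effect in plain C */
#endif
}

/* LDR value, [addr]: read the word that addr points to into a register. */
static inline uint32_t load_word(const volatile uint32_t *addr)
{
#if defined(__arm__)
    uint32_t value;
    __asm__ volatile("ldr %0, [%1]" : "=r"(value) : "r"(addr) : "memory");
    return value;
#else
    return *addr;                        /* same effect in plain C */
#endif
}
```

With the array from the question, store_word((volatile uint32_t *)testing, 42) followed by load_word((volatile uint32_t *)testing) stores and reads back a value without relying on whatever happens to be in r0 or r5.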

Resources