ARM Disassembly - confused about "LDR r7, [pc, #0x14]" - arm

I am attempting to learn ARM assembly. I decided to disassembly the "read" function and this is what I get. From the looks of it, it seems to be making a system call (svc #0) using the R7 register as the system call number.
mov ip, r7 # save R7
ldr r7, [pc, #0x14] # get system call number and put it into R7 ??
svc #0 # make system call
mov r7, ip # restore R7
cmn r0, #0x1000
bxls lr
rsb r0, r0, #0 # R0 = 0
b #2976848216
I am a bit confused though on why it is loading the system call number the way it is ("LDR r7, [PC, #0x14]"). Isn't this just doing in C code r7 = *(pc + 0x14)? I looked at other functions that might also use system calls (e.g. kill, wait, etc.) and they use a very similar convention (i.e. LDR R7, [PC, #0x14]).
This is on Android if it helps at all.
Thanks!

mov ip, r7 ## save R7
ldr r7, [pc, #0x14] ## get system call number and put it into R7 ??
svc #0 ## make system call
mov r7, ip ## restore R7
cmn r0, #0x1000 #
bxls lr #
rsb r0, r0, #0 ## R0 = 0
.word 0x1234
.word 0xABCD
you pretty much left out the most important parts so had to improvise
00000000 <.text>:
0: e1a0c007 mov ip, r7
4: e59f7014 ldr r7, [pc, #20] ; 20 <.text+0x20>
8: ef000000 svc 0x00000000
c: e1a0700c mov r7, ip
10: e3700a01 cmn r0, #4096 ; 0x1000
14: 912fff1e bxls lr
18: e2600000 rsb r0, r0, #0
1c: 00001234 andeq r1, r0, r4, lsr r2
20: 0000abcd andeq sl, r0, sp, asr #23
And yes it is doing what you say it is doing, it is loading some value in r7 before making the system call, now what value is it as to why it is using a pc relative load (likely a constant that wont fit as an immediate, and/or a link time resolved value rather than compile time) and are there different values for different system calls and is r7 a parameter or not? Well you didnt provide enough information to talk about that. Once you have/see that information then that should be pretty obvious what those answers are...if any of those are is your question.

Related

newlib init_array contains only 0xffffffff with RTEMS

I'm trying to port RTEMS on the SAME54P20A. I managed to make a basic BSP that compile and a basic application.
I compile the application using
./waf configure --rtems=$HOME/rtems/5 --rtems-bsp=arm/same54p20a
./waf
But when running the application on the target, it seems that it never gets to the application task.
I identified the issue by stepping through the assembly code. A call to no existing function is made during __libc_init_array.
Here is the assembly for the function :
00006ae4 <__libc_init_array>:
6ae4: b570 push {r4, r5, r6, lr}
6ae6: 4e0d ldr r6, [pc, #52] ; (6b1c <__libc_init_array+0x38>)
6ae8: 4d0d ldr r5, [pc, #52] ; (6b20 <__libc_init_array+0x3c>)
6aea: 1b76 subs r6, r6, r5
6aec: 10b6 asrs r6, r6, #2
6aee: d006 beq.n 6afe <__libc_init_array+0x1a>
6af0: 2400 movs r4, #0
6af2: 3401 adds r4, #1
6af4: f855 3b04 ldr.w r3, [r5], #4
6af8: 4798 blx r3
6afa: 42a6 cmp r6, r4
6afc: d1f9 bne.n 6af2 <__libc_init_array+0xe>
6afe: 4e09 ldr r6, [pc, #36] ; (6b24 <__libc_init_array+0x40>)
6b00: 4d09 ldr r5, [pc, #36] ; (6b28 <__libc_init_array+0x44>)
6b02: 1b76 subs r6, r6, r5
6b04: f000 fe46 bl 7794 <_init>
6b08: 10b6 asrs r6, r6, #2
6b0a: d006 beq.n 6b1a <__libc_init_array+0x36>
6b0c: 2400 movs r4, #0
6b0e: 3401 adds r4, #1
6b10: f855 3b04 ldr.w r3, [r5], #4
6b14: 4798 blx r3
6b16: 42a6 cmp r6, r4
6b18: d1f9 bne.n 6b0e <__libc_init_array+0x2a>
6b1a: bd70 pop {r4, r5, r6, pc}
6b1c: 00008914 andeq r8, r0, r4, lsl r9
6b20: 00008914 andeq r8, r0, r4, lsl r9
6b24: 00008918 andeq r8, r0, r8, lsl r9
6b28: 00008914 andeq r8, r0, r4, lsl r9
When reaching address 0x6b14 the register r3 contains 0xffffffff which eventually trigger a reset of the board.
Here are the registers states :
r0 0x200016c8 536876744 r1 0xa010001 167837697 │
│r2 0x0 0 r3 0xffffffff -1 │
│r4 0x1 1 r5 0x8918 35096 │
│r6 0x1 1 r7 0x0 0 │
│r8 0x0 0 r9 0x0 0 │
│r10 0x0 0 r11 0x0 0 │
│r12 0x20000400 536871936 sp 0x20005888 0x20005888 │
│lr 0x6b09 27401 pc 0x6b14 0x6b14 <__libc_init_array+48> │
│xPSR 0x1000000 16777216 fpscr 0x0 0 │
│msp 0x20004888 0x20004888 <_ISR_Stack_area_begin+4072> psp 0x20005888 0x20005888 │
│primask 0x0 0 basepri 0x0 0 │
│faultmask 0x0 0 control 0x2 2
I can't figure out the reason of this.
Here is the application code :
/*
* app.c
*/
#include <rtems.h>
#include <stdlib.h>
#include "bsp/same54p20a.h"
#include "bsp/port.h"
rtems_task Init(
rtems_task_argument ignored
)
{
/* Set pin PC21 as output to control LED1 */
PORT_REGS->GROUP[2].PORT_DIR |= PORT_PC21;
/* Turn on the led */
PORT_REGS->GROUP[2].PORT_OUTCLR |= PORT_PC21;
exit( 0 );
}
/*
* init.c
*/
#define CONFIGURE_APPLICATION_NEEDS_ZERO_DRIVER
#define CONFIGURE_APPLICATION_DOES_NOT_NEED_CLOCK_DRIVER
#define CONFIGURE_MAXIMUM_TASKS 1
#define CONFIGURE_RTEMS_INIT_TASKS_TABLE
#define CONFIGURE_INIT
#include <rtems/confdefs.h>
EDIT (After Tom V comment)
Register r3 is loaded by the line
6b10: f855 3b04 ldr.w r3, [r5], #4
So it is loaded with the litteral at address 0x8914 which is (from the dump):
00008914 <__frame_dummy_init_array_entry>:
8914: 000003e9 andeq r0, r0, r9, ror #7
If i'm not mistaken, shouldn't r3 take the value 0x3e9 ?
The value of r3 is loaded from the address in the literal at 0x6b28.
The literal is address 0x00008914. What is at address 0x00008914 in your disassembly is the word 0x000003e9. This means the value of r3 should be 0x000003e9, and it should call the function at 0x000003e8 (the difference of 1 is because this is an interworking address, the 1 in the LSB means that the target uses the thumb instruction set).
If your debbugger shows that the value of r3 that gets called is 0xffffffff, then the value in memory does not match your disassembly dump.
If this memory is flash ROM then something is wrong with your static linking process or your link script. A relocation might not have been applied to the image that you programmed, but then somehow does get applied in your dump.
If this memory is RAM, then maybe something overwrote that value. If you are using a runtime linker (unusual on a microcontroller) then maybe it never applied the relocation in the first place.
In addition to Tom V analysis :
The issue was that some unwanted sections were keep while creating the .hex file.
The solution is to add the switch --strip-unneeded to the command objcopy.

Need Help understanding ARM function

I'm still learning ARM and I couldn't understand what this function is supposed to do.
Can you guys help me out explaining how it works?
.text:0006379C EXPORT _nativeD2AB
.text:0006379C _nativeD2AB
.text:0006379C var_28 = -0x28
.text:0006379C
.text:0006379C STMFD SP!, {R4-R11,LR}
.text:000637A0 SUB SP, SP, #0x3A4
.text:000637A4 STMFA SP, {R0-R3}
.text:000637A8 LDR R0, =(_GLOBAL_OFFSET_ - 0x637B8)
.text:000637AC LDR R1, =(__stack_chk - 0x134EAC)
.text:000637B0 ADD R0, PC, R0 ; _GLOBAL_OFFSET_
.text:000637B4 LDR R0, [R1,R0] ; __stack_chk
.text:000637B8 LDR R0, [R0]
.text:000637BC STR R0, [SP,#0x3C8+var_28]
.text:000637C0 MOV R0, #1
.text:000637C4 ADR R1, sub_637D0
.text:000637C8 MUL R0, R1, R0
.text:000637CC MOV PC, R0
.text:000637CC ; End of function _nativeD2AB
.
.got:00134EAC _GLOBAL_OFFSET_TABLE_ DCD 0
.
.got:00134B0C AREA .got, DATA
.got:00134B0C __stack_chk DCD __stack_chkA
.
Found the rest of the function. If I understood some of it correctly, it seems to be scrambling the data, though that may be just a wild guess:
.text:000637D0 sub_637D0
.text:000637D0 MOV R0, #1
.text:000637D4 ADR R1, sub_637E0
.text:000637D8 MUL R0, R1, R0
.text:000637DC MOV PC, R0
.text:000637DC ; End of function sub_637D0
.text:000637E0 sub_637E0
.text:000637E0
.text:000637E0 arg_14 = 0x14
.text:000637E0
.text:000637E0 STR R2, [SP,#arg_14]
.text:000637E4 MOV R0, #1
.text:000637E8 ADR R1, loc_637F4
.text:000637EC MUL R0, R1, R0
.text:000637F0 MOV PC, R0
.text:000637F0 ; End of function sub_637E0
.text:000637F4 loc_637F4
.text:000637F4 STR R2, [SP,#0x28]
.text:000637F8 STR R0, [SP,#0x18]
.text:000637FC MOV R1, #2
.text:00063800 STR R2, [SP,#0x1C]
.text:00063804 STR R0, [SP,#0x20]
.text:00063808 STR R0, [SP,#0x24]
The function has several parts:
Store registers to the stacj and reserve space (Strangely, not restored)
Load to R0 the address of GLOBAL_OFFSET (Once added with PC), to actually access __stack_chk (When added to GLOBAL_OFFSET). This is done in a very strange way.
Load the data at __stack_chk and store it in the stack
Load to R0 the value of sub_637D0, by doing a multiplication by 1. This is the value returned by the function.
So in my opinion, this does not seem to do anything useful...

arm7tdmi assembly explanation + crash debugging

I'm currently investigating a crash that happened compiled with gcc 4.2.1 on arm7tdmi architecture (I could use 4.9.3 on demand). I'm using LPC2387 and I'm getting wdog resets. Instead of wdog resets I'm using wdog interrupts, so when it would reset otherwise, it gets into my handler, which saves state and prints a whole memory dumps (64k only). So basically I know the registers before wdog reset and have a stack showing all of the call history.
On the stack I can see loads of references to the end of the function, and I see many instructions as data in the memory region. Which I think will become the reason for the halt and then the consequent wdog interrupt. Any ideas what might be happening?
I guess reasons can be when dereferencing a function pointer, but my function seems to be quite straight forward. It is touching many hardware registers (interrupt, peripheral enable/disable).
Like this:
2015/05/27 04:45:30: addr: 4000BF2C value:7FE00390 -->this is "svcvc 0x00e00390" according to gcc 4.2.1 and ".word 0x7fe00390" according to 4.9.3.
Also at the end of the function I see this in gcc 4.9.3
191d4: e89d6ff8 ldm sp, {r3, r4, r5, r6, r7, r8, r9, sl, fp, sp, lr}
191d8: e12fff1e bx lr
191dc: 7fe00390 .word 0x7fe00390
191e0: 40000044 .word 0x40000044
191e4: 00064de5 .word 0x00064de5
191e8: 00064dfb .word 0x00064dfb
191ec: 4000107c .word 0x4000107c
191f0: e0028000 .word 0xe0028000
191f4: e01fc000 .word 0xe01fc000
191f8: 40001084 .word 0x40001084
191fc: 4000113c .word 0x4000113c
19200: 3800b010 .word 0x3800b010
19204: 40002a78 .word 0x40002a78
19208: 40002ab4 .word 0x40002ab4
1920c: 40002aa0 .word 0x40002aa0
19210: 40001080 .word 0x40001080
19214: 400001a9 .word 0x400001a9
19218: e002c000 .word 0xe002c000
1921c: 40001134 .word 0x40001134
19220: 00064e0e .word 0x00064e0e
It used to look like this on gcc 4.2.1:
1953c: 7fe00390 svcvc 0x00e00390
19540: 40000044 andmi r0, r0, r4, asr #32
19544: 0006d74c andeq sp, r6, ip, asr #14
19548: 0006d764 andeq sp, r6, r4, ror #14
1954c: 400012d0 ldrmid r1, [r0], -r0
19550: e0028000 and r8, r2, r0
19554: e01fc000 ands ip, pc, r0
19558: 40001390 mulmi r0, r0, r3
1955c: 40001394 mulmi r0, r4, r3
19560: e002c040 and ip, r2, r0, asr #32
19564: 40002e54 andmi r2, r0, r4, asr lr
19568: e002c068 and ip, r2, r8, rrx
1956c: e002c000 and ip, r2, r0
19570: 40002e90 mulmi r0, r0, lr
19574: e002c02c and ip, r2, ip, lsr #32
19578: 3fffc000 svccc 0x00ffc000
1957c: 40002e7c andmi r2, r0, ip, ror lr
19580: 3fffc0a0 svccc 0x00ffc0a0
19584: 400012d4 ldrmid r1, [r0], -r4
19588: 400001a1 andmi r0, r0, r1, lsr #3
1958c: 400012d8 ldrmid r1, [r0], -r8
19590: 0006d778 andeq sp, r6, r8, ror r7
Can someone explain me what is in the end of the function? what are the .word regions? Why would I see pointers to this area on the stack?
Thanks,
Peter
The bytes after the end of the function chunk is usually data.
e.g. if I have void *somePtr = 0xABCDEF12; then you will typically get an LDR instruction that puts the value into a register and, assuming little-endian operation, you'll see the sequence 12 EF CD AB in hex.

ARM-C Inter-working

I am trying out a simple program for ARM-C inter-working. Here is the code:
#include<stdio.h>
#include<stdlib.h>
int Double(int a);
extern int Start(void);
int main(){
int result=0;
printf("in C main\n");
result=Start();
printf("result=%d\n",result);
return 0;
}
int Double(int a)
{
printf("inside double func_argument_value=%d\n",a);
return (a*2);
}
The assembly file goes as-
.syntax unified
.cpu cortex-m3
.thumb
.align
.global Start
.global Double
.thumb_func
Start:
mov r10,lr
mov r0,#42
bl Double
mov lr,r10
mov r2,r0
mov pc,lr
During debugging on LPC1769(embedded artists board), I get an hardfault error on the instruction " result=Start(). " I am trying to do an arm-C internetworking here. the lr value during the execution of the above the statement(result=Start()) is 0x0000029F, where the faulting instruction is,and the pc value is 0x0000029E.
This is how I got the faulting instruction in r1
__asm("mrs r0,MSP\n"
"isb\n"
"ldr r1,[r0,#24]\n");
Can anybody please explain where I am going wrong? Any solution is appreciated.
Thank you in advance.
I am a beginner in cortex-m3 & am using the NXP LPCXpresso IDE powered by Code_Red.
Here is the disassembly of my code.
IntDefaultHandler:
00000269: push {r7}
0000026b: add r7, sp, #0
0000026d: b.n 0x26c <IntDefaultHandler+4>
0000026f: nop
00000271: mov r3, lr
00000273: mov.w r0, #42 ; 0x2a
00000277: bl 0x2c0 <Double>
0000027b: mov lr, r3
0000027d: mov r2, r0
0000027f: mov pc, lr
main:
00000281: push {r7, lr}
00000283: sub sp, #8
00000285: add r7, sp, #0
00000287: mov.w r3, #0
0000028b: str r3, [r7, #4]
0000028d: movw r3, #11212 ; 0x2bcc
00000291: movt r3, #0
00000295: mov r0, r3
00000297: bl 0xd64 <printf>
0000029b: bl 0x270 <Start>
0000029f: mov r3, r0
000002a1: str r3, [r7, #4]
000002a3: movw r3, #11224 ; 0x2bd8
000002a7: movt r3, #0
000002ab: mov r0, r3
000002ad: ldr r1, [r7, #4]
000002af: bl 0xd64 <printf>
000002b3: mov.w r3, #0
000002b7: mov r0, r3
000002b9: add.w r7, r7, #8
000002bd: mov sp, r7
000002bf: pop {r7, pc}
Double:
000002c0: push {r7, lr}
000002c2: sub sp, #8
000002c4: add r7, sp, #0
000002c6: str r0, [r7, #4]
000002c8: movw r3, #11236 ; 0x2be4
000002cc: movt r3, #0
000002d0: mov r0, r3
000002d2: ldr r1, [r7, #4]
000002d4: bl 0xd64 <printf>
000002d8: ldr r3, [r7, #4]
000002da: mov.w r3, r3, lsl #1
000002de: mov r0, r3
000002e0: add.w r7, r7, #8
000002e4: mov sp, r7
000002e6: pop {r7, pc}
As per your advice Dwelch, I have changed the r10 to r3.
I assume you mean interworking not internetworking? The LPC1769 is a cortex-m3 which is thumb/thumb2 only so it doesnt support arm instructions so there is no interworking available for that platform. Nevertheless, playing with the compiler to see what goes on:
Get the compiler to do it for you first, then try it yourself in asm...
start.s
.thumb
.globl _start
_start:
ldr r0,=hello
mov lr,pc
bx r0
hang : b hang
hello.c
extern unsigned int two ( unsigned int );
unsigned int hello ( unsigned int h )
{
return(two(h)+7);
}
two.c
unsigned int two ( unsigned int t )
{
return(t+5);
}
Makefile
hello.list : start.s hello.c two.c
arm-none-eabi-as -mthumb start.s -o start.o
arm-none-eabi-gcc -c -O2 hello.c -o hello.o
arm-none-eabi-gcc -c -O2 -mthumb two.c -o two.o
arm-none-eabi-ld -Ttext=0x1000 start.o hello.o two.o -o hello.elf
arm-none-eabi-objdump -D hello.elf > hello.list
clean :
rm -f *.o
rm -f *.elf
rm -f *.list
produces hello.list
Disassembly of section .text:
00001000 <_start>:
1000: 4801 ldr r0, [pc, #4] ; (1008 <hang+0x2>)
1002: 46fe mov lr, pc
1004: 4700 bx r0
00001006 <hang>:
1006: e7fe b.n 1006 <hang>
1008: 0000100c andeq r1, r0, ip
0000100c <hello>:
100c: e92d4008 push {r3, lr}
1010: eb000004 bl 1028 <__two_from_arm>
1014: e8bd4008 pop {r3, lr}
1018: e2800007 add r0, r0, #7
101c: e12fff1e bx lr
00001020 <two>:
1020: 3005 adds r0, #5
1022: 4770 bx lr
1024: 0000 movs r0, r0
...
00001028 <__two_from_arm>:
1028: e59fc000 ldr ip, [pc] ; 1030 <__two_from_arm+0x8>
102c: e12fff1c bx ip
1030: 00001021 andeq r1, r0, r1, lsr #32
1034: 00000000 andeq r0, r0, r0
hello.o disassembled by itself:
00000000 <hello>:
0: e92d4008 push {r3, lr}
4: ebfffffe bl 0 <two>
8: e8bd4008 pop {r3, lr}
c: e2800007 add r0, r0, #7
10: e12fff1e bx lr
the compiler uses bl assuming/hoping it will be calling arm from arm. but it didnt, so what they did was put a trampoline in there.
0000100c <hello>:
100c: e92d4008 push {r3, lr}
1010: eb000004 bl 1028 <__two_from_arm>
1014: e8bd4008 pop {r3, lr}
1018: e2800007 add r0, r0, #7
101c: e12fff1e bx lr
00001028 <__two_from_arm>:
1028: e59fc000 ldr ip, [pc] ; 1030 <__two_from_arm+0x8>
102c: e12fff1c bx ip
1030: 00001021 andeq r1, r0, r1, lsr #32
1034: 00000000 andeq r0, r0, r0
the bl to __two_from_arm is an arm mode to arm mode branch link. the address of the destination function (two) with the lsbit set, which tells bx to switch to thumb mode, is loaded into the disposable register ip (r12?) then the bx ip happens switching modes. the branch link had setup the return address in lr, which was an arm mode address no doubt (lsbit zero).
00001020 <two>:
1020: 3005 adds r0, #5
1022: 4770 bx lr
1024: 0000 movs r0, r0
the two() function does its thing and returns, note you have to use bx lr not mov pc,lr when interworking. Basically if you are not running an ARMv4 without the T, or an ARMv5 without the T, mov pc,lr is an okay habit. But anything ARMv4T or newer (ARMv5T or newer) use bx lr to return from a function unless you have a special reason not to. (avoid using pop {pc} as well for the same reason unless you really need to save that instruction and are not interworking). Now being on a cortex-m3 which is thumb+thumb2 only, well you cant interwork so you can use mov pc,lr and pop {pc}, but the code is not portable, and it is not a good habit as that habit will bite you when you switch back to arm programming.
So since hello was in arm mode when it used bl which is what set the link register, the bx in two_from_arm does not touch the link register, so when two() returns with a bx lr it is returning to arm mode after the bl __two_from_arm line in the hello() function.
Also note the extra 0x0000 after the thumb function, this was to align the program on a word boundary so that the following arm code was aligned...
to see how the compiler does thumb to arm change two as follows
unsigned int three ( unsigned int );
unsigned int two ( unsigned int t )
{
return(three(t)+5);
}
and put that function in hello.c
extern unsigned int two ( unsigned int );
unsigned int hello ( unsigned int h )
{
return(two(h)+7);
}
unsigned int three ( unsigned int t )
{
return(t+3);
}
and now we get another trampoline
00001028 <two>:
1028: b508 push {r3, lr}
102a: f000 f80b bl 1044 <__three_from_thumb>
102e: 3005 adds r0, #5
1030: bc08 pop {r3}
1032: bc02 pop {r1}
1034: 4708 bx r1
1036: 46c0 nop ; (mov r8, r8)
...
00001044 <__three_from_thumb>:
1044: 4778 bx pc
1046: 46c0 nop ; (mov r8, r8)
1048: eafffff4 b 1020 <three>
104c: 00000000 andeq r0, r0, r0
Now this is a very cool trampoline. the bl to three_from_thumb is in thumb mode and the link register is set to return to the two() function with the lsbit set no doubt to indicate to return to thumb mode.
The trampoline starts with a bx pc, pc is set to two instructions ahead and the pc internally always has the lsbit clear so a bx pc will always take you to arm mode if not already in arm mode, and in either mode two instructions ahead. Two instructions ahead of the bx pc is an arm instruction that branches (not branch link!) to the three function, completing the trampoline.
Notice how I wrote the call to hello() in the first place
_start:
ldr r0,=hello
mov lr,pc
bx r0
hang : b hang
that actually wont work will it? It will get you from arm to thumb but not from thumb to arm. I will leave that as an exercise for the reader.
If you change start.s to this
.thumb
.globl _start
_start:
bl hello
hang : b hang
the linker takes care of us:
00001000 <_start>:
1000: f000 f820 bl 1044 <__hello_from_thumb>
00001004 <hang>:
1004: e7fe b.n 1004 <hang>
...
00001044 <__hello_from_thumb>:
1044: 4778 bx pc
1046: 46c0 nop ; (mov r8, r8)
1048: eaffffee b 1008 <hello>
I would and do always disassemble programs like these to make sure the compiler and linker resolved these issues. Also note that for example __hello_from_thumb can be used from any thumb function, if I call hello from several places, some arm, some thumb, and hello was compiled for arm, then the arm calls would call hello directly (if they can reach it) and all the thumb calls would share the same hello_from_thumb (if they can reach it).
The compiler in these examples was assuming code that stays in the same mode (simple branch link) and the linker added the interworking code...
If you really meant inter-networking and not interworking, then please describe what that is and I will delete this answer.
EDIT:
You were using a register to preserve lr during the call to Double, that will not work, no register will work for that you need to use memory, and the easiest is the stack. See how the compiler does it:
00001008 <hello>:
1008: e92d4008 push {r3, lr}
100c: eb000009 bl 1038 <__two_from_arm>
1010: e8bd4008 pop {r3, lr}
1014: e2800007 add r0, r0, #7
1018: e12fff1e bx lr
r3 is pushed likely to align the stack on a 64 bit boundary (makes it faster). the thing to notice is the link register is preserved on the stack, but the pop does not pop to pc because this is not an ARMv4 build, so a bx is needed to return from the function. Because this is arm mode we can pop to lr and simply bx lr.
For thumb you can only push r0-r7 and lr directly and pop r0-r7 and pc directly you dont want to pop to pc because that only works if you are staying in the same mode (thumb or arm). this is fine for a cortex-m, or fine if you know what all of your callers are, but in general bad. So
00001024 <two>:
1024: b508 push {r3, lr}
1026: f000 f811 bl 104c <__three_from_thumb>
102a: 3005 adds r0, #5
102c: bc08 pop {r3}
102e: bc02 pop {r1}
1030: 4708 bx r1
same deal r3 is used as a dummy register to keep the stack aligned for performance (I used the default build for gcc 4.8.0 which is likely a platform with a 64 bit axi bus, specifying the architecture might remove that extra register). Because we cannot pop pc, I assume because r1 and r3 would be out of order and r3 was chosen (they could have chosen r2 and saved an instruction) there are two pops, one to get rid of the dummy value on the stack and the other to put the return value in a register so that they can bx to it to return.
Your Start function does not conform to the ABI and as a result when you mix it in with such large libraries as a printf call, no doubt you will crash. If you didnt it was dumb luck. Your assembly listing of main shows that neither r4 nor r10 were used and assuming main() is not called other than the bootstrap, then that is why you got away with either r4 or r10.
If this really is an LPC1769 this this whole discussion is irrelevant as it does not support ARM and does not support interworking (interworking = mixing of ARM mode code and thumb mode code). Your problem was unrelated to interworking, you are not interworking (note the pop {pc} at the end of the functions). Your problem was likely related to your assembly code.
EDIT2:
Changing the makefile to specify the cortex-m
00001008 <hello>:
1008: b508 push {r3, lr}
100a: f000 f805 bl 1018 <two>
100e: 3007 adds r0, #7
1010: bd08 pop {r3, pc}
1012: 46c0 nop ; (mov r8, r8)
00001014 <three>:
1014: 3003 adds r0, #3
1016: 4770 bx lr
00001018 <two>:
1018: b508 push {r3, lr}
101a: f7ff fffb bl 1014 <three>
101e: 3005 adds r0, #5
1020: bd08 pop {r3, pc}
1022: 46c0 nop ; (mov r8, r8)
first and foremost it is all thumb since there is no arm mode on a cortex-m, second the bx is not needed for function returns (Because there are no arm/thumb mode changes). So pop {pc} will work.
it is curious that the dummy register is still used on a push, I tried an arm7tdmi/armv4t build and it still did that, so there is some other flag to use to get rid of that behavior.
If your desire was to learn how to make an assembly function that you can call from C, you should have just done that. Make a C function that somewhat resembles the framework of the function you want to create in asm:
extern unsigned int Double ( unsigned int );
unsigned int Start ( void )
{
return(Double(42));
}
assemble then disassemble
00000000 <Start>:
0: b508 push {r3, lr}
2: 202a movs r0, #42 ; 0x2a
4: f7ff fffe bl 0 <Double>
8: bd08 pop {r3, pc}
a: 46c0 nop ; (mov r8, r8)
and start with that as you assembly function.
.globl Start
.thumb_func
Start:
push {lr}
mov r0, #42
bl Double
pop {pc}
That, or read the arm abi for gcc and understand what registers you can and cant use without saving them on the stack, what registers are used for passing and returning parameters.
_start function is the entry point of a C program which makes a call to main(). In order to debug _start function of any C program is in the assembly file. Actually the real entry point of a program on linux is not main(), but rather a function called _start(). The standard libraries normally provide a version of this that runs some initialization code, then calls main().
Try compiling this with gcc -nostdlib:

Local variable location in memory

For a homework assignment I have been given some c files, and compiled them using arm-linux-gcc (we will eventually be targeting gumstix boards, but for these exercises we have been working with qemu and ema).
One of the questions confuses me a bit-- we are told to:
Use arm-linux-objdump to find the location of variables declared in main() in the executable binary.
However, these variables are local and thus shouldn't have addresses until runtime, correct?
I'm thinking that maybe what I need to find is the offset in the stack frame, which can in fact be found using objdump (not that I know how).
Anyways, any insight into the matter would be greatly appreciated, and I would be happy to post the source code if necessary.
unsigned int one ( unsigned int, unsigned int );
unsigned int two ( unsigned int, unsigned int );
unsigned int myfun ( unsigned int x, unsigned int y, unsigned int z )
{
unsigned int a,b;
a=one(x,y);
b=two(a,z);
return(a+b);
}
compile and disassemble
arm-none-eabi-gcc -c fun.c -o fun.o
arm-none-eabi-objdump -D fun.o
code created by compiler
00000000 <myfun>:
0: e92d4800 push {fp, lr}
4: e28db004 add fp, sp, #4
8: e24dd018 sub sp, sp, #24
c: e50b0010 str r0, [fp, #-16]
10: e50b1014 str r1, [fp, #-20]
14: e50b2018 str r2, [fp, #-24]
18: e51b0010 ldr r0, [fp, #-16]
1c: e51b1014 ldr r1, [fp, #-20]
20: ebfffffe bl 0 <one>
24: e50b0008 str r0, [fp, #-8]
28: e51b0008 ldr r0, [fp, #-8]
2c: e51b1018 ldr r1, [fp, #-24]
30: ebfffffe bl 0 <two>
34: e50b000c str r0, [fp, #-12]
38: e51b2008 ldr r2, [fp, #-8]
3c: e51b300c ldr r3, [fp, #-12]
40: e0823003 add r3, r2, r3
44: e1a00003 mov r0, r3
48: e24bd004 sub sp, fp, #4
4c: e8bd4800 pop {fp, lr}
50: e12fff1e bx lr
Short answer is the memory is "allocated" both at compile time and at run time. At compile time in the sense that the compiler at compile time determines the size of the stack frame and who goes where. Run time in the sense that the memory itself is on the stack which is a dynamic thing. The stack frame is taken from stack memory at run time, almost like a malloc() and free().
It helps to know the calling convention, x enters in r0, y in r1, z in r2. then x has its home at fp-16, y at fp-20, and z at fp-24. then the call to one() needs x and y so it pulls those from the stack (x and y). the result of one() goes into a which is saved at fp-8 so that is the home for a. and so on.
the function one is not really at address 0, this is a disassembly of an object file not a linked binary. once an object is linked in with the rest of the objects and libraries, the missing parts, like where external functions are, are patched in by the linker and the calls to one() and two() will get real addresses. (and the program will likely not start at address 0).
I cheated here a little, I knew that with no optimizations enabled on the compiler and a relatively simple function like this there really is no reason for a stack frame:
compile with just a little optimization
arm-none-eabi-gcc -O1 -c fun.c -o fun.o
arm-none-eabi-objdump -D fun.o
and the stack frame is gone, the local variables remain in registers.
00000000 :
0: e92d4038 push {r3, r4, r5, lr}
4: e1a05002 mov r5, r2
8: ebfffffe bl 0
c: e1a04000 mov r4, r0
10: e1a01005 mov r1, r5
14: ebfffffe bl 0
18: e0800004 add r0, r0, r4
1c: e8bd4038 pop {r3, r4, r5, lr}
20: e12fff1e bx lr
what the compiler decided to do instead is give itself more registers to work with by saving them on the stack. Why it saved r3 is a mystery, but that is another topic...
entering the function r0 = x, r1 = y and r2 = z per the calling convention, we can leave r0 and r1 alone (try again with one(y,x) and see what happens) since they drop right into one() and are never used again. The calling convention says that r0-r3 can be destroyed by a function, so we need to preserve z for later so we save it in r5. The result of one() is r0 per the calling convention, since two() can destroy r0-r3 we need to save a for later, after the call to two() also we need r0 for the call to two anyway, so r4 now holds a. We saved z in r5 (was in r2 moved to r5) before the call to one, we need the result of one() as the first parameter to two(), and it is already there, we need z as the second so we move r5 where we had saved z to r1, then we call two(). the result of two() per the calling convention. Since b + a = a + b from basic math properties the final add before returning is r0 + r4 which is b + a, and the result goes in r0 which is the register used to return something from a function, per the convention. clean up the stack and restore the modified registers, done.
Since myfun() made calls to other functions using bl, bl modifies the link register (r14), in order to be able to return from myfun() we need the value in the link register to be preserved from the entry into the function to the final return (bx lr), so lr is pushed on the stack. The convention states that we can destroy r0-r3 in our function but not other registers so r4 and r5 are pushed on the stack because we used them. why r3 is pushed on the stack is not necessary from a calling convention perspective, I wonder if it was done in anticipation of a 64 bit memory system, making two full 64 bit writes is cheaper than one 64 bit write and one 32 bit right. but you would need to know the alignment of the stack going in so that is just a theory. There is no reason to preserve r3 in this code.
Now take this knowledge and disassemble the code assigned (arm-...-objdump -D something.something) and do the same kind of analysis. particularly with functions named main() vs functions not named main (I did not use main() on purpose) the stack frame can be a size that doesnt make sense, or less sense than other functions. In the non optimized case above we needed to store 6 things total, x,y,z,a,b and the link register 6*4 = 24 bytes which resulted in sub sp, sp, #24, I need to think about the stack pointer vs frame pointer
thing for a bit. I think there is a command line argument to tell the compiler not to use a frame pointer. -fomit-frame-pointer and it saves a couple of instructions
00000000 <myfun>:
0: e52de004 push {lr} ; (str lr, [sp, #-4]!)
4: e24dd01c sub sp, sp, #28
8: e58d000c str r0, [sp, #12]
c: e58d1008 str r1, [sp, #8]
10: e58d2004 str r2, [sp, #4]
14: e59d000c ldr r0, [sp, #12]
18: e59d1008 ldr r1, [sp, #8]
1c: ebfffffe bl 0 <one>
20: e58d0014 str r0, [sp, #20]
24: e59d0014 ldr r0, [sp, #20]
28: e59d1004 ldr r1, [sp, #4]
2c: ebfffffe bl 0 <two>
30: e58d0010 str r0, [sp, #16]
34: e59d2014 ldr r2, [sp, #20]
38: e59d3010 ldr r3, [sp, #16]
3c: e0823003 add r3, r2, r3
40: e1a00003 mov r0, r3
44: e28dd01c add sp, sp, #28
48: e49de004 pop {lr} ; (ldr lr, [sp], #4)
4c: e12fff1e bx lr
optimizing saves a whole lot more though...
It's going to depend on the program and how exactly they want the location of the variables. Does the question want what code section they're stored in? .const .bss etc? Does it want specific addresses? Either way a good start is using objdump -S flag
objdump -S myprogram > dump.txt
This is nice because it will print out an intermixing of your source code and the assembly with addresses. From here just do a search for your int main and that should get you started.

Resources