I have a question about dynamic linking on Linux. Consider the following disassembly of an ARM binary.
8300 <printf#plt-0x40>:
....
8320: e28fc600 add ip, pc, #0, 12
8324: e28cca08 add ip, ip, #8, 20 ; 0x8000
8328: e5bcf344 ldr pc, [ip, #836]! ; 0x344
....
83fc <main>:
...
8424:ebffffbd bl 8320 <_init+0x2c>
Main function calls printf at 8424: bl 8320. 8320 is an address in the .plt shown above. Now the code in .plt makes call to dynamic linker to invoke printf routine. My question is how the dynamic linker will be able to say that it is a call to printf?
TLDR; The PLT calls the dynamic linker by passing:
the address of the GOT entry in IP (&PLTGOT[n+3]);
&PLTGOT[2] is in LR;
Moreover PLTGOT[1] identifies the shared-object/executable.
The dynamic linker use this to find the relocation entry (plt_relocation_table[n]) and thus the symbol (printf).
Explanation of the PLT entry code
This is explained (somehow) in section A.3 of ELF for ARM:
8320: e28fc600 add ip, pc, #0, 12
8324: e28cca08 add ip, ip, #8, 20 ; 0x8000
8328: e5bcf344 ldr pc, [ip, #836]! ; 0x344
Which are explained by:
ADD ip, pc, #-8:PC_OFFSET_27_20:__PLTGOT(X)
; R_ARM_ALU_PC_G0_NC(__PLTGOT(X))
ADD ip, ip, #-4:PC_OFFSET_19_12: __PLTGOT(X)
;R_ARM_ALU_PC_G1_NC(__PLTGOT(X))
LDR pc, [ip, #0:PC_OFFSET_11_0:__PLTGOT(X)]!
; R_ARM_LDR_PC_G2(__PLTGOT(X))
Those instructions do two things:
they compute the address of the GOT entry as an offset from PC and store it in the IP register;
they jump to this GOT entry.
The spec notes that:
The write-back on the final LDR ensures that ip contains
the address of the PLTGOT entry. This is critical to
incremental dynamic linking.
The "write-back" is the use of "!" in the last instruction: this is used to update IP register with the final offset (#836). This way IP contains the addess of the GOT entry at the end of the PLT entry.
The dynamic linker has the address of the GOT entry in IP:
it can find the shared-object or executable;
it can find the correct relocation entry.
This relocation entry references the symbol of target function (printf in your case):
Offset Info Type Sym. Value Sym. Name
0001066c 00000116 R_ARM_JUMP_SLOT 00000000 printf
The Base Platform ABI for the ARM architecture notes that:
When the platform supports lazy function binding (as ARM Linux does)
this ABI requires ip to address the corresponding
PLTGOT entry at the point where the PLT calls through it.
(The PLT is requir ed to behave as if it ended with LDR pc, [ip]).
Finding the relocation entry from the GOT
Now the way the relocation entry is found from the GOT address is not clear. Binary search could be used but is would not be convenient. The GNU ld.so does it like this (glibc/sysdeps/arm/dl-trampoline.S):
dl_runtime_resolve:
cfi_adjust_cfa_offset (4)
cfi_rel_offset (lr, 0)
# we get called with
# stack[0] contains the return address from this call
# ip contains &GOT[n+3] (pointer to function)
# lr points to &GOT[2]
# Save arguments. We save r4 to realign the stack.
push {r0-r4}
cfi_adjust_cfa_offset (20)
cfi_rel_offset (r0, 0)
cfi_rel_offset (r1, 4)
cfi_rel_offset (r2, 8)
cfi_rel_offset (r3, 12)
# get pointer to linker struct
ldr r0, [lr, #-4]
# prepare to call _dl_fixup()
# change &GOT[n+3] into 8*n NOTE: reloc are 8 bytes each
sub r1, ip, lr
sub r1, r1, #4
add r1, r1, r1
[...]
The address of the second GOT entry is in LR. I guess this is donebyt .PLT0:
00015b84 :
15b84: e52de004 push {lr} ; (str lr, [sp, #-4]!)
15b88: e59fe004 ldr lr, [pc, #4] ; 15b94
15b8c: e08fe00e add lr, pc, lr
15b90: e5bef008 ldr pc, [lr, #8]!
15b94: 0012f46c andseq pc, r2, ip, ror #8
From those two GOT addresses, the dynamic linker can find the GOT offset and the offset in the PLT relocation table.
From &GOT[2], the dynamic linker can find the second entry of the PLTGOT (GOT[1]) which contains the address of the linker struct (a reference used by the dynamic linker to recosgnise this shared-object/executable).
I don't where this is specified: it does not seem to be part of the base ARM ABI spec.
.rela.plt contains the address of printf to inform the dynamic linker from where to locate the printf
check this link for details very soft to digest https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html. This article also clarify about process of variables to be accessed through Shared libraries first and then functions.
The process of dynamic linking is described in great detail here.
TL;DR: at static link time, ld creates a set of tables in special sections such as .rel.dyn, .rel.plt, etc., which tell the runtime loader what to do at runtime.
You can examine these tables with nm -D, readelf -Wr, objdump -R, etc.
Related
I have a C code like this one, that will be possibly compiled in an ELF file for ARM:
int a;
int b=1;
int foo(int x) {
int c=2;
static float d=1.5;
// .......
}
I know that all the executable code goes into the .text section, while .data , .bss and .rodata will contain the various variables/constants.
My question is: does a line like int b=1; here add also something to the .text section, or does it only tell the compiler to place a new variable initialized to 1 in .data (then probably mapped in RAM memory when deployed on the final hardware)?
Moreover, trying to decompile a similar code, I noticed that a line such as int c=2;, inside the function foo(), was adding something to the stack, but also some lines of .text where the value '2' was actually memorized there.
So, in general, does a declaration always imply also something added to .text at an assembly level? If yes, does it depends on the context (i.e. if the variable is inside a function, if it is a local global variable, ...) and what is actually added?
Thanks a lot in advance.
does a line like int b=1; here add also something to the .text section, or does it only tell the compiler to place a new variable initialized to 1 in .data (then probably mapped in RAM memory when deployed on the final hardware)?
You understand that this is likely to be implementation specific, but the likelihood is that that you will just get initialised data in the data section. Were it a constant, it might, instead go into the text section.
Moreover, trying to decompile a similar code, I noticed that a line such as int c=2;, inside the function foo(), was adding something to the stack, but also some lines of .text where the value '2' was actually memorized there.
Automatic variables that are initialised, have to be initialised each time the function's scope is entered. The space for c is reserved on the stack (or in a register, depending on the ABI) but the program has to remember the constant from which it is initialised and this is best placed somewhere in the text segment, either as a constant value or as a "move immediate" instruction.
So, in general, does a declaration always imply also something added to .text at an assembly level?
No. If a static variable is initialised to zero or null or not initialised at all, it is often just enough to reserve space in bss. If a static non constant variable is initialised to a non zero value, it will just be put in the data segment.
As #goodvibration correctly stated, only global or static variables go to the segments. This is because their lifetime is the whole execution time of the program.
Local variables have a different lifetime. They exist only during the execution of the block (e.g. function) they are defined within. If a function is called, all parameters that does not fit into registers a pushed to the stack and the return address is written to the link register.* The function saves possibly the link register and other registers at the stack and adds some space at the stack for local variables (this is the code you have observed). At the end of the function, the saved registers are poped and the the stackpointer is readjusted. In this way, you get an automatic garbage collection for local variables.
*: Please note, that this is true for (some calling conventions of) ARM only. It's different e.g. for Intel processors.
this is one of those just try it things.
int a;
int b=1;
int foo(int x) {
int c=2;
static float d=1.5;
int e;
e=x+2;
return(e);
}
first thing without optimization.
arm-none-eabi-gcc -c so.c -o so.o
arm-none-eabi-objdump -D so.o
arm-none-eabi-ld -Ttext=0x1000 -Tdata=0x2000 so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000
arm-none-eabi-objdump -D so.elf > so.list
do worry about the warning, needed to link to see that everything found a home
Disassembly of section .text:
00001000 <foo>:
1000: e52db004 push {r11} ; (str r11, [sp, #-4]!)
1004: e28db000 add r11, sp, #0
1008: e24dd014 sub sp, sp, #20
100c: e50b0010 str r0, [r11, #-16]
1010: e3a03002 mov r3, #2
1014: e50b3008 str r3, [r11, #-8]
1018: e51b3010 ldr r3, [r11, #-16]
101c: e2833002 add r3, r3, #2
1020: e50b300c str r3, [r11, #-12]
1024: e51b300c ldr r3, [r11, #-12]
1028: e1a00003 mov r0, r3
102c: e28bd000 add sp, r11, #0
1030: e49db004 pop {r11} ; (ldr r11, [sp], #4)
1034: e12fff1e bx lr
Disassembly of section .data:
00002000 <b>:
2000: 00000001 andeq r0, r0, r1
00002004 <d.4102>:
2004: 3fc00000 svccc 0x00c00000
Disassembly of section .bss:
00002008 <a>:
2008: 00000000 andeq r0, r0, r0
as a disassembly it tries to disassemble data so ignore that (the andeq next to 0x2008 for example).
The a variable is global and uninitialized so it lands in .bss (typically...a compiler can choose to do whatever it wants so long as it implements the language correctly, doesnt have to have something called .bss for example, but gnu and many others do).
b is global and initialized so it lands in .data, had it been declared as const it might land in .rodata depending on the compiler and what it offers.
c is a local non-static variable that is initialized, because C offers recursion this needs to be on the stack (or managed with registers or other volatile resources), and initialized each run. We needed to compile without optimization to see this
1010: e3a03002 mov r3, #2
1014: e50b3008 str r3, [r11, #-8]
d is what I call a local global, it is a static local so it lives outside the function, not on the stack, alongside the globals but with local access only.
I added e to your example, this is a local not initialized, but then used. Had I not used it and not optimized there probably would have been space allocated for it but no initialization.
save x on the stack (per this calling convention x enters in r0)
100c: e50b0010 str r0, [r11, #-16]
then load x from the stack, add two, save as e on the stack. read e from
the stack and place in the return location for this calling convention which is r0.
1018: e51b3010 ldr r3, [r11, #-16]
101c: e2833002 add r3, r3, #2
1020: e50b300c str r3, [r11, #-12]
1024: e51b300c ldr r3, [r11, #-12]
1028: e1a00003 mov r0, r3
For all architectures, unoptimized this is somewhat typical, always read variables from the stack and put them back quickly. Other architectures have different calling conventions with respect to where the incoming parameters and outgoing return value live.
If I optmize (-O2 on the gcc line)
Disassembly of section .text:
00001000 <foo>:
1000: e2800002 add r0, r0, #2
1004: e12fff1e bx lr
Disassembly of section .data:
00002000 <b>:
2000: 00000001 andeq r0, r0, r1
Disassembly of section .bss:
00002004 <a>:
2004: 00000000 andeq r0, r0, r0
b is a global, so at the object level a global space has to be reserved for it, it is .data, optimization doesnt change that.
a is also global and still .bss, because at the object level it was declared such so allocated in case another object needs it. The linker doesnt remove these.
Now c and d are dead code they dont do anything they need no storage so
c is no longer allocated space on the stack nor is d allocated any .data
space.
We have plenty of registers for this architecture for this calling convention for this code, so e does not need any memory allocated on the
stack, it comes in in r0 the math can be done with r0 and then it is returned in r0.
I know I didnt tell the linker where to put .bss by telling it .data it put .bss in the same space without complaint. I could have put -Tbss=0x3000 for example to give it its own space or just done a linker script. Linker scripts can play havoc with the typical results, so beware.
Typical, but there might be a compiler with exceptions:
non-constant globals go in .data or .bss depending on whether they are initialized during the declaration or not.
If const then perhaps .rodata or .text depending (or .data or .bss would technically work)
non-static locals go in general purpose registers or on the stack as needed (if not completely optimized away).
static locals (if not optimized away) live with globals but are not globally accessible they just get allocated space in .data or .bss like the globals do.
parameters are governed completely by the calling convention used by that compiler for that target. Just because arm or mips or other may have written down a convention doesnt mean a compiler has to use it, only if they claim to support some convention or standard should they then attempt to comply. For a compiler to be useful it needs a convention and stick to it whatever it is, so that both caller and callee of a function know where to get parameters and to return a value. Architectures with enough registers will often have a convention where some few number of registers are used for the first so many parameters (not necessarily one to one) and then the stack is used for all other parameters. likewise a register may be used if possible for a return value. Some architectures due to lack of gprs or other, use the stack in both directions. or the stack in one and a register in the other. You are welcome to seek out the conventions and try to read them, but at the end of the day the compiler you are using, if not broken follows a convention and by setting up experiments like the one above you can see the convention in action.
Plus in this case optimizations.
void more_fun ( unsigned long long );
unsigned fun ( unsigned int x, unsigned long long y )
{
more_fun(y);
return(x+1);
}
If I told you that arm conventions typically use r0-r3 for the first few parameters you might assume that x is in r0 and r1 and r2 are used for y and we could have another small parameter before needing the stack, well
perhaps older arm, but now it wants the 64 bit variable to use an even then an odd.
00000000 <fun>:
0: e92d4010 push {r4, lr}
4: e1a04000 mov r4, r0
8: e1a01003 mov r1, r3
c: e1a00002 mov r0, r2
10: ebfffffe bl 0 <more_fun>
14: e2840001 add r0, r4, #1
18: e8bd4010 pop {r4, lr}
1c: e12fff1e bx lr
so r0 contains x, r2/r3 contain y and r1 was passed over.
the test was crafted to not have y as dead code and to pass it to another function we can see where y was stored on the way into fun and way out to more_fun. r2/r3 on the way in, needs to be in r0/r1 to call more fun.
we need to preserve x for the return from fun. one might expect that x would land on the stack, which unoptimized it would, but instead save a register that the convention has stated will be preserved by functions (r4) and use r4 throughout the function or at least in this function to store x. A performance optimization, if x needed to be touched more than once memory cycles going to the stack cost more than register accesses.
then it computes the return and cleans up the stack, registers.
IMO it is important to see this, the calling convention comes into play for some variables and others can vary based on optimization, no optimization they are what most folks are going to state off hand, .bss, .data (.text/.rodata), with optimization then it depends if if the variable survives at all.
I am currently trying to develop my own bootloader for an Atmel SAM R21.
My idea is to run the bootloader firstly, so it will decide if an update is needed to be performed or just jumping to the application. The main problem is that the Interrupt Vector Table is located at the 0x0000_0000 address, so it needs to be relocated just before the application code, so if the bootloader has a 8KB space set in the linker file and using the BOOTPROT fuse in that way (setting this fuse it is supposed that there will be some protection to the amount of memory selected through the fuse), the vector table should start at the 0x0000_2000 address.
In order to relocate the vector table I pretend to use the VTOR register, which is an offset applied to the original table address (0x0000_0000).
The assembly code is the following:
asm(" LDR R0,=0xE000ED08 "); //VTOR ADDRESS
asm("LDR R1,=0x00002000"); //OFFSET
asm(" STR R1, [R0]");
asm(" LDR R0,[R1] ");
asm(" MOV SP, R0");
asm(" LDR R0,[R1, #4]");
asm(" BX R0");
LDR instruction gives me the following error:
Error[Og006]: Syntax error in inline assembly: "Error[401]: Operand syntax error"
What am I doing wrong? Maybe I am trying to use ARM instruction instead of a Thumb one?
I will very appreciate any advise.
I am also doubting if once I get the Interrup Vector Table relocated, should I count with the Initial MSP value also? I want to mean, if the Interrupt Vector table starts at address 0x0000_2000 after being relocated, I should count 4(bytes) * Interrupt in order to know which should be the initial application address, shouldn't I? If someone knows something about this it would be nice. I know I am close (or I think so), but I need to clarify those points.
Edited 27/06/16 at 13:04.
This instruction works LDR R0,[R1]
So I guess it is something related to receive the 32 bits address into the register, but I don't understand why it is complaining about this.
SOLUTION:
As an answer to my question, someone posted that not all assembly directives can be used inlined, so I needed to create an assembler file, my_file.s
In this file should be created a function to be called from outside, something like this:
#define _PORT_ASM_ARM_SRC
#define __ASSEMBLY__
;/****************************************************************************
;** **
;** ASSEMBLY FUNCTIONS **
;** **
;****************************************************************************/
NAME start_app
RSEG CODE:CODE(2)
THUMB
PUBLIC jump_to_app
;/***************************************************************************/
;/***************************************************************************/
;/* jump_to_app()
; * Jump to application function.
; */
jump_to_app:
LDR R0,=0xE000ED08 ; Set R0 to VTOR address
LDR R1,=0x00010000 ; User’s flash memory based address
STR R1, [R0] ; Define beginning of user’s flash memory as vector table
LDR R0,[R1] ; Load initial MSP value
MOV SP, R0 ; Set SP value (assume MSP is selected)
LDR R0,[R1, #4] ; Load reset vector
BX R0 ; Branch to reset handler in user’s flash
END
After doing this, the function prototipe should be included into a .h file of your project as a normal function, using something like this:
void jump_to_app(void);
Best regards,
Iván.
There is nothing syntactically wrong with the assembly code above. If you put the assembly code into an asm file and assemble it, it will build (as to whether it does what you intended I have not checked).
For some reason the inline assembler does not like LDR Rd, =expr.
See the following quote from the IAR Embedded Workbench Help:
The pseudo-instruction LDR Rd, =expr is not available from inline assembler
Also from ARM:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472j/chr1359124248868.html
SOLUTION:
As an answer to my question, someone posted that not all assembly directives can be used inlined, so I needed to create an assembler file, my_file.s In this file should be created a function to be called from outside, something like this:
#define _PORT_ASM_ARM_SRC
#define __ASSEMBLY__
;/****************************************************************************
;** **
;** ASSEMBLY FUNCTIONS **
;** **
;****************************************************************************/
NAME start_app
RSEG CODE:CODE(2)
THUMB
PUBLIC jump_to_app
;/***************************************************************************/
;/***************************************************************************/
;/* jump_to_app()
; * Jump to application function.
; */
jump_to_app:
LDR R0,=0xE000ED08 ; Set R0 to VTOR address
LDR R1,=0x00010000 ; User’s flash memory based address
STR R1, [R0] ; Define beginning of user’s flash memory as vector table
LDR R0,[R1] ; Load initial MSP value
MOV SP, R0 ; Set SP value (assume MSP is selected)
LDR R0,[R1, #4] ; Load reset vector
BX R0 ; Branch to reset handler in user’s flash
END
After doing this, the function prototipe should be included into a .h file of your project as a normal function, using something like this:
void jump_to_app(void);
Best regards,
Iván.
I am a little bit confused with ENTRY and STARTUP commands in GCC linker script.
As stated here: http://wiki.osdev.org/Linker_Scripts
ENTRY() makes any symbol to be linked as first item in .text section.
STARTUP() on the other hand makes whole compiled file to be placed as first item in .text section.
In my project however it behaves strange.
I am using gnu-arm-none-eabi toolchain and in my linker script command ENTRY(asm_start) makes no effect. Linker script:
ENTRY(asm_start)
MEMORY
{
RAM : ORIGIN = 0x10000, LENGTH = 0x1000000
PIC_BUFF : ORIGIN = 0x10000 + LENGTH(RAM), LENGTH = 200M
}
SECTIONS
{
.text : {*(.text)} > RAM
.data : {*(.data)} > PIC_BUFF
// etc.
assembly function:
.text
.global asm_start
.global exc_stack
.global supervisor_sp
asm_start:
# initialize Stack pointer for exception modes
mrs r4, cpsr
bic r4, r4, #0x1f
#FIQ Mode
orr r3, r4, #0x11
msr cpsr_c, r3
ldr sp, =exc_stack
#IRQ Mode
orr r3, r4, #0x12
msr cpsr_c, r3
// etc.
and asm_start finishes in some random place in memory.
On the other hand STARTUP() function works fine and desired file ends in proper place in .text section.
Could some please explain what exactly is happening in this case?
ENTRY() sets the PC starting point on the execution, and STARTUP() the first linked object in the .text
BTW, for baremental cortex ARM gnu-arm-none-eabi, ENTRY() is usually set to the Reset_Handler vector and .text must start with the interrupt vector table.
I've always seen it manually set. but it might be possible to use STARTUP() for that.
I am trying to call printf in ARM M4 assembly and meet some problems. The purpose is to dump content in R1. The code is like the following
.data
.balign 4
output_string:
dcb "content in R1 is 0x%x\n", 0
....
.text
....
push {r0, r1}
mov r1, r0
ldr r0, =output_string
bl printf
pop {r0, r1}
The problem I meet is that, when put "output_string" address into R0, the value is added with a extra 1. For example, if the symbol "output_string" have a value of 0x2000, R0 will get the value 0x2001.
I feel this has something to do with THUMB/ARM mode. But I have declare "output_string" in data section, why the assembler still translate it as an instruction address?
Or is there some more formal way to do such in-assembly function calling?
I think you should use:
ldr r0, =output_string
The = prefix is an assembler shorthand to make it load an arbitrary 32-bit constant. See this ARM Information Center page.
I read a lot of topics on this forum and found a lot of answers on this subject. I achieved to pass 5 arguments to a C function from my assembly code. For doing this, i used the instructions below :
mov r0, #0
mov r1, #1
mov r2, #2
mov r3, #3
mov r4, #4
STR r4, [sp, #-4]!
BL displayRegistersValue
But today i'm trying to pass the whole registers to a C function to save them in a C structure. I tried with this instruction :
STMDB sp!, {registers that i want to save}
My C function :
displayRegistersValue(int registers[number_of_registers])
char printable = registers[0] + (int)'0'; // Convert in a printable character
print_uart0(&printable);
But my display is not good. So, how I can access to the registers in C code?
Pretty sure the ARM standard only allows R0-R3 to be passed by value so 4 max. If you need more values, then push them onto the stack and access them that way - like the compiler does. Or make a struct and pass its address.
Ok, doubled cheked and I was right here is a link to the ARM calling conventions - down the page a bit.
To do what you want, pass the address of some memory location (an array) into your assembly routine. Once you have that address, probably within r0, you can stmdb! into that location all your register values and that memory will be viewable at the C level.
Beware, this probably isn't going to do what you think it will. Those values are allowed to change quite a bit as per the calling convention link above. If this is for debugging, you are better off using a debugger and watching the registers that way.
Ok, you are still not understanding here:
{
int registerValues[14];
myAsmRoutine(registerValues);
print_uart0(& registerValues);
}
myAsmRoutine:
stmia r0!, {r1-r14}
blx lr
I skipped R0 and PC, but you get the idea. Also, you will need to do something a bit mroe complex to change the values into a printable format - sprintf or itoa os something like that.
displayRegistersValue(int registers[number_of_registers])
this is an array not a structure and is passed as a pointer to something not as a long list of items. same goes for structures btw.
It is usually easiest to construct a C function that does what you want in asm then see what the compiler produces, then go from there (use the ABI document to confirm, etc).
#define NUMREGS 13
void displayRegistersValue(unsigned int registers[NUMREGS]);
void outer ( void )
{
unsigned int regs[NUMREGS];
displayRegistersValue(regs);
}
> arm-none-linux-gnueabi-gcc -O2 -c fun.c -o fun.o
> arm-none-linux-gnueabi-objdump -D fun.o
fun.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <outer>:
0: e52de004 push {lr} ; (str lr, [sp, #-4]!)
4: e24dd03c sub sp, sp, #60 ; 0x3c
8: e28d0004 add r0, sp, #4
c: ebfffffe bl 0 <displayRegistersValue>
10: e28dd03c add sp, sp, #60 ; 0x3c
14: e49df004 pop {pc} ; (ldr pc, [sp], #4)
You will need to do something similar, make room on the stack by adding to the stack pointer, save the lr so you dont trash it with the branch link, copy your registers to that memory (the stack) point r0 to the beginning of the memory/array you want to pass, then call the function (r0 being the first and only parameter you are passing to the function).
push {lr}
mov lr,sp
stmdb sp!,{r0-r12}
mov r0,lr
bl displayRegistersValue
add sp,sp,#52
pop {lr}
An array is passed as a pointer in a single register. If you want 5 registers then you need to have 5 parameters (int i1, int i2 etc.).
To quote from the ARM APCS document:
"The first four registers r0-r3 (a1-a4) are used to pass argument values into a subroutine and to return a result value from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls)."
So if you want to pass more than 4 values to a C function, you need to pass the rest of the values on the stack. A better idea would be to put the register values in a memory region that has been statically allocated and pass the address of the memory (pointer) to the C function. The pointer can be de-referenced by the function to get to the register values.