How can I use external defines such as LONG_MIN and LONG_MAX in ARM assembler code?
Let's say my_arm.h looks like this:
int my_arm(int foo);
Let's say I have a my_main.c as follows:
...
#include <limits.h>
#include "my_arm.h"
...
int main (int argc, char *argv[])
{
int foo=0;
...
printf("My arm assembler function returns (%d)\n", my_arm(foo));
...
}
And my_arm.s looks like this:
.text
.align 2
.global my_arm
.type my_arm, %function
my_arm:
...
ADDS r1, r1, r2
BVS overflow
...
overflow:
LDR r0, LONG_MAX @ this is probably wrong, how to do it correctly?
BX lr @ return with max value
The second to last line, I am not sure how to load correctly, I vaguely remember reading somewhere, that I had to define LONG_MAX in .global, but can't find the link to a working example anymore.
I am compiling with arm-linux-gnueabi-gcc version 4.3.2
==================
UPDATE: Appreciate the suggestions! Unfortunately, I am still having trouble with syntax.
First, I made a little header file mylimits.h (for now in same dir as .S)
#define MY_LONG_MIN 0x80000000
in my_arm.S i added the following:
...
.include "mylimits.h"
...
ldr r7, =MY_LONG_MIN @ when it was working it was ldr r7, =0x80000000
...
Two problems with this approach.
First the biggest problem: the symbol MY_LONG_MIN is not recognized...so something is still not right
Second: syntax for .include does not let me include <limits.h>, I would have to add that in mylimits.h, seems a bit kludgy, but I suppose, that is ok :)
Any pointers?
I have access to ARM System Developer’s Guide Designing and Optimizing System Software[2004] and ARM Architecture
Reference Manual[2000], my target is XScale-IXP42x Family rev 2 (v5l) though.
Often the lowercase file extension .s implies that assembler should not be passed through the c preprocessor, whereas the uppercase extension .S implies that it should.
It's up to your compiler to follow this convention though (gcc ports normally do), so check its documentation.
(EDIT: note that this means you can use #include directives, but remember that most of the files you would include are not valid assembler (unless they consist entirely of #define directives), so you may have to write your own header that is.)
edit 5 years later:
Note that the armcc v5 compiler follows this behaviour under linux... but not on windows.
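A minimal sketch of such a header, assuming the assembly file is named with an uppercase .S and pulls the header in with #include. The assembler's own .include directive runs after the preprocessor, so #define lines in a file included that way are never expanded. The file name mylimits.h, the MY_* macros and the helper my_long_max_value are illustrative names, not part of any library; __ASSEMBLER__ is the macro GCC predefines while preprocessing assembly.

```c
/* mylimits.h (hypothetical name): shared by C sources and preprocessed .S files */
#ifndef MYLIMITS_H
#define MYLIMITS_H

/* Plain numeric macros: no casts and no L/U suffixes, so the assembler
   can parse the expanded text as well. In C, 0x80000000 has type
   unsigned int on a 32-bit-int target; cast it where a signed value is needed. */
#define MY_LONG_MIN 0x80000000
#define MY_LONG_MAX 0x7fffffff

#ifndef __ASSEMBLER__
/* C-only declarations are hidden when gcc preprocesses a .S file. */
int my_arm(int foo);

/* Illustrative C-side helper that uses the same macro as the assembly. */
static inline long my_long_max_value(void)
{
    return MY_LONG_MAX;
}
#endif /* __ASSEMBLER__ */

#endif /* MYLIMITS_H */
```

In my_arm.S, a line #include "mylimits.h" followed by ldr r7, =MY_LONG_MIN should then see the macro expanded before the assembler runs.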
If you are using gcc and its assembler, it is straightforward: name the file with a final .S, add #include <limits.h> at the beginning, and use the constant wherever you need it, e.g. ldr r0, =SOMETHING. I tested this on x86 since that is what I have, but the same should work on ARM since it is a gcc feature.
What I ended up doing is this:
in my_main.c
#include <limits.h>
...
int my_LONG_MAX=LONG_MAX;
then in my_arm.S
ldr r8, =my_LONG_MAX
ldr r10, [r8]
It looks convoluted, and it is (plus the portability gains of this approach are questionable).
There must be a way to access LONG_MAX directly in assembly. Such a way I would gladly accept as the full answer.
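Whichever way the constant reaches the assembly, a C reference of the overflow path is handy for checking the routine's results. This is only a sketch: the name sat_add is hypothetical, and clamping negative overflow to LONG_MIN is an assumption, since the question only shows the LONG_MAX branch.

```c
#include <limits.h>

/* C model of "ADDS; BVS overflow": add two values and clamp on signed
   overflow instead of wrapping. Clamping to LONG_MIN on negative
   overflow is an assumption, not taken from the original routine. */
long sat_add(long a, long b)
{
    if (b > 0 && a > LONG_MAX - b)
        return LONG_MAX;   /* would overflow upward */
    if (b < 0 && a < LONG_MIN - b)
        return LONG_MIN;   /* would overflow downward */
    return a + b;
}
```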
I have seen simply feeding gcc the assembler source vs gas will allow you to do C like things in assembler. It is actually a bit scary when you come across situations where you must use gcc as a front end to gas to get something to work, but that is another story.
Use the --cpreproc option of armasm and add
#include "my_arm.h"
to my_arm.s.
This works with Keil ARM.
Related
I was wondering what the actual meaning of @ zero_extendqisi2 in gcc assembly output was and also the usage. I couldn't find what qisi stands for or anything along those lines.
For context, the line is ldrb r3, [fp, #-9] @ zero_extendqisi2 and this is ARM on a Raspberry Pi Zero W, compiled with GCC. For example, when reloading an unsigned char with conversion to int, with optimization disabled, with GCC9.2 with no options. https://godbolt.org/z/7xnfqh. Older GCC all the way to the earliest on Godbolt (4.5) and presumably earlier print the same comment.
This is an RTL instruction name, included in the Standard Names list of the GCC internals manual under zero_extendmn2. Here m,n are the machine modes qi and si, which are respectively a byte and a 32-bit integer. So this is GCC's indication that it is generating an instruction which takes a byte (here loaded from memory) and zero-extends it into a 32-bit integer (here in the register r3). Which is exactly what the ARM ldrb instruction does.
I don't know what the 2 stands for, but it's apparently part of GCC's naming convention.
As Peter points out, it's a little odd that GCC would include such a comment in the assembly without -fverbose-asm. Indeed the comment is coded in as part of the template string in the machine description file, arm.md. It could have been a debugging aid that some GCC developer added and then forgot to take out.
(If you submit this for your assignment, please cite this post properly.)
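A small C sketch of the kind of source that leads to this pattern, with a signed counterpart for contrast. The function names are hypothetical. On ARM, GCC would typically emit ldrb for the first function and, on cores that have it, ldrsb for the second; the exact output depends on the GCC version and flags.

```c
/* Loads one byte from memory and widens it to int.
   unsigned char -> int is a zero extension (QImode to SImode). */
int load_u8(const unsigned char *p)
{
    return *p;
}

/* signed char -> int is a sign extension, the counterpart of the above. */
int load_s8(const signed char *p)
{
    return *p;
}
```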
I've created a static library with about 2 million small functions, but I'm having trouble linking it to my main function, using GCC (tested 4.8.5 or 7.3.0) under Linux x86_64.
The linker complains about relocation truncations, very much like those in this question.
I've already tried using -mcmodel=large, but as the answer to that same question says, I would
"need a crt1.o that can handle full 64-bit addresses". I've then tried compiling one, following this answer, but recent glibc won't compile under -mcmodel=large, even if libgcc does, which accomplishes nothing.
I've also tried adding the flags -fPIC and/or -fPIE to no avail. The best I get is this sole error:
ld: failed to convert GOTPCREL relocation; relink with --no-relax
and adding that flag also doesn't help.
I've searched around the Internet for hours, but most posts are very old and I can't find a way to do this.
I'm aware this is not a common thing to try, but I think it should be possible to do this. I'm working in an HPC environment, so memory or time constraints are not the issue here.
Has anyone been successful in accomplishing something similar with a recent compiler and toolchain?
Either don't use the standard library or patch it. As of version 2.34, Glibc doesn't support the large code model. (See also Glibc mailing list and Redhat Bugzilla)
Explanation
Let's examine the Glibc source code to understand why recompiling with -mcmodel=large accomplished nothing. It replaced the relocations originating from C files. But Glibc contained hardcoded 32-bit relocations in raw Assembly files, such as in start.S (sysdeps/x86_64/start.S).
call *__libc_start_main@GOTPCREL(%rip)
start.S emitted R_X86_64_GOTPCREL for __libc_start_main, which used relative addressing. x86_64 CALL instruction didn't support relative jumps by more than 32-bit displacement, see AMD64 Manual 3. So, ld couldn't offset the relocation R_X86_64_GOTPCREL because the code size surpassed 2GB.
Adding -fPIC didn't help due to the same ISA constraints. For position-independent code, the compiler still generated relative jumps.
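The constraint can be stated as a range check. This is a sketch of the condition a 32-bit PC-relative relocation has to satisfy, not the linker's actual code; fits_rel32 is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* A RIP-relative operand stores (target - address of next instruction)
   as a signed 32-bit value, so the distance must fit in that range. */
bool fits_rel32(uint64_t next_insn, uint64_t target)
{
    int64_t disp = (int64_t)(target - next_insn);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

Once the code between the instruction and the referenced GOT entry grows past 2GB, this check fails and ld reports a truncated relocation.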
Patching
In short, you have to replace 32-bit relocations in the Assembly code. See System V Application Binary Interface AMD64 Architecture Processor Supplement for more info about implementing 64-bit relocations. See also this for a more in-depth explanation of code models.
Why don't 32-bit relocations suffice for the large code model? Because we can't rely on other symbols being in a range of 2GB. All calls must become absolute. Contrast with the small PIC code model, where the compiler generates relative jumps whenever possible.
Let's look closely at the R_X86_64_GOTPCREL relocation. It contains the 32-bit difference between RIP and the symbol's GOT entry address. It has a 64-bit substitute, R_X86_64_GOTPCREL64, but I couldn't find a way to use it in Assembly.
So, to replace the GOTPCREL, we have to compute the symbol entry GOT base offset and the GOT address itself. We can calculate the GOT location once in the function prologue because it doesn't change.
First, let's get the GOT base (code lifted wholesale from the ABI Supplement). The _GLOBAL_OFFSET_TABLE_ relocation specifies the offset relative to the current position:
leaq 1f(%rip), %r11
1: movabs $_GLOBAL_OFFSET_TABLE_, %r15
leaq (%r11, %r15), %r15
With the GOT base in the %r15 register, we now have to find the symbol's GOT entry offset. The R_X86_64_GOT64 relocation specifies exactly this. With this, we can rewrite the call to __libc_start_main as:
movabs $__libc_start_main@GOT, %r11
call *(%r11, %r15)
We replaced R_X86_64_GOTPCREL with _GLOBAL_OFFSET_TABLE_ and R_X86_64_GOT64. Replace others in the same vein.
N.B.: Replace R_X86_64_GOT64 with R_X86_64_PLTOFF64 for functions from dynamically linked executables.
Testing
Verify the patch correctness using the following test that requires the large code model. It doesn't contain a million small functions, having one huge function and one small function instead.
Your compiler must support the large code model. If you use GCC, you'll need to build it from the source with the flag -mcmodel=large. Startup files shouldn't contain 32-bit relocations.
The foo function takes more than 2GB, rendering 32-bit relocations unusable. Thus, the test will fail with the overflow error if compiled without -mcmodel=large. Also, add flags -O0 -fPIC -static, link with gold.
extern int foo();
extern int bar();
int foo(){
bar();
// Call sys_exit
asm( "mov $0x3c, %%rax \n"
"xor %%rdi, %%rdi \n"
"syscall \n"
".zero 1 << 32 \n"
: : : "rax", "rdi", "rcx", "r11");
return 0;
}
int bar(){
return 0;
}
int __libc_start_main(){
foo();
return 0;
}
int main(){
return 0;
}
N.B. I used patched Glibc startup files without the standard library itself, so I had to define both __libc_start_main and main.
I'm currently having a weird issue when trying to run a C program that calls a very simple ARM assembly function. Here's my C code:
#include <stdio.h>
#include <stdlib.h>
extern void getNumber(int* pointer);
int main()
{
int* pointer = malloc(sizeof(int));
getNumber(pointer);
printf("%d\n", *pointer);
return 0;
}
And here's my assembly code:
.section .text
.align 4
.arm
.global getNumber
.type getNumber STT_FUNC
getNumber:
mov r1, #0
str r1, [r0]
bx lr
So far so good. However, if I add a line with mov r7, #0 at the top of getNumber, the program segfaults when trying to access pointer. After inspecting it with gdb I noticed now the pointer itself is stored at a very low address, such as 0xa.
Now, I did a bit of research and apparently r7 is the frame pointer for THUMB code (according to this). However, I'm clearly stating I don't want to use THUMB instructions in the .arm line in my assembly code. Why on earth is it failing?
I'm compiling both the .c and .s files using arm-linux-gnueabihf-gcc, and I'm running the program on a Cortex-A8 based board running Arch Linux.
Edit: The program runs fine if I compile using the -fomit-frame-pointer flag. However, I still want to know why is it using r7 as the frame pointer.
Edit 2: It's still failing even if I use .code 32 instead of .arm.
The ARM Procedure Call Standard specifies the following:
A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP (and r9 in PCS variants that designate r9 as v6).
So your assembly language subroutine must save & restore r7 if it uses it.
You might be avoiding the problem with your small test program by not compiling for Thumb mode, but you're just accidentally avoiding the problem. Anything that links to your assembly routine is entitled to expect that r7 will be preserved.
You're crashing the program because you are corrupting the frame pointer, like you mentioned. There is really no rhyme or reason to the convention. Just that ARM reserves certain registers for certain things. Kinda like in x86 esp is the stack pointer.
Here's a pretty good reference for registers to avoid:
http://msdn.microsoft.com/en-us/library/ms253599(v=vs.80).aspx
I finally got it: doing $ arm-linux-gnueabihf-gcc -v showed me the default options my compiler is using. Among those is: --with-mode=thumb.
Compiling with -marm fixed it. Now it's working as intended!
Edit: Upon reading the comments here I realize I was mistaken. I should've saved/restored r7 so it wouldn't screw up the rest of my program. Good thing I learned this now with a toy project and not while working on something real!
I have a small question about using ASM in c. I want to execute the instruction:
LDR PC,=0x123456
This gives me the error "unexpected token in operand".
asm("LDR PC,=0x123456");
This gives "invalid constraint".
asm("LDR PC," : "m" (0x123456));
What's the right way to do this?
You are using this:
asm("LDR PC,=0x123456");
This is not a standard ARM assembly instruction, but a pseudo-instruction provided as a compiler extension. This pseudo-instruction is converted to other assembly instructions when you compile it. It seems clang doesn't support this compiler extension (see this thread). You should do the conversion to assembly instructions yourself, see the ARM documentation for how the LDR pseudo-instruction is converted.
You can probably achieve the effect you want in plain C:
((void (*)(void))0x123456)();
or if you prefer more verbose:
typedef void FN(void);
((FN*)0x123456)();
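To try the cast on a hosted system, where nothing is mapped at 0x123456, this sketch substitutes the address of an ordinary function. The names target and call_at are hypothetical. Converting between function pointers and integers is implementation-defined in C, though it works on common platforms. Note also that this is a call that sets up a return address, not a bare jump like loading PC directly.

```c
#include <stdint.h>

typedef int FN(void);

/* Stand-in for the code that would live at the fixed address. */
int target(void)
{
    return 42;
}

/* Treats an integer as a code address and calls it. */
int call_at(uintptr_t addr)
{
    return ((FN *)addr)();
}
```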
I agree with @Étienne. I tried your code with my Google toolchain. It's working fine.
I think you should read in the manual how the assembler converts the pseudo-instruction to real instructions (normally two mov instructions).
I'm building legacy code using the GNUARM C compiler and trying to resolve all the implicit declarations of functions.
I've come across some ARM specific functions and can't find the header file containing the declarations for these functions:
get_pc
get_cpsr
get_sp
I have searched the web and only came up with source code containing these functions without any non-standard include files.
I'll also settle for the function declarations.
Since I will also be porting the code to the Cygwin / Windows platform, what are the equivalent declarations for Cygwin GNU GCC?
Thanks.
Just write your own if you really need those functions, asm is easier than inline asm:
.globl get_pc
get_pc:
mov r0,pc
bx lr
.globl get_sp
get_sp:
mov r0,sp
bx lr
.globl get_cpsr
get_cpsr:
mrs r0,cpsr
bx lr
At least for ARM. If you are porting to x86 and need the equivalents, I have to wonder what the code needs those things for anyway. For the cpsr in particular, you would likely have to change any code that uses the result, as the status registers across processor vendors/families pretty much never match. The x86 equivalents should still be about the same level of effort: it takes longer to do a Google search and read the results than to just write the code (if you know the processor).
Depending on what your application is doing it is probably better to just comment out any code that calls those functions and/or uses the return value. I can imagine a few reasons why those items would be used, but it could get into architecture specific stuff and that is more involved than just porting a few register read functions. So what user786653 asked is the key question. How are these functions used? Not where can I find them but how are they used and why do you think you need them.
Are you sure those are functions? I'm not very familiar with ARM, but those sound like compiler intrinsics to me. If you're moving to GCC, you might be better off replacing those with inline assembly.
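For the Cygwin/x86 side, GCC builtins can approximate two of the three functions without any assembly. This is a sketch assuming GCC or Clang; the names port_get_pc and port_get_sp are hypothetical, and the values are only close to the real registers. There is no portable counterpart to get_cpsr.

```c
/* Returns the address this function will return to, i.e. a code address
   inside the caller: a rough stand-in for reading the program counter. */
__attribute__((noinline)) void *port_get_pc(void)
{
    return __builtin_return_address(0);
}

/* Returns this function's own frame address, which lies close to the
   caller's stack pointer: a rough stand-in for reading sp. */
__attribute__((noinline)) void *port_get_sp(void)
{
    return __builtin_frame_address(0);
}
```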