What is proper syntax to pass variables from C code to Assembly and back? - c

Struggling electrical engineering student trying to link C and Assembly (ARM32 Cortex-M) for an Embedded Systems final project. I don't fully understand the proper syntax for this project.
I was instructed to combine 2 previous labs - along with additional code - to build a simple calculator (+,-,*,/) with C and Assembly language in the MBED environment. I've set the C file to scan a keypad, take 3 user inputs to 3 strings, then pass these strings to an Assembly file. The Assembly file is to perform the arithmetic function and save the result in an EXPORT PROC. My C file then takes the result and printf to the user (which we read with PuTTY).
Here is my assembly header and import links:
AREA calculator, CODE, READONLY ; assembly header
compute_asm
IMPORT OPERAND_1 ; imports from C file
IMPORT OPERAND_2 ; imports from C file
IMPORT USER_OPERATION ; imports from C file
ALIGN ; aligns memory
initial_values PROC
LDR R1, =OPERAND_1; loads R1 with OPERAND_1
LDR R2, =OPERAND_2; loads R2 with OPERAND_2
Here are a few lines from my C file linking to Assembly:
int OPERAND_1; //declares OPERAND_1 for Assembly use
int OPERAND_2; //declares OPERAND_2 for Assembly use
int USER_OPERATION; //declares USER_OPERATION for Assembly use
extern int add_number(); //links add_number function in Assembly
extern int subtract_number(); //links subtract_number function in Assembly
I expected to be able to compile and use this code (the previous labs went much smoother than this project). But after working through some other syntax issues, I'm getting "Error: "/tmp/fOofpw", line 39: Warning: #47-D: incompatible redefinition of macro "MBED_RAM_SIZE" when I compile.
Coding is my weak spot. Any help or pointers would be appreciated!

In general the calling convention used by a specific version of a compiler for a specific target is specific to that compiler and version. And technically is subject to change at any time (even with gnu and arm we have seen that) and no reason to expect any other compiler conforms to the same convention. Despite that compilers like gcc and clang conform to some version of the arm recommended abi, which that abi has changed over time and gcc has changed along with it.
As Peter pointed out:
LDR R1, =OPERAND_1; loads R1 with OPERAND_1
(you are clearly not using gnu assembler, so not the gnu toolchain correct? probably Kiel or ARM?)
puts the address of that label into r1 to get the contents you need another load
ldr r1,[r1]
and now the contents are there.
Using global variables gets you around the calling convention problem.
Using a simple example and disassembling you can discover the calling convention for your compiler:
extern unsigned int add ( unsigned int, unsigned int);
unsigned int fun ( void )
{
return(add(3,4)+2);
}
00000000 <fun>:
0: b510 push {r4, lr}
2: 2104 movs r1, #4
4: 2003 movs r0, #3
6: f7ff fffe bl 0 <add>
a: 3002 adds r0, #2
c: bd10 pop {r4, pc}
e: 46c0 nop ; (mov r8, r8)
first parameter in r0, second in r1, return in r0. which could technically change on any version of gnu going forward but can tell you from gcc 2.x.x to the present 9.1.0 this is how it has been for arm. gcc 3.x.x to the present for thumb which is what you are using.
How you have done it is fine, you just need to recognize what the =LABEL shortcut thing really does.

Related

Clang-15 vs Clang-14 local static init guards

I'm using Clang++ to compile for a Cortex-M0+ target, and in moving from version 14 to version 15 I've found a difference in the code generated for guard variables for local statics.
So, for example:
int main()
{
static knl::QueueN<uint32_t, 8> valQueue;
...
}
Clang-14 generates the following:
ldr r0, .LCPI0_4
ldrb r0, [r0]
dmb sy
lsls r0, r0, #31
beq .LBB0_8
Clang-15 now generates:
ldr r0, .LCPI0_4
movs r1, #2
bl __atomic_load_1
lsls r0, r0, #31
beq .LBB0_8
Why the change? Was the Clang 14 code incorrect?
EDITED TO ADD:
Note that an important consequence of this is that the second case actually requires an implementation of __atomic_load_1 to be provided from somewhere external to the compiler (e.g. -latomic), whereas the first doesn't.
EDITED TO ADD:
See https://github.com/llvm/llvm-project/issues/58184 for the LLVM devs' response to this.
Neither one is wrong. It's just that in the first version, the code to do the atomic load is inlined, and in the second version it's called as a library function instead. If you look at the code within __atomic_load_1 you will probably find it executes the exact same instructions, or equivalent ones.
Each way has pros and cons. The inline version avoids the overhead of a function call, while the library version makes it possible to select code at runtime that is best optimized for the features of the actual runtime CPU.
The difference could be a conscious design change between clang versions, or a difference in the code gen and optimization options you used, or different configuration options when your clang installation was built. Someone else might know more details about what controls this. But it isn't anything to worry about as far as proper behavior of your code.

What is __0sprintf in Keil's C library? How to use it to save code size vs. full sprintf?

I am trying to use sprintf standard C function in ARM assembler code, in Keil uVision, for STM32.
Looking in example C project disassembly, I can see this:
54: sprintf( test_string, format_string, num);
0x080028C2 F1000110 ADD r1,r0,#0x10
0x080028C6 E9D02302 LDRD r2,r3,[r0,#0x08]
0x080028CA 4810 LDR r0,[pc,#64] ; #0x0800290C
0x080028CC F7FFFB66 BL.W __0sprintf (0x08001F9C)
C code disassembly calls function __0sprintf.
In my assembly program, I write such code:
LDR r0, =Data_string
LDR r1, =Format_str
;r2 & r3 loaded above
BL sprintf
And it works well, but takes up to 11 kB.
If I call __0sprintf function in my code:
LDR r0, =Data_string
LDR r1, =Format_str
;r2 & r3 loaded above
BL __0sprintf
it takes up much less, about 5 kB - but do not work.
I can not find in Google any information about __0sprintf function or something about it. Where can I read about these functions? May be I can understand why it is not working?
I can not find in Google any information about __0sprintf function or something about it.
Names beginning with two underscores are reserved for the implementation.
That means you're not allowed to use them in your code, but it also means that if you see one, it's probably an internal implementation detail.
Where can I read about these functions?
If it's an implementation detail, all you can do is check the documentation for that implementation, ask the vendor or examine the source if it's available, or search to see if someone else has already investigated or decompiled this function.
May be I can understand why it is not working?
You can always try disassembling that function from your vendor's runtime library (assuming that doesn't breach some license), or try instruction-stepping into it if the platform has an interactive debugger.
Otherwise, your best bet is to figure out how to write some regular C code that calls the same implementation, and compare the call sites.

Illegal instruction when running simple ELLCC-generated ELF binary on a Raspberry Pi

I have an empty program in LLVM IR:
define i32 #main(i32 %argc, i8** %argv) nounwind {
entry:
ret i32 0
}
I'm cross-compiling it on Intel x86-64 Windows for ARM Linux using ELLCC, with the following command:
ecc++ hw.ll -o hw.o -target arm-linux-engeabihf
It completes without errors and generates an ELF binary.
When I take the binary to a Raspberry Pi Model B+ (running Raspbian), I get only the following error:
Illegal instruction
I don't know how to tell what's wrong from the disassembled code. I tried other ARM Linux targets but the behavior was the same. What's wrong?
The exact same file builds, links and runs fine for other targets like i386-linux-eng, x86_64-w64-mingw32, etc (that I could test on), again using the ELLCC toolchain.
Assuming the library and startup code isn't at fault, this is what the disassembly of main itself looks like:
.text:00010188 e24dd008 sub sp, sp, #8
.text:0001018c e3002000 movw r2, #0
.text:00010190 e58d0004 str r0, [sp, #4]
.text:00010194 e1a00002 mov r0, r2
.text:00010198 e58d1000 str r1, [sp]
.text:0001019c e28dd008 add sp, sp, #8
.text:000101a0 e12fff1e bx lr
I'd guess it's choking on the movw at 0x0001018c. The movw/movt encodings which can handle full 16-bit immediate values first appeared in the ARMv6T2 version of the architecture - the ARM1176 in the original Pi models predates that, only supporting original ARMv6*.
You need to tell the compiler to generate code appropriate to the thing you're running on - I don't know ELLCC, but I'd guess from this it's fairly modern and up-to-date and thus defaulting to something newer like ARMv6T2 or ARMv7. Otherwise, it's akin to generating code for a Pentium and hoping it works on an 80486 - you might be lucky, you might not. That said, there's no good reason it should have chosen that encoding in the first place - it's not as if 0 can't be encoded in a 'classic' mov instruction...
The decadent option, however, would be to consider this a perfect excuse to replace the Pi with a Pi 2 - the Cortex-A7s in that are nice capable ARMv7 cores ;)
* Lies for clarity. I think 1176 might actually be v6K, but that's irrelevant here. I'm not sure if anything actually exists as plain ARMv6, and all the various architecture extensions are frankly a hideous mess

ARM Assembly - Why does my app crash when zeroing r7?

I'm currently having a weird issue when trying to run a C program that calls a very simple ARM assembly function. Here's my C code:
#include <stdio.h>
#include <stdlib.h>
extern void getNumber(int* pointer);
int main()
{
int* pointer = malloc(sizeof(int));
getNumber(pointer);
printf("%d\n", *pointer);
return 0;
}
And here's my assembly code:
.section .text
.align 4
.arm
.global getNumber
.type getNumber STT_FUNC
getNumber:
mov r1, #0
str r1, [r0]
bx lr
So far so good. However, if I add a line with mov r7, #0 at the top of getNumber, the program segfaults when trying to access pointer. After inspecting it with gdb I noticed now the pointer itself is stored at a very low address, such as 0xa.
Now, I did a bit of research and apparently r7 is the frame pointer for THUMB code (according to this). However, I'm clearly stating I don't want to use THUMB instructions in the .arm line in my assembly code. Why on earth is it failing?
I'm compiling both the .c and .s files using arm-linux-gnueabihf-gcc, and I'm running the program on a Cortex-A8 based board running Arch Linux.
Edit: The program runs fine if I compile using the -fomit-frame-pointer flag. However, I still want to know why is it using r7 as the frame pointer.
Edit 2: It's still failing even if I use .code 32 instead of .arm.
The ARM Procedure Call Standard specifies the following:
A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP (and r9 in PCS variants that designate r9 as v6).
So your assembly language subroutine must save & restore r7 if it uses it.
You might be avoiding the problem with your small test program by by not compiling for Thumb mode, but you're just accidentally avoiding the problem. Anything that links to your assembly routine is entitled to expect that r7 will be preserved.
You're crashing the program because your are corrupting the frame pointer, like you mentioned. There is really no rhyme or reason to the convention. Just that ARM reserves certain registers for certain things. Kinda like in x86 esp is the stack pointer.
Here's a pretty good reference for registers to avoid:
http://msdn.microsoft.com/en-us/library/ms253599(v=vs.80).aspx
I finally got it: doing $ arm-linux-gnueabihf-gcc -v showed me the default options my compiler is using. Among those is: --with-mode=thumb.
Compiling with -marm fixed it. Now it's working as intended!
Edit: Upon reading the comments here I realize I was mistaken. I should've saved/restored r7 so it wouldn't screw up the rest of my program. Good thing I learned this now with a toy project and not while working on something real!

Wrong result with log10 math function in armv6 on Raspberry Pi

I have this very simple code:
#include <stdio.h>
#include <math.h>
int main()
{
long v = 35;
double app = (double)v;
app /= 100;
app = log10(app);
printf("Calculated log10 %lf\n", app);
return 0;
}
This code works perfectly on x86, but doesn't work on arm, on which the result is 0.00000. Some ideas?
Other info:
Operating system: linux 3.2.27
I build arm toolchain with ct-ng: arm-unknown-linux-gnueabi-
libc version 2.13
Output of gcc -v:
Using built-in specs.
COLLECT_GCC=arm-unknown-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/opt/x-tools/arm-unknown-linux-gnueabi/libexec/gcc/arm-unknown-linux-gnueabi/4.5.1/lto-wrapper
Target: arm-unknown-linux-gnueabi
Configured with: /home/mirko/misc/rasppi-ct-ng-files/.build/src/gcc-4.5.1/configure --build=x86_64-build_unknown-linux-gnu --host=x86_64-build_unknown-linux-gnu --target=arm-unknown-linux-gnueabi --prefix=/opt/x-tools/arm-unknown-linux-gnueabi --with-sysroot=/opt/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi//sys-root --enable-languages=c --disable-multilib --with-pkgversion=crosstool-NG-1.9.3 --enable-__cxa_atexit --disable-libmudflap --disable-libgomp --disable-libssp --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-gmp=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-mpfr=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-mpc=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-ppl=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-cloog=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --with-libelf=/home/mirko/misc/rasppi-ct-ng-files/.build/arm-unknown-linux-gnueabi/build/static --enable-threads=posix --enable-target-optspace --with-local-prefix=/opt/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi//sys-root --disable-nls --enable-symvers=gnu --enable-c99 --enable-long-long
Thread model: posix
gcc version 4.5.1 (crosstool-NG-1.9.3)
Floating point support on ARM Linux distributions is not trivial. Because of that you should use a toolchain matching your system that is operating system & hardware and use the right compile switches.
First thing you need to understand ARM's calling convention which is about "how arguments are passed when you call a function?". ARM being a RISC architecture, can only work on registers. There are no instructions manipulating memory directly. If you need to change a value in memory you first need to load it to a register, modify it, then you need to store it back on the memory.
When you call a function you may need to pass arguments to it, you can put arguments on stack (memory) but since ARM can only work with registers first thing your function would probably do will be loading them back to registers. To avoid this waste ARM calling convention uses registers to pass arguments. However since ARM has a limited number of registers, calling convention also dictates you to use only first four (r0-r3) registers for the first four arguments, remaining must still use stack to be passed.
Second thing is early ARM cores didn't have any floating point support, operations where implemented in software. (This is what is still supported via gcc's -mfloat-abi=soft.)
We can easily demonstrate what this means via following snippet.
float pi2(float a) {
return a * 3.14f;
}
Compiling this via -c -O3 -mfloat-abi=soft and obdumping gives us
00000000 <pi2>:
0: f24f 51c3 movw r1, #62915 ; 0xf5c3
4: b508 push {r3, lr}
6: f2c4 0148 movt r1, #16456 ; 0x4048
a: f7ff fffe bl 0 <__aeabi_fmul>
e: bd08 pop {r3, pc}
As you can see (actually it is not visible :) ) pi2 gets its parameter in r0, populates pi constant on r1 and uses __aeabi_fmul to multiply those and return result in r0. Since __aeabi_fmul also uses same calling convention, details about r0 is not visible. All our function does to populate r1 and delegate it to __aeabi_fmul.
When floating hardware support added to ARM (again because of architecture style), it came with its own set of registers (s0, s1, ...).
If we compile same snippet with -c -O3 -mfloat-abi=softfp and dump we get
00000000 <pi2>:
0: eddf 7a04 vldr s15, [pc, #16] ; 14 <pi2+0x14>
4: ee07 0a10 vmov s14, r0
8: ee27 7a27 vmul.f32 s14, s14, s15
c: ee17 0a10 vmov r0, s14
10: 4770 bx lr
12: bf00 nop
14: 4048f5c3 .word 0x4048f5c3
As you can see now compiler doesn't create a call to __aeabi_fmul but instead it creates a vmul.f32 instruction after it moves argument located in r0 to s14 and populates 3.14 on s15. After multiplication instruction it moves result available in s14 back to r0 since any caller of this function would expect it because of the calling convention.
Now if you think pi2 as a library provided to you by some third party, you can understand that both soft and softfp implementations do the same thing for you and you can use them interchangeably. If system provides them for you, you wouldn't care if your app runs on a system with hardware floating point support or not. This was quite good to keep old software running on new hardware.
However while keeping compability this approach introduces the overhead of moving values between ARM registers and FP registers. This obviously effects performance and addressed by a new calling convention, called hard by gcc. This new convention states that if you have floating point arguments in your function you can utilize floating point registers interleaved with normal ones, as well as you can return floating point values in floating point register s0.
Again if we compile our snippet with -c -O3 -mfloat-abi=hard and dump we get
00000000 <pi2>:
0: eddf 7a02 vldr s15, [pc, #8] ; c <pi2+0xc>
4: ee20 0a27 vmul.f32 s0, s0, s15
8: 4770 bx lr
a: bf00 nop
c: 4048f5c3 .word 0x4048f5c3
You can see there is no registers getting moved around. Argument to pi2 gets passed in s0, compiler created code to populate 3.14 in s15 and uses vmul.f32 s0, s0, s15 to get result we want in s0.
Big problem with this new convention is while you improve the code produced by compiler you completely kill compability. You can't expect an application built with hard convention to work with libraries built for soft/softfp and an application built for softfp won't work with libraries built for hard.
For more information on calling conventions you should check ARM's website.

Resources