I'm using Clang++ to compile for a Cortex-M0+ target, and in moving from version 14 to version 15 I've found a difference in the code generated for guard variables for local statics.
So, for example:
int main()
{
static knl::QueueN<uint32_t, 8> valQueue;
...
}
Clang-14 generates the following:
ldr r0, .LCPI0_4
ldrb r0, [r0]
dmb sy
lsls r0, r0, #31
beq .LBB0_8
Clang-15 now generates:
ldr r0, .LCPI0_4
movs r1, #2
bl __atomic_load_1
lsls r0, r0, #31
beq .LBB0_8
Why the change? Was the Clang 14 code incorrect?
EDITED TO ADD:
Note that an important consequence of this is that the second case actually requires an implementation of __atomic_load_1 to be provided from somewhere external to the compiler (e.g. -latomic), whereas the first doesn't.
EDITED TO ADD:
See https://github.com/llvm/llvm-project/issues/58184 for the LLVM devs' response to this.
Neither one is wrong. It's just that in the first version, the code to do the atomic load is inlined, and in the second version it's called as a library function instead. If you look at the code within __atomic_load_1 you will probably find it executes the exact same instructions, or equivalent ones.
Each way has pros and cons. The inline version avoids the overhead of a function call, while the library version makes it possible to select code at runtime that is best optimized for the features of the actual runtime CPU.
The difference could be a conscious design change between clang versions, or a difference in the code gen and optimization options you used, or different configuration options when your clang installation was built. Someone else might know more details about what controls this. But it isn't anything to worry about as far as proper behavior of your code.
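As a rough illustration of what both listings compute, here is a minimal C11 sketch of the guard check. It assumes that bit 0 of the guard byte marks "already initialized" (which is what the lsls r0, r0, #31 / beq pair appears to test) and that the load uses acquire ordering (the value 2 placed in r1 before the call matches __ATOMIC_ACQUIRE in GCC and Clang). The names guard_byte, guard_is_initialized and guard_mark_initialized are hypothetical and do not come from either compiler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the compiler-generated guard variable. */
static _Atomic unsigned char guard_byte;

/* Acquire-load the guard byte and test bit 0. Whether this load is
   expanded inline or becomes a library call is the compiler's choice. */
static bool guard_is_initialized(_Atomic unsigned char *guard)
{
    assert(guard != NULL);
    unsigned char value = atomic_load_explicit(guard, memory_order_acquire);
    return (value & 1u) != 0;
}

/* Publish "initialized" with release ordering once construction is done. */
static void guard_mark_initialized(_Atomic unsigned char *guard)
{
    assert(guard != NULL);
    atomic_store_explicit(guard, 1, memory_order_release);
}
```

Compiling a function like guard_is_initialized with each compiler version and comparing the output is one way to see whether the atomic load is inlined or routed through a helper on a given target.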
I am building a small OS for a Cortex-M4 CPU which can receive compiled binaries over UART and schedule them dynamically. I want to use that feature to build a test suite that uploads test programs which call OS functions such as memory allocation directly, without going through an SVC. Therefore I need to cast the fixed addresses of those OS routines to function pointers. Now, calling through the cast addresses produces the wrong (non-Thumb) call instruction: BL is needed instead of BLX, resulting in HardFaults.
void (*functionPtr_addr)(void);
functionPtr_addr = (void (*)()) (0x0800084C);
This is the assembly when calling this function
8000838: 4b03 ldr r3, [pc, #12] ; (8000848 <idle+0x14>)
800083a: 681b ldr r3, [r3, #0]
800083c: 4798 blx r3
Is there a way to force the BL instruction in such a case? It works with inline assembly, and I could write macros, but it would be much cleaner to do it this way.
The code gets compiled and linked, among other things, with
-mcpu=cortex-m4 -mthumb.
Toolchain:
gcc version 12.2.0 (Arm GNU Toolchain 12.2.MPACBTI-Bet1 (Build arm-12-mpacbti.16))
The bl instruction is limited in range. The compiler does not know where your code will be placed, so it cannot know whether bl can be used.
resulting in HardFaults.
The address passed to blx has to be odd on Cortex-M4 uCs to execute the code in the Thumb mode. Your address is even and the uC tries to execute ARM code not supported by this core.
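A minimal sketch of that fix in C, assuming the routine at 0x0800084C really is Thumb code: set bit 0 of the address before casting it to a function pointer. The helper names thumb_address and os_entry_at and the typedef os_entry_fn are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pointer type for an OS routine taking no arguments. */
typedef void (*os_entry_fn)(void);

/* Set bit 0 so that BLX keeps the core in Thumb state. */
static uintptr_t thumb_address(uintptr_t code_address)
{
    return code_address | (uintptr_t)1u;
}

/* Turn a fixed routine address into a callable function pointer. */
static os_entry_fn os_entry_at(uintptr_t code_address)
{
    assert(code_address != 0); /* reject an obviously invalid address */
    return (os_entry_fn)thumb_address(code_address);
}
```

With this, the pointer from the question would be obtained as os_entry_at(0x0800084C), which yields 0x0800084D; the compiler can keep emitting blx and the call stays in Thumb state.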
I am trying to use sprintf standard C function in ARM assembler code, in Keil uVision, for STM32.
Looking in example C project disassembly, I can see this:
54: sprintf( test_string, format_string, num);
0x080028C2 F1000110 ADD r1,r0,#0x10
0x080028C6 E9D02302 LDRD r2,r3,[r0,#0x08]
0x080028CA 4810 LDR r0,[pc,#64] ; #0x0800290C
0x080028CC F7FFFB66 BL.W __0sprintf (0x08001F9C)
C code disassembly calls function __0sprintf.
In my assembly program, I write such code:
LDR r0, =Data_string
LDR r1, =Format_str
;r2 & r3 loaded above
BL sprintf
And it works well, but takes up to 11 kB.
If I call __0sprintf function in my code:
LDR r0, =Data_string
LDR r1, =Format_str
;r2 & r3 loaded above
BL __0sprintf
it takes up much less, about 5 kB, but it does not work.
I can not find in Google any information about __0sprintf function or something about it. Where can I read about these functions? May be I can understand why it is not working?
I can not find in Google any information about __0sprintf function or something about it.
Names beginning with two underscores are reserved for the implementation.
That means you're not allowed to use them in your code, but it also means that if you see one, it's probably an internal implementation detail.
Where can I read about these functions?
If it's an implementation detail, all you can do is check the documentation for that implementation, ask the vendor or examine the source if it's available, or search to see if someone else has already investigated or decompiled this function.
May be I can understand why it is not working?
You can always try disassembling that function from your vendor's runtime library (assuming that doesn't breach some license), or try instruction-stepping into it if the platform has an interactive debugger.
Otherwise, your best bet is to figure out how to write some regular C code that calls the same implementation, and compare the call sites.
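As a sketch of that last suggestion, a small C function like the one below could be compiled with the same toolchain and disassembled, to see which library symbol the compiler picks for this format string and how it sets up the registers. The buffer name test_string is taken from the question; its size and the helper name format_number are assumptions made for illustration.

```c
#include <stdio.h>

/* Buffer named as in the question; the size is an assumption. */
static char test_string[32];

/* Integer-only format string. Disassemble this call site and compare
   it with the hand-written assembly call. */
static int format_number(int num)
{
    return sprintf(test_string, "%d", num);
}
```

Changing the format string (for example to one with a floating-point conversion) and disassembling again may show whether the toolchain substitutes a different variant depending on the format.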
Struggling electrical engineering student trying to link C and Assembly (ARM32 Cortex-M) for an Embedded Systems final project. I don't fully understand the proper syntax for this project.
I was instructed to combine 2 previous labs - along with additional code - to build a simple calculator (+,-,*,/) with C and Assembly language in the MBED environment. I've set the C file to scan a keypad, take 3 user inputs to 3 strings, then pass these strings to an Assembly file. The Assembly file is to perform the arithmetic function and save the result in an EXPORT PROC. My C file then takes the result and printf to the user (which we read with PuTTY).
Here is my assembly header and import links:
AREA calculator, CODE, READONLY ; assembly header
compute_asm
IMPORT OPERAND_1 ; imports from C file
IMPORT OPERAND_2 ; imports from C file
IMPORT USER_OPERATION ; imports from C file
ALIGN ; aligns memory
initial_values PROC
LDR R1, =OPERAND_1; loads R1 with OPERAND_1
LDR R2, =OPERAND_2; loads R2 with OPERAND_2
Here are a few lines from my C file linking to Assembly:
int OPERAND_1; //declares OPERAND_1 for Assembly use
int OPERAND_2; //declares OPERAND_2 for Assembly use
int USER_OPERATION; //declares USER_OPERATION for Assembly use
extern int add_number(); //links add_number function in Assembly
extern int subtract_number(); //links subtract_number function in Assembly
I expected to be able to compile and use this code (the previous labs went much smoother than this project). But after working through some other syntax issues, I'm getting this when I compile: Error: "/tmp/fOofpw", line 39: Warning: #47-D: incompatible redefinition of macro "MBED_RAM_SIZE".
Coding is my weak spot. Any help or pointers would be appreciated!
In general, the calling convention used by a specific version of a compiler for a specific target is specific to that compiler and version. Technically it is subject to change at any time (we have seen that even with GNU and ARM), and there is no reason to expect any other compiler to conform to the same convention. That said, compilers like gcc and clang conform to some version of the ARM-recommended ABI; that ABI has changed over time, and gcc has changed along with it.
As Peter pointed out:
LDR R1, =OPERAND_1; loads R1 with OPERAND_1
(You are clearly not using the GNU assembler, so not the GNU toolchain, correct? Probably Keil or ARM?)
puts the address of that label into r1. To get the contents you need another load:
ldr r1,[r1]
and now the contents are there.
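The same two steps can be pictured in C, as a rough analogy only: taking the address of the variable corresponds to LDR R1, =OPERAND_1, and dereferencing it corresponds to ldr r1,[r1]. The initial value 7 and the helper names operand_address and operand_value are made up for illustration.

```c
/* Global shared with the assembly file; 7 is an arbitrary example value. */
int OPERAND_1 = 7;

/* Analogue of "LDR R1, =OPERAND_1": yields the address of the variable. */
static int *operand_address(void)
{
    return &OPERAND_1;
}

/* Analogue of "ldr r1,[r1]": loads the contents stored at that address. */
static int operand_value(const int *address)
{
    return *address;
}
```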
Using global variables gets you around the calling convention problem.
Using a simple example and disassembling you can discover the calling convention for your compiler:
extern unsigned int add ( unsigned int, unsigned int);
unsigned int fun ( void )
{
return(add(3,4)+2);
}
00000000 <fun>:
0: b510 push {r4, lr}
2: 2104 movs r1, #4
4: 2003 movs r0, #3
6: f7ff fffe bl 0 <add>
a: 3002 adds r0, #2
c: bd10 pop {r4, pc}
e: 46c0 nop ; (mov r8, r8)
The first parameter is in r0, the second in r1, and the return value in r0. This could technically change in any future GNU version, but from gcc 2.x.x to the present 9.1.0 this is how it has been for ARM, and from gcc 3.x.x to the present for Thumb, which is what you are using.
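For reference, here is a hypothetical C definition of add that satisfies the extern declaration above. An assembly version would do the same job under this convention: take its two arguments from r0 and r1 and leave the sum in r0 before returning.

```c
/* Hypothetical C stand-in for the routine that would live in assembly. */
unsigned int add(unsigned int a, unsigned int b)
{
    return a + b; /* a arrives in r0, b in r1, result goes back in r0 */
}

/* Same caller as in the example above. */
unsigned int fun(void)
{
    return add(3, 4) + 2;
}
```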
How you have done it is fine, you just need to recognize what the =LABEL shortcut thing really does.
I have an empty program in LLVM IR:
define i32 @main(i32 %argc, i8** %argv) nounwind {
entry:
ret i32 0
}
I'm cross-compiling it on Intel x86-64 Windows for ARM Linux using ELLCC, with the following command:
ecc++ hw.ll -o hw.o -target arm-linux-engeabihf
It completes without errors and generates an ELF binary.
When I take the binary to a Raspberry Pi Model B+ (running Raspbian), I get only the following error:
Illegal instruction
I don't know how to tell what's wrong from the disassembled code. I tried other ARM Linux targets but the behavior was the same. What's wrong?
The exact same file builds, links and runs fine for other targets like i386-linux-eng, x86_64-w64-mingw32, etc (that I could test on), again using the ELLCC toolchain.
Assuming the library and startup code isn't at fault, this is what the disassembly of main itself looks like:
.text:00010188 e24dd008 sub sp, sp, #8
.text:0001018c e3002000 movw r2, #0
.text:00010190 e58d0004 str r0, [sp, #4]
.text:00010194 e1a00002 mov r0, r2
.text:00010198 e58d1000 str r1, [sp]
.text:0001019c e28dd008 add sp, sp, #8
.text:000101a0 e12fff1e bx lr
I'd guess it's choking on the movw at 0x0001018c. The movw/movt encodings which can handle full 16-bit immediate values first appeared in the ARMv6T2 version of the architecture - the ARM1176 in the original Pi models predates that, only supporting original ARMv6*.
You need to tell the compiler to generate code appropriate to the thing you're running on - I don't know ELLCC, but I'd guess from this it's fairly modern and up-to-date and thus defaulting to something newer like ARMv6T2 or ARMv7. Otherwise, it's akin to generating code for a Pentium and hoping it works on an 80486 - you might be lucky, you might not. That said, there's no good reason it should have chosen that encoding in the first place - it's not as if 0 can't be encoded in a 'classic' mov instruction...
The decadent option, however, would be to consider this a perfect excuse to replace the Pi with a Pi 2 - the Cortex-A7s in that are nice capable ARMv7 cores ;)
* Lies for clarity. I think 1176 might actually be v6K, but that's irrelevant here. I'm not sure if anything actually exists as plain ARMv6, and all the various architecture extensions are frankly a hideous mess
Can we use the address-of operator "&" in inline GCC ARM assembly? If yes, then I have a structure core_regx and I need to pass the address of a member r0 of that structure into the code below:
asm volatile("ldr r3, [%0,#0]":: "r" (&(core_reg->r0)));
Please check if this code is correct or not.
Yes, you certainly can use &. However, I would suggest that your constraint specifiers may have some issues, and that there are better options.
asm volatile("ldr r3, %0":: "m" (core_reg->r0) : "r3");
You definitely should add r3 to the clobber list. Also, the "m" specifier is probably better. If core_reg is already in r0, the compiler can use the offset of r0 member and generate code such as,
add r0, r0, #12 ; assuming r0 is core_reg.
ldr r3, [r0]
The compiler knows the relation between core_reg and core_reg->r0. At least "m" works well with some versions of arm-xxx-gcc. Run objdump --disassemble on the code the compiler generates to verify it is doing what you want.
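The relation between core_reg and core_reg->r0 is just a fixed byte offset, which can be checked in plain C. The sketch below assumes a hypothetical layout in which three 32-bit words precede r0, chosen only so that the offset matches the #12 in the example above; the real structure in the question may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: three words before r0 put it at offset 12. */
struct core_regs {
    uint32_t reserved[3];
    uint32_t r0;
    uint32_t r1;
};

/* Example instance, used only for illustration. */
static struct core_regs example_regs;

/* Address of the member, i.e. the base address plus offsetof(..., r0). */
static uintptr_t member_address(const struct core_regs *core_reg)
{
    return (uintptr_t)&core_reg->r0;
}
```

This is the arithmetic the compiler performs for itself when the operand is passed with the "m" constraint.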
Edit: The GCC manual has lots of information, such as GCC assembler constraints, machine-specific constraints, and general info. There are many tutorials on the Internet, such as the ARM assembler cookbook, which is one of the best.