ARM conditional instruction setting flags - arm

As I read about conditional execution in ARM, I can see why instructions in ARM do not set the flags by default. They seem to set flags only when the S suffix is added in. I was wondering if it would be possible to have an ARM instruction that executes conditionally, but also sets flags. Could you give me an example for the same?

Yes it is possible to both execute conditionally and set the flags, for example
ADDSCS r0, r1, r2 ; If C flag set then r0 = r1 + r2, and update flags

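In the classic ARM (A32) instruction set, nearly every instruction can be executed conditionally; an instruction written without a condition suffix uses AL (always). The condition code and the S suffix are independent of each other, so both can be combined on a data-processing instruction.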

Related

ARM GCC + Cortex M4: Calling address as function generates BLX instead of BL

I am building a small OS for a Cortex-M4 CPU that can receive compiled binaries over UART and schedule them dynamically. I want to use that feature to build a test suite that uploads test programs which can call OS functions such as memory allocation directly, without doing an SVC. Therefore I need to cast the fixed addresses of those OS routines to function pointers. However, casting the memory addresses results in wrong / non-Thumb instruction code: BL is needed instead of BLX, resulting in HardFaults.
void (*functionPtr_addr)(void);
functionPtr_addr = (void (*)()) (0x0800084C);
This is the assembly when calling this function
8000838: 4b03 ldr r3, [pc, #12] ; (8000848 <idle+0x14>)
800083a: 681b ldr r3, [r3, #0]
800083c: 4798 blx r3
Is there a way to force the BL instruction in such a case? It works with inline assembly, and I could write macros, but it would be much cleaner to do it this way.
The code gets compiled and linked, among other things, with
-mcpu=cortex-m4 -mthumb.
Toolchain:
gcc version 12.2.0 (Arm GNU Toolchain 12.2.MPACBTI-Bet1 (Build arm-12-mpacbti.16))
The bl instruction only takes a PC-relative immediate offset and is limited in range. The compiler does not know where your code will be placed, so it cannot know whether bl can be used; for a call through a pointer held in a register, blx is the instruction to use.
Regarding "resulting in HardFaults":
The address passed to blx has to be odd on Cortex-M4 uCs to execute the code in Thumb mode. Your address is even, so the uC tries to execute ARM code, which this core does not support.
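As a minimal sketch of the point above (not taken from the original post), the fixed address can be turned into a callable pointer by setting bit 0 before the cast. The names os_fn_t, thumb_entry and os_fn_at are hypothetical helpers introduced only for illustration, and the cast from integer to function pointer relies on implementation-defined behaviour that GCC supports.

```c
#include <stdint.h>

/* Hypothetical function-pointer type for an OS routine taking no arguments. */
typedef void (*os_fn_t)(void);

/* Set bit 0 of a code address. BX/BLX use this bit to select Thumb state;
 * the Cortex-M4 only supports Thumb, so the bit must be 1. */
static inline uintptr_t thumb_entry(uintptr_t code_addr)
{
    return code_addr | (uintptr_t)1;
}

/* Build a callable pointer from a raw (possibly even) code address. */
static inline os_fn_t os_fn_at(uintptr_t code_addr)
{
    return (os_fn_t)thumb_entry(code_addr);
}
```

On the target, a call would then look like os_fn_at(0x0800084C)(); the compiler is still expected to emit blx, but with an odd address in the register.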

GCC startup code _start does not end in main()

I could only find bits and pieces of information on the symbol _start, which is called from the target startup code in order to establish the C runtime environment. This would be necessary to ensure that all initialized global/static variables are properly loaded prior to branching to main().
In my case, I am using an MCU with an ARM Cortex-R4F core CPU. When the device resets, I implement all of the steps recommended by the MCU manufacturer then attempt to branch to the symbol _start using the following lines of code:
extern void _start(void);
_start();
I am using something similar to the following to link the program:
armeb-eabi-gcc-7.5.0" -marm -fno-exceptions -Og -ffunction-sections -fdata-sections -g -gdwarf-3 -gstrict-dwarf -Wall -mbig-endian -mcpu=cortex-r4 -Wl,-Map,"app_tms570_dev.map" --entry main -static -Wl,--gc-sections -Wl,--build-id=none -specs="nosys.specs" -o[OUTPUT FILE NAME HERE] [ALL OBJECT FILES HERE] -Wl,-T[LINKER COMMAND FILE NAME HERE]
My toolchain in this case is gcc-linaro-7.5.0-2019.12-i686-mingw32_armeb-eabi, which is being used since my MCU device is big-endian.
As I trace through the call to symbol _start, I can see my program branch to symbol _start then a few unexpected things happen.
First, there are a couple of places where the following instruction is called:
EF123456 svc #0x123456
This basically generates a software interrupt, which causes the program to branch to the software interrupt handler that I have configured for the device.
Secondly, the device eventually branches to __libc_init_array then _init. However, symbol _init does not contain any branch instruction and allows the program to flow into _fini, which also does not contain any branch instruction and allows the program to flow into whatever code was placed next in memory. This eventually causes some type of abort exception, as would be expected.
The disassembly associated with _init and _fini:
_init():
00003b00: E1A0C00D mov r12, r13
00003b04: E92DDFF8 push {r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r14, pc}
00003b08: E24CB004 sub r11, r12, #4
_fini():
00003b0c: E1A0C00D mov r12, r13
00003b10: E92DDFF8 push {r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r14, pc}
00003b14: E24CB004 sub r11, r12, #4
Based on some other documentation I read, I also attempted to call main() directly, but this just caused the program to jump to main() without initializing anything. I also tried to call symbol __main() similar to what is done when using the ARM Compiler in order to execute startup code, but this symbol is not found.
Note that this is for a bare-metal-ish system that does not use semihosting.
My question is: Is there a way to set up the system and call a function that will establish the C runtime environment automatically and branch to main() using the GCC linker?
For the time being, I have implemented my own function to initialize .data sections and the .bss sections are already being zeroed at reset using a built in feature of the MCU device.
Adding some more details here:
The specific MCU that I am using should not be relevant, particularly taking the following discussion into consideration.
First, I have already set up the exception vectors for the device in an assembler file:
.section .excvecs,"ax",%progbits
.type Exc_Vects, %object
.size Exc_Vects, .-Exc_Vects
// See DDI0363G, Table 3-6
Exc_Vects:
b c_int00 // Reset vector
b exc_undef // Undefined instruction
b exc_software // Software
b exc_prefetch // Pre-fetch abort
b exc_data // Data abort
b exc_invalid // Invalid vector
There are two instructions that follow for the IRQ and FIQ interrupts as well, but they are set according to the MCU datasheet. I have defined handlers for the undefined instruction, prefetch abort, data abort and invalid vector exceptions. For the software exception I use some assembly to jump to an address that can be changed at runtime. My startup sequence begins at c_int00. These have all been tested and work with no problems.
My reset handler takes care of all of the steps needed for initializing the MCU in accordance with the MCU datasheet. This includes initializing CPU registers and the stack pointers, which are loaded using symbols from the linker file.
The toolchain that I am using, noted above, includes the C standard libraries and other libraries needed to compile and link my program with no problems. This includes the symbol _start that I mentioned previously.
From what I understand, the function _start typically wraps main(). Before it calls main() it initializes .bss and .data sections, configures the heap, as well as performing some other tasks to set up the environment. When main() returns, it performs some clean up tasks and branches to a designated exit() function. (Side note: _start is defined in newlib based on the source code that I downloaded from linaro).
There is some detail regarding this in a separate response here:
What is the use of _start() in C?
I have been using the ARM Compiler as an alternative for the same project. There, __main performs these functions. For the stack initialization, I basically provide it an empty hook function and for exit I provide it with a function that safely terminates the program should main() return for some reason. I am not sure if something like this is needed for GCC.
I would note that I have included option -specs="nosys.specs" without option -nostartfiles. My understanding is that this avoids implementing some of the functions that I do not want to use in my application, such as I/O operations, but still links the startup code.
I am not using the heap in my project as dynamic memory use is frowned upon, but I was hoping to be able to use the startup code primarily in order to avoid having to remember to initialize .data sections manually. Above I noted that my application is baremetal-ish. I am actually using an RTOS and have the memory partitioned into blocks so that I can use the device MPU.
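The manual .data initialization mentioned in the question is not shown in the post. A minimal sketch of what such a routine commonly looks like follows; copy_words and zero_words are hypothetical names, and the linker-script symbols in the trailing comment are assumptions that depend entirely on the linker command file in use.

```c
#include <stdint.h>

/* Copy initialized data word by word from its load address (flash)
 * to its run address (RAM), stopping at dst_end. */
static void copy_words(uint32_t *dst, const uint32_t *src, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Zero a word-aligned region such as .bss. */
static void zero_words(uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = 0u;
    }
}

/* On the target this might be called from the reset handler, assuming the
 * linker script exports symbols like these (hypothetical names):
 *
 *   extern uint32_t __data_load[], __data_start[], __data_end[];
 *   copy_words(__data_start, __data_load, __data_end);
 */
```

Keeping the loops free of library calls avoids depending on any runtime state before the C environment has been set up.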

Meaning of # zero_extendqisi2

I was wondering what the actual meaning of # zero_extendqisi2 in gcc assembly output was and also the usage. I couldn't find what qisi stands for or anything along those lines.
For context, the line is ldrb r3, [fp, #-9] # zero_extendqisi2 and this is ARM on a Raspberry Pi Zero W, compiled with GCC. It shows up, for example, when reloading an unsigned char and converting it to int with optimization disabled, using GCC 9.2 with no options: https://godbolt.org/z/7xnfqh. Older GCC versions, all the way back to the earliest on Godbolt (4.5) and presumably earlier, print the same comment.
This is an RTL instruction name, included in the Standard Names list of the GCC internals manual under zero_extendmn2. Here m,n are the machine modes qi and si, which are respectively a byte and a 32-bit integer. So this is GCC's indication that it is generating an instruction which takes a byte (here loaded from memory) and zero-extends it into a 32-bit integer (here in the register r3). Which is exactly what the ARM ldrb instruction does.
I don't know what the 2 stands for, but it's apparently part of GCC's naming convention.
As Peter points out, it's a little odd that GCC would include such a comment in the assembly without -fverbose-asm. Indeed the comment is coded in as part of the template string in the machine description file, arm.md. It could have been a debugging aid that some GCC developer added and then forgot to take out.
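To make the byte-to-word zero extension described above concrete, here is a small C sketch of the kind of source that leads to it. The function names are hypothetical, and the exact instructions chosen depend on the GCC version and options; the signed variant is included only for contrast, since it needs a sign-extending load instead.

```c
/* Reading an unsigned char and using it as an int: the byte is
 * zero-extended, which on ARM typically maps to a single ldrb. */
int widen_unsigned(const unsigned char *p)
{
    return *p;
}

/* Reading a signed char instead requires sign extension, so a different
 * load (typically ldrsb on ARM) is expected here. */
int widen_signed(const signed char *p)
{
    return *p;
}
```

Compiling both functions with -S on an ARM target and comparing the loads shows the difference between the two conversions.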
(If you submit this for your assignment, please cite this post properly.)

What happens in the assembly output when we add "cc" to clobber list

I read that specifying "cc" in the clobber list indicates that the assembler code modifies the flags register.
I wrote a sample program to check the difference between adding "cc" and not adding it.
Comparing the assembly, there is no change when we add "cc".
#include <stdio.h>
int main(void)
{
    unsigned long sum = 0;  // initialized so the asm reads a defined value
    asm("incq %0"
        : "+r"(sum)  // read-write operand: incq both reads and writes it
    );
    printf("sum= %lu\n", sum);
    return 0;
}
When should we use "cc", and what is its effect on the assembly output?
For x86, absolutely nothing. For x86 and x86-64, a cc clobber is implicit in every asm() statement. This design decision makes some sense because most x86 instructions write FLAGS, and because a missing clobber is easy to overlook and could be hard to catch with testing. (Although there's no shortage of things that are easy to get wrong with GNU C inline asm. There's usually no need to use it.)
(It does make it impossible to tell the compiler when your asm statement doesn't modify flags, but the cost of that is probably low, usually just one more instruction to redo a compare or something, or to save a variable so it can be compared later.)
If you want to be pedantic, you can use a "cc" clobber in every asm statement that modifies FLAGS.
For non-x86, you must use a cc clobber in every asm statement that modifies flags / condition codes (on ISAs that have them). e.g. ARM. On ARM, setting flags is optional; instructions with an S suffix set flags. So adds r0, r1, r2 sets flags according to r0 = r1+r2, but add r0, r1, r2 leaves flags untouched.
If you left out a "cc" clobber (on non-x86), the compiler might emit asm that set flags before an asm statement and read them afterwards, as part of implementing some other non-asm statement. So it could be essentially the same as destroying a register: nonsensical behaviour that depends on the details of what the compiler was using the register or the flags for, and which varies with optimization level and/or compiler version.
This is why testing isn't sufficient to prove inline asm is safe. With one compiler version, you could easily get lucky and have the compiler generate code that happened not to keep anything in the status register / condition codes across an asm statement, but a different compiler version or different surrounding code in a function where this inlines could be vulnerable to a buggy asm statement.
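A hedged sketch of where the clobber goes, under the assumptions stated in the comments: the function name increment is hypothetical, the ARM branch assumes GCC-style extended asm on a 32-bit ARM target, and the last branch is a plain C fallback so the example still builds on other targets.

```c
/* Increment a value using an instruction that writes the condition flags. */
static unsigned long increment(unsigned long value)
{
#if defined(__x86_64__)
    /* incq writes FLAGS; "cc" is implicit on x86 but harmless to state. */
    __asm__("incq %0" : "+r"(value) : : "cc");
#elif defined(__arm__)
    /* adds updates the condition flags, so "cc" is required here. */
    __asm__("adds %0, %0, #1" : "+r"(value) : : "cc");
#else
    value += 1;  /* portable fallback so the sketch builds elsewhere */
#endif
    return value;
}
```

Using "+r" marks the operand as both read and written, which matches what the increment instruction does in either branch.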

Can we use the address-of operator "&" in GCC ARM inline assembly?

Can we use the address-of operator "&" in GCC ARM inline assembly? If yes, then I have a structure core_regx and I need to pass the address of a member r0 of that structure into the code below:
asm volatile("ldr r3, [%0,#0]":: "r" (&(core_reg->r0)));
Please check if this code is correct or not.
Yes, you certainly can use &. However, I would suggest that your assembler specifiers may have some issues and better options.
asm volatile("ldr r3, %0":: "m" (core_reg->r0) : "r3");
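You definitely should add r3 to the clobber list. Also, the "m" specifier is probably better. If core_reg is already in r0, the compiler can fold the offset of the r0 member into the addressing mode (for example ldr r3, [r0, #12], assuming the member is at offset 12). With "r" and an explicit address, it first has to compute the address in a register, generating code such as,
add r0, r0, #12 ; assuming r0 is core_reg.
ldr r3, [r0]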
The compiler knows the relation between core_reg and core_reg->r0. At least "m" works well with some versions of arm-xxx-gcc. Run objdump --disassemble on the code the compiler generates to verify it is doing what you want.
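As an illustration of the "m" constraint discussed above, here is a small sketch that returns the loaded value through an output operand instead of hard-coding r3. The struct layout, the names core_regs and load_r0, and the padding are all assumptions made for the example; the non-ARM branch is a plain C fallback so the code builds on other targets.

```c
#include <stdint.h>

/* Hypothetical layout: r0 sits at byte offset 12 inside the structure. */
struct core_regs {
    uint32_t pad[3];
    uint32_t r0;
};

/* Load the r0 member, letting the compiler pick the addressing mode. */
static uint32_t load_r0(const struct core_regs *core_reg)
{
    uint32_t value;
#if defined(__arm__)
    /* "m" hands the compiler the memory operand itself, so it can emit
     * something like ldr rX, [rY, #12] without a separate add. */
    __asm__ volatile("ldr %0, %1" : "=r"(value) : "m"(core_reg->r0));
#else
    value = core_reg->r0;  /* portable fallback for non-ARM builds */
#endif
    return value;
}
```

Because the result is an output operand, no register needs to be listed as clobbered, and the compiler is free to choose the destination register.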
Edit: The GCC manual has lots of information, such as GCC assembler constraints, Machine specific and General Info. There are many tutorials on the Internet, such as the ARM assembler cookbook, which is one of the best.
