qemu-arm with Cortex-M4 on Linux

I am using qemu-arm and the ARM Workbench IDE to run/profile an ARM binary which was built with armcc/armlink (an .axf file, program written in C). This works fine with Cortex-A9 and ARM926/ARM5TE. However, whatever I tried, it doesn't work when the binary is built for Cortex-M4. Both the simulator and qemu-arm hang when M4 is selected as the CPU.
I know that this processor requires some additional startup code, but I could not find any comprehensive tutorial on how to get it running. Does anyone know how to do this? I have quite a big project with one main function, but it would already help if a "hello world" or some simple program that takes arguments would run.
Here is the command line I am using with Cortex-A9:
qemu-system-arm -machine versatileab -cpu cortex-a9 -nographic -monitor null -semihosting -append 'some program arguments' -kernel program.axf

I do not know how to do it with the versatilepb, it did not "just work", but this does work:
flash.s
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
notmain.c
void PUT32 ( unsigned int, unsigned int );
#define UART0BASE 0x4000C000
int notmain ( void )
{
unsigned int rx;
for(rx=0;rx<8;rx++)
{
PUT32(UART0BASE+0x00,0x30+(rx&7));
}
return(0);
}
flash.ld
ENTRY(_start)
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
(I am told that the entry point being a Thumb function address is critical; YMMV.)
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m3 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -ffreestanding -mcpu=cortex-m3 -mthumb -c notmain.c -o notmain.o
arm-none-eabi-ld -nostdlib -nostartfiles -T flash.ld flash.o notmain.o -o notmain.elf
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy -O binary notmain.elf notmain.bin
check the vector table, etc.
00000000 <_start>:
0: 20001000
4: 0000000d
8: 00000013
0000000c <reset>:
c: f000 f804 bl 18 <notmain>
10: e7ff b.n 12 <hang>
00000012 <hang>:
12: e7fe b.n 12 <hang>
Looks good.
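The same inspection can be done mechanically on the host. This is only a sketch: check_vectors and its parameters are names made up for illustration, and the ROM/RAM ranges are meant to be the ones from flash.ld. It checks that word 0 is a plausible stack pointer and that every following vector is a ROM address with the lsbit set.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: returns 0 if the first words of a Cortex-M image
   look like a plausible vector table, or a negative code otherwise. */
int check_vectors(const uint32_t *words, size_t count,
                  uint32_t rom_base, uint32_t rom_len,
                  uint32_t ram_base, uint32_t ram_len)
{
    size_t i;

    if (count < 2) return -1;   /* need at least the SP and reset words */

    /* Word 0: initial stack pointer, inside RAM or exactly one past its end. */
    if (words[0] < ram_base || words[0] - ram_base > ram_len) return -2;

    /* Remaining words: handler addresses with the Thumb bit (lsbit) set. */
    for (i = 1; i < count; i++) {
        uint32_t addr = words[i] & ~(uint32_t)1;
        if ((words[i] & 1u) == 0) return -3;                    /* not a Thumb address */
        if (addr < rom_base || addr - rom_base >= rom_len) return -4; /* outside ROM */
    }
    return 0;
}
```

With the three words from the listing (0x20001000, 0x0000000d, 0x00000013) and the flash.ld ranges this returns 0; clearing the lsbit of the reset vector makes it return -3.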
And run it
qemu-system-arm -M lm3s811evb -m 8K -nographic -kernel notmain.bin
01234567
Then ctrl-a then x to exit
QEMU: Terminated
Adding -cpu cortex-m4 works as well as one would expect. You would have to find things that differ between the M3 and the M4 and that might show up in a simulation like this, and go from there.
After Luminary Micro (acquired by TI a while ago now) I do not think anyone else put in the effort for a machine. But as already discussed in at least one question on this site, you can run the cores (an exercise for the reader).
For versatilepb
int notmain ( void )
{
unsigned int ra;
for(ra=0;;ra++)
{
ra&=7;
PUT32(0x101f1000,0x30+ra);
}
return(0);
}
qemu-system-arm -machine versatileab -cpu cortex-m4 -nographic -monitor null -kernel notmain.elf
qemu-system-arm: This board cannot be used with Cortex-M CPUs

You can't arbitrarily plug different CPU types into an Arm board model. If you try it then the resulting system may work by luck, or may crash, or have odd behaviour; in some cases the -cpu option will just be ignored. This is because the CPU integration with the board matters: things like interrupt controllers are part of the board, not the CPU, but not all CPUs will work with all interrupt controllers. Often QEMU is not as good as it could be about detecting and reporting errors for user options that aren't valid.
In this case you're probably using an older QEMU: newer ones will correctly report:
qemu-system-arm: This board cannot be used with Cortex-M CPUs
if you try to use '-machine versatilepb' with '-cpu cortex-m4'. Older ones would either crash or just misbehave.
Generally the best thing is to use the CPU type that the board has by default (ie don't specify a -cpu option), for every board type except the "virt" board. If you want to write code for a Cortex-M4, you should look for a board type that has a Cortex-M4. The mps2-an386 is probably a good option. (If your QEMU doesn't have that board type, upgrade to a newer one: there have been a lot of M-profile emulation bug fixes anyway that you'll want to have.)
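As a rough sketch of what output on such a board could look like, here is a minimal UART driver in C. Everything specific in it is an assumption to check against the board documentation: that UART0 of the mps2-an386 model is a CMSDK APB UART at 0x40004000, that its registers are laid out as DATA, STATE, CTRL at word offsets 0, 1, 2 and BAUDDIV at word offset 4, and that the transmitter has to be enabled in CTRL before anything is sent. The function names are made up for illustration.

```c
#include <stdint.h>

/* Register indices (in 32-bit words) of the UART, as assumed here. */
enum { UART_DATA = 0, UART_STATE = 1, UART_CTRL = 2, UART_BAUDDIV = 4 };

#define UART_STATE_TXFULL 0x1u  /* assumed: transmit buffer full */
#define UART_CTRL_TXEN    0x1u  /* assumed: transmit enable */

/* Assumed base address of UART0 on the mps2-an386 machine. */
#define MPS2_UART0 ((volatile uint32_t *)0x40004000u)

/* Enable the transmitter; the divider value is an assumption too. */
void uart_init(volatile uint32_t *uart)
{
    uart[UART_BAUDDIV] = 16;
    uart[UART_CTRL]    = UART_CTRL_TXEN;
}

/* Wait for room in the transmit buffer, then write one character. */
void uart_putc(volatile uint32_t *uart, char c)
{
    while (uart[UART_STATE] & UART_STATE_TXFULL)
        continue;
    uart[UART_DATA] = (uint8_t)c;
}

/* Write a NUL-terminated string. */
void uart_puts(volatile uint32_t *uart, const char *s)
{
    while (*s)
        uart_putc(uart, *s++);
}
```

On the target you would call uart_init(MPS2_UART0) once and then uart_puts(MPS2_UART0, ...). Passing the base address as a parameter is a design choice that lets the same functions be exercised on the host against a plain array.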

Related

cross compile an arm assembly and simulate non OS arm environment with qemu on linux

Currently, I'm trying to test some ARM assembly code that I wrote. I work on Ubuntu, so I downloaded a cross compiler toolchain (arm-linux-gnueabi) so I can compile my code, and then I test it using qemu-arm. But when I try to compile with arm-none-eabi-gcc, it compiles but it doesn't work with qemu-arm. My guess is that it doesn't work because I'm compiling for a bare metal ARM environment. My question is: how can I use qemu-system-arm instead of qemu-arm to simulate a bare metal ARM environment and test my code?

If you want assembly you only need binutils; don't use a C compiler on assembly. It may work, but doesn't that just leave a bad taste in your mouth? You probably didn't link separately and/or left the stock bootstrap and linker script in place with arm-none-eabi-gcc. The example below does not care about arm-none-eabi- vs arm-linux-gnueabi-.
QEMU UARTs tend not to actually implement the time it takes for a character to go out, nor do they need any initialization, YMMV.
memmap
MEMORY
{
ram : ORIGIN = 0x00000000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
}
so.s
.globl _start
_start:
b reset
b hang
b hang
b hang
b hang
b hang
b hang
b hang
hang: b hang
reset:
ldr r0,=0x101f1000
mov r1,#0
top:
add r1,r1,#1
and r1,r1,#0x07
orr r1,r1,#0x30
str r1,[r0]
b top
build
arm-linux-gnueabi-as --warn --fatal-warnings -march=armv5t so.s -o so.o
arm-linux-gnueabi-ld so.o -T memmap -o notmain.elf
arm-linux-gnueabi-objdump -D notmain.elf > notmain.list
arm-linux-gnueabi-objcopy notmain.elf -O binary notmain.bin
run
qemu-system-arm -M versatilepb -m 128M -nographic -kernel notmain.bin
then ctrl-a then x to exit the qemu console back to the command line.
This will print out 1234567012345670... forever or until you stop it
Another way to run is
qemu-system-arm -M versatilepb -m 128M -kernel notmain.bin
and then ctrl-alt-3 (not F3 but 3) will switch to the serial0 console
and you can see the output, and can close out of the qemu console when done.
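If it is not obvious why the output starts at 1 and wraps to 0 after 7, the three instructions in the loop of so.s can be modeled on the host. This is just a sketch; next_r1 is a name made up for illustration.

```c
/* Host-side model of the loop body in so.s: takes the current r1 and
   returns the next r1, which is also the byte stored to the UART. */
unsigned int next_r1(unsigned int r1)
{
    r1 = r1 + 1;        /* add r1,r1,#1    */
    r1 = r1 & 0x07;     /* and r1,r1,#0x07 */
    r1 = r1 | 0x30;     /* orr r1,r1,#0x30 */
    return r1;
}
```

Starting from r1 = 0 the first value is 0x31, the character 1. Because the and instruction throws away the 0x30 from the previous pass, the character 7 (0x37) is followed by 0x30, the character 0.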
There are other machines you can experiment with. Their peripherals will of course vary, as will the architecture; most should be compatible with either ARMv4 ARM instructions or, for a Cortex-M, Thumb instructions.
Adding C functions to this is fairly simple.
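A sketch of what that could look like, assuming the bootstrap in so.s is changed to load sp with some address in RAM and then bl notmain (the C code needs a stack). The UART address is the one used in so.s; hex_digit and uart_hex are names made up for illustration. The digits are computed rather than looked up, so the code does not depend on any .rodata or .data being placed by the one-line linker script.

```c
/* UART0 data register on versatilepb, same address as in so.s. */
#define UART0_DR ((volatile unsigned int *)0x101f1000)

/* Convert a value 0..15 to its ASCII hex digit without a lookup table. */
unsigned int hex_digit(unsigned int nibble)
{
    return (nibble > 9) ? (nibble - 10 + 'A') : (nibble + '0');
}

/* Print a 32-bit value as eight hex digits followed by CR LF.
   No status polling: QEMU tolerates that, real hardware generally will not. */
void uart_hex(volatile unsigned int *dr, unsigned int value)
{
    int shift;

    for (shift = 28; shift >= 0; shift -= 4)
        *dr = hex_digit((value >> shift) & 0xF);
    *dr = 0x0D;
    *dr = 0x0A;
}

/* Hypothetical C entry point, called from the bootstrap. It is only
   meaningful on the target, where UART0_DR is actually mapped. */
void notmain(void)
{
    uart_hex(UART0_DR, 0x12345678);
    for (;;) continue;
}
```

Build the C file with -c and add the object after so.o on the ld command line, so that the vector table stays first in the binary.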

How to use the enhanced multiplier instructions of ARMv5TE instruction set

I'm using an ARM966E-S RISC-CPU and was wondering how to use the apparently available instruction set extensions for better DSP performance, e. g. an enhanced multiplier instruction.
I've read in the technical reference manual that these instruction set extensions are available but I don't know how to use/activate them.
Can anybody help?
Thanks in advance!

Why not just try it? Or read the manual for your toolchain, for example with gcc
so.s
ldrd r0,[r2]
ldr r2,[r2]
test
arm-none-eabi-as so.s -o so.o
arm-none-eabi-as -march=armv5t so.s -o so.o
so.s: Assembler messages:
so.s:3: Error: selected processor does not support `ldrd r0,[r2]' in ARM mode
arm-none-eabi-as -march=armv5te so.s -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <.text>:
0: e1c200d0 ldrd r0, [r2]
4: e5922000 ldr r2, [r2]
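The same goes for the multiplier instructions the question asks about: as far as I know there is nothing to activate, the ARMv5TE additions (SMULxy, SMLAxy, QADD and friends) are just instructions that the assembler accepts with -march=armv5te. From C, a compiler targeting armv5te may pick them on its own for code like the sketch below; whether it actually does depends on the compiler and options, so disassemble and look. dot16 is a name made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* 16x16 -> 32-bit multiply-accumulate over two sample buffers, the
   pattern that SMLAxy-style instructions are meant for. The caller is
   responsible for keeping the sum within the range of int32_t. */
int32_t dot16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    size_t i;

    for (i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}
```

Compile with arm-none-eabi-gcc -O2 -march=armv5te -c and run arm-none-eabi-objdump -D on the object, as above, to see which multiply instructions were chosen.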

Programming STM32F4x WITHOUT IDE on Debian

This is my first question on this website, and I'm not sure about my English.
I want to know if there is a way to program the Nucleo STM32F446RE (via USB, not via JTAG) WITHOUT using any IDE.
For training purposes, I want to program with only a text editor (I use Kate), Makefiles and the command line.
What I already found/installed:
gcc-arm-none-eabi (6-2017-q2-update)
It contains, I think, everything we need to compile (but I don't think there is an assembler in there).
There are example programs in C, and Makefiles (that I don't totally understand). It seems to compile well (I tried the "minimum" example).
Here is the example I used:
#ifndef __NO_SYSTEM_INIT
void SystemInit()
{}
#endif
void main()
{
for (;;);
}
And here is the Makefile:
# Selecting Core
CORTEX_M=4
# Use newlib-nano. To disable it, specify USE_NANO=
USE_NANO=--specs=nano.specs
# Use semihosting or not
USE_SEMIHOST=--specs=rdimon.specs
USE_NOHOST=--specs=nosys.specs
CORE=CM$(CORTEX_M)
BASE=../..
# Compiler & Linker
CC=arm-none-eabi-gcc
CXX=arm-none-eabi-g++
# Options for specific architecture
ARCH_FLAGS=-mthumb -mcpu=cortex-m$(CORTEX_M)
# Startup code
STARTUP=$(BASE)/startup/startup_ARM$(CORE).S
# -Os -flto -ffunction-sections -fdata-sections to compile for code size
CFLAGS=$(ARCH_FLAGS) $(STARTUP_DEFS) -Os -flto -ffunction-sections -fdata-sections
CXXFLAGS=$(CFLAGS)
# Link for code size
GC=-Wl,--gc-sections
# Create map file
MAP=-Wl,-Map=$(NAME).map
NAME=minimum
STARTUP_DEFS=-D__STARTUP_CLEAR_BSS -D__START=main
LDSCRIPTS=-L. -L$(BASE)/ldscripts -T nokeep.ld
LFLAGS=$(USE_NANO) $(USE_NOHOST) $(LDSCRIPTS) $(GC) $(MAP)
$(NAME)-$(CORE).axf: $(NAME).c $(STARTUP)
$(CC) $^ $(CFLAGS) $(LFLAGS) -o $@
clean:
rm -f $(NAME)*.axf *.map *.o
I modified it in order to set cortex-m4 instead of cortex-m0.
After running the make command I get minimum.map and minimum.axf files.
But I don't know how to load the object code onto the device. (And is it normal not to have a minimum.o file?)

I would call something like this a minimal example with C code. The infinite loop is not necessary in this case, but is inspired by yours.
vectors.s
.thumb
.globl _start
_start:
.word 0x20002000
.word reset
.word done
.word done
.thumb_func
reset:
bl centry
b done
.thumb_func
done:
b done
so.c
void centry ( void )
{
for(;;) continue;
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
build
arm-none-eabi-as vectors.s -o vectors.o
arm-none-eabi-gcc -O2 -c -mthumb so.c -o so.o
arm-none-eabi-ld -T flash.ld vectors.o so.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
examine
08000000 <_start>:
8000000: 20002000 andcs r2, r0, r0
8000004: 08000011 stmdaeq r0, {r0, r4}
8000008: 08000017 stmdaeq r0, {r0, r1, r2, r4}
800000c: 08000017 stmdaeq r0, {r0, r1, r2, r4}
08000010 <reset>:
8000010: f000 f802 bl 8000018 <centry>
8000014: e7ff b.n 8000016 <done>
08000016 <done>:
8000016: e7fe b.n 8000016 <done>
08000018 <centry>:
8000018: e7fe b.n 8000018 <centry>
800001a: 46c0 nop ; (mov r8, r8)
Using 0x08000000 is likely not required, but read the docs: folks use 0x08000000, though technically the boot address is 0x00000000, and the STM32 family maps 0x08000000 to 0x00000000 based on the boot pins, as described in the documentation. Inspection needs to show that the vector table is the first thing in the image, and that you have told the toolchain these are Thumb addresses in the vector table (the lsbit is set). I could have put the C entry function in the vector table as the reset handler (main() is not required; that is just a convention). I have no .data or .bss initialization, so something like this does not allow the use of .data, nor assuming that .bss variables are zero: you have to write before you read. Adding more code to the bootstrap (and linker script) would allow for that.
arm-none-eabi-objcopy so.elf -O binary so.bin
This will create a binary that, depending on the tools you use, may be used to load the program. If this is a Nucleo board you can copy that file to the virtual USB drive. Clearly this program won't show anything interesting. Using openocd or other SWD debugger software (if you have a Nucleo board you don't need any other hardware) you can stop and restart the program to try to see it running.
You can read the documentation to see the addresses and how to program the peripherals.
Thumb2 is just a set of extensions to Thumb. You can stick with traditional Thumb, or add cortex-m4 or armv7-m to the command line (cpu/arch) to try to reduce the number of instructions, at the cost of some larger instructions.
There are no doubt tools out there, but it is fairly easy to write your own program to interface with the serial bootloader to download your program into the device.
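To give an idea of how little is involved, here is a sketch of the framing used by the ST serial (USART) bootloader as I recall it from ST's application note AN3155; treat every constant as something to verify against that document. The host sends 0x7F once for baud rate detection, the device answers 0x79 for ACK or 0x1F for NACK, a command is the command byte followed by its complement, and an address is four bytes, most significant first, followed by the XOR of those bytes. The function names are made up for illustration, and opening and configuring the serial port is left out.

```c
#include <stdint.h>
#include <stddef.h>

#define BL_SYNC 0x7Fu   /* sent once so the bootloader can detect the baud rate */
#define BL_ACK  0x79u   /* reply: accepted */
#define BL_NACK 0x1Fu   /* reply: rejected */

/* Command frame: the command byte followed by its bitwise complement.
   Returns the number of bytes written to out. */
size_t bl_command(uint8_t cmd, uint8_t out[2])
{
    out[0] = cmd;
    out[1] = (uint8_t)~cmd;
    return 2;
}

/* Address frame: four bytes, most significant first, then their XOR.
   Returns the number of bytes written to out. */
size_t bl_address(uint32_t addr, uint8_t out[5])
{
    out[0] = (uint8_t)(addr >> 24);
    out[1] = (uint8_t)(addr >> 16);
    out[2] = (uint8_t)(addr >> 8);
    out[3] = (uint8_t)addr;
    out[4] = (uint8_t)(out[0] ^ out[1] ^ out[2] ^ out[3]);
    return 5;
}
```

For example, a write to the start of flash would begin with the frame for the write memory command (0x31, if I remember the command code correctly), wait for an ACK, then send the address frame for 0x08000000 and wait for another ACK before sending data.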

Is it possible to create a basic bare-metal Assembly bootup/startup program using only GNU LD command-line options

Is it possible to create a basic bare-metal Assembly bootup/startup program using only GNU LD command-line options in lieu of a customary -T scriptfile for a Cortex-M4 target?
I have reviewed the GNU LD documentation and searched various locations including this site; however, I have not found any information suggesting that the exclusive use of command-line options for the GNU linker is possible or not possible.
My attempt to manage the object file layout without a customary vendor provided *.ld scriptfile is purely academic. This is not homework. I'm not requesting any help for writing the startup Assembly code. I'm merely looking for a definitive answer or further resource direction.
$ arm-none-eabi-ld bootup.o -o bootup @bootup.ld.cli.file
Sample bootup.ld.cli.file content
--entry 0x0
--Ttext=0x0
--section-start .isr_vector=0x0
--section-start _start=0x4
--section-start .MyCode=0x8c
--Tdata=0x20000000
--Tbss=0x20000000
-M=bootup.map
--print-gc-sections

You have your answer right there: -Ttext=number, -Tdata=number and so on are not GNU linker script items, they are GNU ld command-line items. Note the at sign on your command line (ld reads additional command-line options from a file given as @file).
A GNU linker script looks more like this (although most are significantly more complicated, even if they don't need to be).
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
Note that the GNU linker is a bit funny when you use the -Ttext=address approach: sometimes it inserts gaps. You might have a few Kbytes of program, and instead of placing it linearly at the address like it should, it will put some, then pad some dead space, then put some more. I never figured out why, but for extremely limited targets the linker script (vs the command line), all other factors held constant, does not put the gap in the output.
EDIT:
so.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.thumb_func
reset:
b hang
.thumb_func
hang: b .
flash.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.thumb_func
.globl dummy
dummy:
bx lr
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
blinker02.c
void dummy ( unsigned int );
int notmain ( void )
{
unsigned int ra;
for(ra=0;ra<100;ra++) dummy(ra);
return(0);
}
Makefile
ARMGNU = arm-none-eabi
AOPS = --warn -mcpu=cortex-m0
COPS = -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0
all : blinker02.bin sols.bin socl.bin
clean:
rm -f *.bin
rm -f *.o
rm -f *.elf
rm -f *.list
so.o : so.s
$(ARMGNU)-as $(AOPS) so.s -o so.o
flash.o : flash.s
$(ARMGNU)-as $(AOPS) flash.s -o flash.o
blinker02.o : blinker02.c
$(ARMGNU)-gcc $(COPS) -mthumb -c blinker02.c -o blinker02.o
blinker02.bin : flash.ld flash.o blinker02.o
$(ARMGNU)-ld -o blinker02.elf -T flash.ld flash.o blinker02.o
$(ARMGNU)-objdump -D blinker02.elf > blinker02.list
$(ARMGNU)-objcopy blinker02.elf blinker02.bin -O binary
sols.bin : so.o
$(ARMGNU)-ld -o sols.elf -T flash.ld so.o
$(ARMGNU)-objdump -D sols.elf > sols.list
$(ARMGNU)-objcopy sols.elf sols.bin -O binary
socl.bin : so.o
$(ARMGNU)-ld -o socl.elf -Ttext=0x08000000 -Tbss=0x20000000 so.o
$(ARMGNU)-objdump -D socl.elf > socl.list
$(ARMGNU)-objcopy socl.elf socl.bin -O binary
The only difference between the list files for the command-line link (socl) and the linker-script link (sols) is the file name:
diff sols.list socl.list
2c2
< sols.elf: file format elf32-littlearm
---
> socl.elf: file format elf32-littlearm
I am not going to bother with demonstrating the difference you may see down the road.
For assembly only, you don't need to worry about -nostartfiles and the other command-line options (on gcc). With C objects you do. By not allowing the linker to use the bootstrap code of the toolchain as built/configured (or, let's say, of the C library), you have to provide one. If you don't complicate the linker script to the point that specific object files are called out, then the ordering of objects on the command line matters: if you swap flash.o and blinker02.o on the ld command line in the Makefile, the binary won't work. You can set entry points all you want, but those are strictly for the loader. If this is bare metal, which it appears to be, then the entry point is useless; the hardware boots how it boots. In this case, with a Cortex-M, address zero holds the value to load into the stack pointer and address four holds the address of the reset handler (with the lsbit set, since this is a Thumb-only machine; let the tools do that for you by using the GNU assembler specific .thumb_func to indicate that the next label is a branch destination address).
I sprinkled cortex-m0 about, one, because that is what I took this code from, and two, because the original Thumb of ARMv4T and ARMv5T, or as the newer ARM docs call it, "all thumb variants", is the most portable ARM instruction set across the ARM cores. With your Cortex-M4 you can get rid of that, or perhaps make it cortex-m3 or cortex-m4 to pull in the ARMv7-M Thumb2 extensions.
so the short answer is
arm-none-eabi-ld -o so.elf -Ttext=0x08000000 -Tbss=0x20000000 so.o
is more than adequate for making working binaries, ASSUMING you don't need a .data.
.data requires a lot more stuff: a linker script, a more complicated bootstrap, etc. That, or you do a copy-and-jump thing: compile the REAL program to run in SRAM only (a different entry point, full-sized ARM style, but at the RAM base address), then write an ad hoc tool to take that binary and turn it into, say, .word 0xabcdef entries in a program that copies the whole REAL program from flash to RAM and then branches to it. That copy-and-jump program is now flash only, with no .data or .bss really needed, and can use the command line; so can the REAL RAM-only program. And I probably lost you already on that one.
Likewise, using the command line you cannot, or should not, assume that .bss is zeroed; your bootstrap has to do that too. Now if you have .bss and no .data, then sure, you could blindly zero all of the RAM on boot before you branch to your C program's entry point (I use notmain() both because at least one old compiler added unnecessary garbage to the binary if it saw a main() function, and to emphasize the point that normally there is nothing magic about the function named main()).
Linker scripts are toolchain specific, so there is no reason to expect GNU linker scripts to port to Keil or to ARM (yes, I know ARM owns Keil now; I was referring to RVCT or whatever it is called now), etc. So that is the first .data/.bss problem. Ideally you want your tools to do the work: they know how big .data and .bss are, so just let them tell you. How you let them tell you is by crafting the linker script right (at least with ld), and that is tricky, but it creates variables, if you will, that can define things like the start address of .bss and the end address of .bss, maybe even some math to subtract them and get the length, and likewise for .data. Then in the bootstrap assembly language you can zero out the .bss memory using the start address and length, or the start address and end address. For .data you need two addresses, where you put it in flash (more linker script foo) and where it wants to go in RAM, plus the length; then the bootstrap copies.
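The copy and zero steps themselves are tiny. Here is a sketch in C rather than assembly, with the addresses passed in as parameters; copy_data and zero_bss are names made up for illustration. In a real bootstrap the arguments would come from symbols that your linker script defines, which are not shown here and would be your own invention as well.

```c
#include <stdint.h>

/* Copy the initial values of .data from where they were stored in flash
   (load) to where the program expects them in RAM (start up to end). */
void copy_data(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    while (start < end)
        *start++ = *load++;
}

/* Zero the RAM occupied by .bss (start up to, but not including, end). */
void zero_bss(uint32_t *start, const uint32_t *end)
{
    while (start < end)
        *start++ = 0;
}
```

Both loops work a word at a time, so the sections have to be word aligned in the linker script. They also have to run before any C code that reads a global; the functions themselves only use their arguments, so they are safe to call first.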
so basically if you write this code
unsigned int x=5;
unsigned int y;
and you link with command-line options rather than a linker script and a bootstrap that initializes .data and .bss, there is no reason whatsoever to expect x to be 5 or y to be 0 when the first C function that uses those variables is entered. If you assume that x will be 5 then your program will fail.
If you do this instead
unsigned int x;
unsigned int y;
void myfun ( void )
{
x=5;
y=0;
}
Now those assignments are instructions in .text and not values in .data, so it will always work, whether you link from the command line or with a linker script, simple or complicated.

Objdump ARM aarch64 code?

I have an ELF ARM aarch64 binary and I want to disassemble its .text section using objdump. My machine is amd64.
I tried the approach from "Using objdump for ARM architecture: Disassembling to ARM", but objdump does not identify the binary, so I am not able to disassemble it.

Go to http://releases.linaro.org/latest/components/toolchain/binaries/ and get your choice of gcc-linaro-aarch64-linux-gnu-4.9-* like for example gcc-linaro-aarch64-linux-gnu-4.9-2014.07_linux.tar.bz2.
After unpacking invoke aarch64-linux-gnu-objdump, ie:
echo "int main(void) {return 42;}" > test.c
gcc-linaro-aarch64-linux-gnu-4.9-2014.07_linux/bin/aarch64-linux-gnu-gcc -c test.c
gcc-linaro-aarch64-linux-gnu-4.9-2014.07_linux/bin/aarch64-linux-gnu-objdump -d test.o
to get the disassembly:
test.o: file format elf64-littleaarch64
Disassembly of section .text:
0000000000000000 <main>:
0: 52800540 mov w0, #0x2a // #42
4: d65f03c0 ret

Use the same toolchain that was used to compile the binary.
For a 32-bit ARM binary the compiler is typically named something like arm-linux-gnueabi-gcc, so the matching objdump is
arm-linux-gnueabi-objdump
(for an AArch64 binary the prefix is typically aarch64-linux-gnu- instead).
At present I guess you are using the x86 toolchain's objdump to disassemble a binary compiled with an ARM toolchain, hence the error.
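If you are not sure which toolchain a binary needs, the ELF header says what it was built for. A small sketch that reads the e_machine field from the first bytes of a file; elf_machine is a name made up for illustration, and the machine numbers are the standard ELF values for these four architectures only.

```c
#include <stdint.h>
#include <stddef.h>

/* Return a short name for the architecture recorded in an ELF header.
   buf points at the first len bytes of the file. */
const char *elf_machine(const uint8_t *buf, size_t len)
{
    unsigned int machine;

    if (len < 20)
        return "too short";
    if (buf[0] != 0x7F || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return "not ELF";

    /* e_machine is a 16-bit field at offset 18; byte 5 (EI_DATA) gives
       the byte order: 1 is little endian, 2 is big endian. */
    if (buf[5] == 2)
        machine = ((unsigned int)buf[18] << 8) | buf[19];
    else
        machine = buf[18] | ((unsigned int)buf[19] << 8);

    switch (machine) {
    case 3:   return "x86";
    case 40:  return "arm";
    case 62:  return "x86-64";
    case 183: return "aarch64";
    default:  return "other";
    }
}
```

An objdump built for x86-64 or for 32-bit ARM generally will not know what to do with a file that reports aarch64 here, which is why the aarch64-linux-gnu- tools are needed. readelf -h prints the same field as "Machine".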