OpenOCD: flash-programming an ARM Cortex-M0 (JTAG)

I'm new to OpenOCD. Has anyone used Olimex OpenOCD to actually flash-program a hex file (from Keil, say) into a generic ARM Cortex-M0?
What do I need in the script file so that it takes each word of the hex file and performs an mww (memory write word) into the MCU flash? Can anyone provide an example? I use Python.
I am open to suggestions.
I use a Windows PC.

All Cortex-M0 parts that I know of have no JTAG, only SWD. At the time of writing, SWD is not yet available in OpenOCD; it is still in development.
Another note: the method for writing the flash memory is specific to each vendor/chip.
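Independently of the chip-specific flash sequence, reading the hex file itself is generic. As a minimal sketch in Python (the function name parse_intel_hex is hypothetical, not part of OpenOCD or Keil), the following decodes Intel HEX records into an address-to-byte map. It handles data, end-of-file and extended address records, and checks each record's checksum:

```python
def parse_intel_hex(text):
    """Return {absolute_address: byte} for the data records in an Intel HEX string."""
    memory = {}
    base = 0  # upper address bits, set by extended address records
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"line {line_no}: missing ':' start code")
        raw = bytes.fromhex(line[1:])
        # Layout: count, 2 address bytes, type, data..., checksum
        if len(raw) < 5 or len(raw) != raw[0] + 5:
            raise ValueError(f"line {line_no}: bad record length")
        if sum(raw) & 0xFF != 0:
            raise ValueError(f"line {line_no}: bad checksum")
        count, offset, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:    # data record
            for i, value in enumerate(data):
                memory[base + offset + i] = value
        elif rtype == 0x01:  # end of file
            break
        elif rtype == 0x02:  # extended segment address
            base = int.from_bytes(data, "big") << 4
        elif rtype == 0x04:  # extended linear address
            base = int.from_bytes(data, "big") << 16
        # Start-address records (types 03 and 05) carry no memory content.
    return memory
```

Typical use would be parse_intel_hex(open("firmware.hex").read()), where firmware.hex is a placeholder name. What is then done with the bytes (which flash controller registers to drive) still depends on the vendor and chip.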

Sure, what platform in particular? Some googling will find the exact sequence: flash unlock, erase, program, etc.
See section 6 of this page, for example:
http://pygmy.utoh.org/riscy/cortex/led-lpc17xx.html
I am trying to figure out which board I did it on, but those were pretty much the commands I followed and it worked just fine. It may have been the Leaflabs Maple Mini. The steps are the same.
To avoid the steps, or scripting them, what I ended up doing was writing a few lines of bootloader that said: if ram+0 == 0x12345678 and ram+4 == 0x87654321 then branch to ram+8, else loop forever. Then it was trivial to use JTAG to load a program into RAM, with the two words at the start and the entry point 0x08 bytes into RAM, press reset, and run the program. On a cold power-up it just hits the infinite loop. I spend my day on a bigger ARM-based system loading everything into RAM using JTAG, so this was quite comfortable.
Alternatively, you could script it in OpenOCD and simply type the openocd command to have the flash load happen.
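The RAM-loading trick described above can be sketched in Python as a generator of OpenOCD mww commands. This is only an illustration: the function name ram_load_commands is hypothetical, and the RAM base address passed in (0x20000000 in the example) is an assumption that has to be checked against the chip's memory map.

```python
import struct

# The two marker words the bootloader checks for, as given in the text.
MAGIC = (0x12345678, 0x87654321)

def ram_load_commands(payload, ram_base):
    """Return OpenOCD 'mww' lines that place MAGIC, then payload, at ram_base."""
    padded = payload + b"\x00" * (-len(payload) % 4)  # pad to a whole word
    image = struct.pack("<2I", *MAGIC) + padded       # little-endian words
    words = struct.unpack(f"<{len(image) // 4}I", image)
    return [
        f"mww 0x{ram_base + 4 * i:08x} 0x{word:08x}"
        for i, word in enumerate(words)
    ]

if __name__ == "__main__":
    # Example payload: a single Thumb "branch to self" instruction (0xe7fe).
    for command in ram_load_commands(b"\xfe\xe7", 0x20000000):
        print(command)
```

The printed lines can be pasted into an OpenOCD telnet session or saved to a script file. Word-by-word mww is slow for anything large, so this only makes sense for small test programs.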

Update for people stopping by...
You do not have to use mww if you are just trying to flash-program (i.e. upload your own code to) your microcontroller.
Some time ago, OpenOCD gained a built-in convenience script for programming. The command is called "program".
Here's an example from the documentation on the "program" command:
openocd -f interface/ftdi/jtag-lock-pick_tiny_2.cfg -f board/stm32f3discovery.cfg -c "program filename.elf verify reset"
Replace "stm32f3discovery" with your board. If you use a different adapter, replace the interface with the appropriate configuration file.
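Since the original question mentions Python on a Windows PC, here is a minimal sketch of building that same command line as an argument list, which avoids shell quoting problems. The helper name openocd_program_argv is hypothetical; the configuration files and image name are the ones from the example above.

```python
def openocd_program_argv(interface_cfg, board_cfg, image, verify=True, reset=True):
    """Build the argv list for 'openocd ... -c "program IMAGE [verify] [reset]"'."""
    words = ["program", image]  # assumes the image path contains no spaces
    if verify:
        words.append("verify")
    if reset:
        words.append("reset")
    return ["openocd", "-f", interface_cfg, "-f", board_cfg, "-c", " ".join(words)]

if __name__ == "__main__":
    argv = openocd_program_argv(
        "interface/ftdi/jtag-lock-pick_tiny_2.cfg",
        "board/stm32f3discovery.cfg",
        "filename.elf",
    )
    print(argv)
```

The list can be handed to subprocess.run(argv, check=True), provided openocd is on the PATH.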

Related

How to use qemu to do profiling on an algorithm

I have a program that now runs well on Ubuntu. It is written purely in C and will eventually run on an embedded processor. I would like to know its execution speed on different targets, such as Cortex-M3, M4 or the A series. As it does a lot of double-precision arithmetic, the difference should be obvious. Currently, my idea is to use qemu to count the instructions executed for some set of data. As the program only does data processing, the only resource it needs should be RAM.
I don't need a very accurate result, as it will only serve as a guide for choosing a CPU. Is there an easy guide for this task? I have little experience with qemu. I saw there are two ways to invoke qemu: qemu-system-arm and qemu-user. I guess the most accurate simulation result would come from qemu-system-arm. Also, the Cortex-M series should not support Linux due to the lack of an MMU, right?
There's not a lot out there on how to do this because it is in general pretty difficult to do profiling of guest code on an emulated CPU/system and get from that useful information about performance on real hardware. This is because performance on real hardware is typically strongly dependent on events which most emulation (and in particular QEMU) does not model, such as:
branch mispredictions
cache misses
TLB misses
memory latency
as well as (usually less significantly than the above) differences in number of cycles between instructions -- for instance on the Cortex-M4 VMUL.F32 is 1 cycle but VDIV.F32 is 14.
For a Cortex-M CPU the hardware is simple enough (ie no cache, no MMU) that a simple instruction count may not be too far out from real-world performance, but for an A-class core instruction count alone is likely to be highly misleading.
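To illustrate why a raw instruction count can differ from a cycle count even on a simple core, here is a small Python sketch that weights an instruction mix by per-instruction cycle costs. The two costs are the Cortex-M4 figures quoted above; the function name, the instruction counts and the default cost of 1 cycle for everything else are made-up assumptions for the example.

```python
def estimate_cycles(instruction_counts, cycles_per_instruction, default_cycles=1):
    """Weight each instruction count by its cycle cost and sum the result."""
    return sum(
        count * cycles_per_instruction.get(opcode, default_cycles)
        for opcode, count in instruction_counts.items()
    )

# Cycle costs quoted in the text for the Cortex-M4.
M4_COSTS = {"vmul.f32": 1, "vdiv.f32": 14}

if __name__ == "__main__":
    # Hypothetical instruction mix, for illustration only.
    mix = {"vmul.f32": 1000, "vdiv.f32": 100, "add": 500}
    print(sum(mix.values()), "instructions,", estimate_cycles(mix, M4_COSTS), "cycles")
```

With this made-up mix, 1600 instructions come out at 2900 estimated cycles, so a divide-heavy loop would look much cheaper by instruction count than it really is.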
The other approach people sometimes want to take is to measure run-time under a model; this can be even worse than counting instructions, because some things that are very fast on real hardware are very slow in an emulator (e.g. floating point instructions), and because the JIT process introduces extra overhead at unpredictable times.
On top of the conceptual difficulties, QEMU is not currently a very helpful environment for obtaining information like instruction counts. You can probably do something with the TCG plugin API (if you're lucky one of the example plugins may be sufficient).
In summary, if you want to know the performance of a piece of code on specific hardware, the easiest and most accurate approach is to run and profile the code on the real hardware.
I am posting my solution here in case someone else just wants a rough estimate, as I did.
Eclipse Embedded CDT provides a good starting point. You can start with a simple LED blink template. It currently supports only soft floating-point arithmetic. You can start qemu with the built embedded program, and a picture of the STM32F407 board will appear. The LED in the picture blinks as the program runs.
The key point is that I can use the script from "Counting machine instructions using gdb" to count instructions on the qemu target.
However, it seems Eclipse Embedded CDT gets stuck when some library code is executed. Here is my workaround: start qemu manually (I got the command by running 'ps' while Eclipse had qemu running).
In the first terminal:
qemu-system-gnuarmeclipse --verbose --verbose --board STM32F4-Discovery --mcu STM32F407VG --gdb tcp::1235 -d unimp,guest_errors --semihosting-config enable=on,target=native --semihosting-cmdline blinky_c
Then in the second terminal:
arm-none-eabi-gdb blinky_c.elf
Below is the history of the commands I entered in the gdb terminal:
(gdb) show commands
1 target remote :1235
2 load
3 info register
4 set $sp = 0x20020000
5 info register
6 b main
7 c
Then you can use gdb to count instructions as described in "Counting machine instructions using gdb".
One big problem with this method is that it is really slow, because gdb uses stepi to go through all the code being counted before it produces a result. It took me around 3 hours on my Ubuntu VMware machine to count 5.5M executed instructions.
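For reference, stepi-based counting of the kind described above can be sketched as a gdb Python command. This is not the script from the linked question, only an illustration under assumptions: the command name count-to and the helper count_instructions are hypothetical, and the argument is assumed to be an expression that evaluates to a plain address.

```python
def count_instructions(step, get_pc, stop_pc, limit=10_000_000):
    """Call step() until get_pc() equals stop_pc; return the number of steps."""
    count = 0
    while get_pc() != stop_pc and count < limit:
        step()
        count += 1
    return count

try:
    import gdb  # only importable when this file is sourced inside gdb
except ImportError:
    gdb = None

if gdb is not None:
    class CountTo(gdb.Command):
        """count-to ADDR: single-step until $pc == ADDR, then print the count."""

        def __init__(self):
            super().__init__("count-to", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            stop_pc = int(gdb.parse_and_eval(arg))
            executed = count_instructions(
                step=lambda: gdb.execute("stepi", to_string=True),
                get_pc=lambda: int(gdb.parse_and_eval("$pc")),
                stop_pc=stop_pc,
            )
            print(f"{executed} instructions executed")

    CountTo()  # register the command with gdb
```

After loading it in gdb with "source" followed by the script's file name, something like "count-to 0x08000400" (a placeholder address) would step until that address is reached. Every instruction costs one stepi round trip to the target, which is why this approach is so slow.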
One thing that you can do is use a simulation setup like the one used in this sample: https://github.com/swedishembedded/sdk/blob/main/samples/lib/control/dcmotor/src/main.c
This may look like an ordinary embedded application, but the data structure vdev actually resides in a different application running on the computer (in this case a dc motor simulator) and all reads and writes to it are automatically done over network by the simulator that runs this. The platform definition is here: https://github.com/swedishembedded/sdk/blob/main/samples/lib/control/dcmotor/boards/custom_board.repl This is how the structure is mapped.
From here it is not hard to implement advanced memory profiling by directly capturing reads and writes from the simulated application (which in this case is compiled for STM32 ARM).

How to obtain PMU events when running ARM bigLITTLE inside gem5

I'm running an ARM full system simulation in gem5 and the configurations I'm using in the commandline is:
./build/ARM/gem5.perf configs/example/arm/fs_bigLITTLE.py
--kernel=/home/ting-bazinga/gem5/linux-arm-gem5/vmlinux
--caches
--disk /home/ting-bazinga/gem5/fs_imgs/disks/aarch64-ubuntu-trusty-headless.img
--bootscript /home/ting-bazinga/gem5/fs_imgs/test.rcS
From the post "Using perf_event with the ARM PMU inside gem5" I presume that obtaining PMU events in gem5 is possible. However, I have not found the exact method for doing so.
perf can be used to obtain PMU information. On my local machine I can just install the linux-tools-common package from the terminal to get that tool, but I can't do the same in the simulation. There isn't a perf binary that I can simply find online (or maybe someone can give a hint on how to build such a binary?). I also tried downloading the linux-tools-common package, copying it into the disk image, and compiling it with the makefile, but somehow the makefile does not work in the simulated system.
Or can the PMU events be obtained using C code? In the post I mentioned above, someone used C code to count the number of branches mispredicted by the branch predictor unit during a specific task. I can use perf_event_open to obtain the number of instructions during an execution. However, running the perf_event_open code requires root, and I cannot use sudo in the simulated system.
Can anybody give me some instructions on how to obtain PMU events in gem5? Many thanks.

vxWorks-7 jump to bootloader

Is there a way in vxWorks-7 to restart into the bootloader? We are using u-boot. We have a hardware issue with our current board, and would like to jump to u-boot to restart the board instead of power cycling it.
Dennis
One method may be to use the reboot() function on the console (it can of course also be called from code).
If the BSP implementation is okay, it should restart without a power cycle.
At least that works for us, on VxWorks 7 with an ARM system. But I have to say, we are not using u-boot (we are using barebox).

Zynq Qspi Booting

I'm using Arm DS-5 and Xilinx SDK for developing programs on Zynq board.
I'm trying to boot Zynq 702 board from Qspi Flash.
What I've done so far is generating FSBL project from Xilinx SDK, and combining it with my application using Bootgen tool in SDK, then program it into the flash.
There are several questions in my mind.
DS-5 produces an .axf file, but Bootgen requires an .elf file. Can I use the .axf file by just changing its extension to .elf, or do I need some more steps?
Is there a tool that shows the inner structure of an .axf file, showing what is where?
And how can I debug once I have managed to boot from QSPI? For example, I want to debug my application from the beginning of the FSBL. Is that possible? In QSPI boot, my application starts running when I power on the board, so by the time I connect with JTAG it is already somewhere inside my application.
An AXF might have some extra ARM-toolchain magic in it (I'm not sure off-hand), but at heart it's an ELF file - the ARM toolchain provides fromelf for poking around inside them, but other tools like readelf and objdump also work.
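A quick way to confirm that a given .axf file is an ELF underneath is to look at its first bytes. The following Python sketch (the function name describe_elf is hypothetical) reads the ELF identification bytes and a few header fields; 40 is the standard ELF machine number for 32-bit ARM.

```python
import struct

EM_ARM = 40  # ELF e_machine value for 32-bit ARM

def describe_elf(header):
    """Decode basic fields from the first bytes (at least 32) of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = header[4] == 2           # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    little_endian = header[5] == 1      # EI_DATA: 1 = little, 2 = big
    fmt = ("<" if little_endian else ">") + "HHI" + ("Q" if is_64bit else "I")
    # e_type, e_machine, e_version and e_entry follow the 16-byte e_ident.
    _e_type, e_machine, _e_version, e_entry = struct.unpack_from(fmt, header, 16)
    return {
        "bits": 64 if is_64bit else 32,
        "little_endian": little_endian,
        "machine": e_machine,
        "is_arm": e_machine == EM_ARM,
        "entry": e_entry,
    }
```

Usage would be along the lines of describe_elf(open("app.axf", "rb").read(64)), with app.axf as a placeholder name. For section and symbol details, fromelf, readelf or objdump remain the right tools.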
I'm not familiar with the Zynq platform so I don't know any specific debugger tricks, but a general one is just to put an infinite loop at the start of your code (possibly using volatile or inline asm trickery if necessary to prevent optimisation) - once the debugger's connected and broken into it, you just move the PC past the loop and continue.
You can totally halt a QSPI-booted Zynq via JTAG and do whatever you want with it. However, there are some quirks. Sometimes the Zynq goes into some kind of lockup where JTAG doesn't work at all, and you need to power-cycle before retrying. Some not-so-well-written peripheral might die after software is started over JTAG, so you might need to reload the bitstream first. And there are some Vivado-related bugs (like the one where you cannot re-flash the board unless you downgrade to 2017.2, change the MIO2-6 pulls, or patch the FSBL), but I'm not sure whether they apply in your case.

Dynamically Configure FPGA From Host Program

I was wondering if anyone knows an efficient way to program the FPGA (PL) of a Xilinx Zynq-7 series or related device from a host C program (not on the SoC, but from the host PC). Is there a Xilinx API I can use or include in my program? The only way I can think of at the moment is invoking command-line programming via iMPACT.
Basically I want to put the SDK "Program FPGA" functionality into my host C program, where the user selects a prebuilt .bit file (and an .elf file if possible) to program the FPGA/SoC. This is just a proof of concept; later I would like to move this dynamic configuration onto one of the ARM CPUs.
Many Thanks
Sam
At the very least you'll need an intermediate MPU/MCU that can read from USB, as most FPGAs aren't capable of much at all at startup. I'm guessing it will be hard to find a ready-made MPU/library pair for this, because there are so many options, each of which would be pretty application-specific. You're better off starting by programming the FPGA from an ARM chip, since you'll need some CPU alongside the FPGA in any case.
This seems somewhat useful.

Resources