Profiling on baremetal embedded systems (ARM)

Profiling on baremetal embedded systems (ARM) - c

I am wondering how you profile software on bare metal systems (ARM Cortex a8)? Previously I was using a simulator which had built-in benchmark statistics, and now I want to compare results from real hardware (running on a BeagleBoard-Xm).
I understand that you can use gprof, however I'm kind of lost as that assumes you have to run Linux on the target system?
I build the executable file with Codesourcery's arm-none-eabi cross-compiler and the target system is running FreeRTOS.

Closely evaluate what you mean by "profiling". You are indeed operating very close to bare metal, and it's likely that you will be required to take on some of the work performed by a tool like gprof.
Do you want to time a function call? or an ISR? How about toggling a GPIO line upon entering and exiting the code under inspection. A data logger or oscilloscope can be set to trigger on these events. (In my experience, a data logger is more convenient since mine could be configured to capture a sequence of these events - allowing me to compute average timings.)
Do you want to count the number of executions? The Cortex A8 comes equipped with a number of features (like configurable event counters) that can assist: link. Your ARM chip may be equipped with other peripherals that could be used, as well (depending on the vendor). Regardless, take a look at the above link - the new ARMs have lots of cool features that I don't get to play with as much as I would like! ;-)

I have managed to get profiling working for ARM Cortex M. As the GNU ARM Embedded (launchpad) tools do not come with profiling libraries included, I have added the necessary glue and profiling functionality.
References:
See http://mcuoneclipse.com/2015/08/23/tutorial-using-gnu-profiling-gprof-with-arm-cortex-m/
I hope this helps.

Related

Is it possible to build C source code written for ARM to run on x86 platform?

I got some source code in plain C. It is built to run on ARM with a cross-compiler on Windows.
Now I want to do some white-box unit testing of the code. And I don't want to run the test on an ARM board because it may not be very efficient.
Since the C source code is instruction set independent, and I just want to verify the software logic at the C-level, I am wondering if it is possible to build the C source code to run on x86. It makes debugging and inspection much easier.
Or is there some proper way to do white-box testing of C code written for ARM?
Thanks!
BTW, I have read the thread: How does native android code written for ARM run on x86?
It seems not to be what I need.
ADD 1 - 10:42 PM 7/18/2021
The physical ARM hardware that the code targets may not be ready yet. So I want to verify the software logic at a very early phase. Based on John Bollinger's answer, I am thinking about another option: Just build the binary as usual for ARM. Then use QEMU to find a compatible ARM cpu to run the code. The code is assured not to touch any special hardware IO. So a compatible cpu should be enough to run all the code I think. If this is possible, I think I need to find a way to let QEMU load my binary on a piece of emulated bare-metal. And to get some output, I need to at least write a serial port driver to bridge my binary to the serial port.
ADD 2 - 8:55 AM 7/19/2021
Some more background, the C code is targeting ARMv8 ISA. And the code manipulates some hardware IPs which are not ready yet. I am planning to create a software HAL for those IPs and verify the C code over the HAL. If the HAL is good enough, everything can be purely software and I guess the only missing part is a ARMv8 compatible CPU, which I believe QEMU can provide.
ADD 3 - 11:30 PM 7/19/2021
Just found this link. It seems QEMU user mode emulation can be leveraged to run ARM binaries directly on a x86 Linux. Will try it and get back later.
ADD 4 - 11:42 AM 7/29/2021
An some useful links:
Override a function call in C
__attribute__((weak)) and static libraries
What are weak functions and what are their uses? I am using a stm32f429 micro controller
Why the weak symbol defined in the same .a file but different .o file is not used as fall back?

Now I want to do some white-box unit testing of the code. And I don't want to run the test on an ARM board because it may not be very efficient.
What does efficiency have to do with it if you cannot be sure that your test results are representative of the real target platform?
Since the C source code is instruction set independent,
C programs vary widely in how portable they are. This tends to be less related to CPU instruction set than to target machine and implementation details such as data type sizes, word endianness, memory size, and floating-point implementation, and implementation-defined and undefined program behaviors.
It is not at all safe to assume that just because the program is written in C, that it can be successfully built for a different target machine than it was developed for, or that if it is built for a different target, that its behavior there is the same.
I am wondering if it is possible to build the C source code to run on x86. It makes debugging and inspection much easier.
It is probably possible to build the program. There are several good C compilers for various x86 and x86_64 platforms, and if your C code conforms to one of the language specifications then those compilers should accept it. Whether the behavior of the result is representative of the behavior on ARM is a different question, however (see above).
It may nevertheless be a worthwhile exercise to port the program to another platform, such as x86 or x86_64 Windows. Such an exercise would be likely to unmask some bugs. But this would be a project in its own right, and I doubt that it would be worth the effort if there is no intention to run the program on the new platform other than for testing purposes.
Or is there some proper way to do white-box testing of C code written for ARM?
I don't know what proper means to you, but there is no substitute for testing on the target hardware that you ultimately want to support. You might find it useful to perform initial testing on emulated hardware, however, instead of on a physical ARM device.

If you were writing ARM code for a windows desktop application there would be no difference for the most part and the code would just compile and run. My guess is you are developing for some device that does some specific task.
I do this for a lot of my embedded ARM code. Typically the core algorithms work just fine when built on x86 but the whole application does not. The problems come in with the hardware other than the CPU. For example I might be using a LCD display, some sensors, and Free RTOS on the ARM project but the code that runs on Windows does not have any of these. What I do is extract important pieces of C/C++ code and write a test framework around it. In the real ARM code the device is reading values from a sensor and doing something with it. In the test code that runs on a desktop the code reads from a data file with fake sensor values and writes its output to a datafile that can be analyzed. This way I can have white box tests for the most complicated code.
May I ask, roughly what does this code do? An ARM processor with no peripherals would be kind of useless. Typically we use the processor to interact with some other hardware like a screen, some buttons, or Bluetooth. It's those interactions that are going to be the most problematic.

Is it possible to compile and run the dlib library on embedded devices with ARM Cortex-M7 processors?

I have just started using the amazing dlib library in Visual Studio and I have been able to compile and run the face detection examples. I was wondering if it would be possible to compile and run the library on an Mbed device, such as this one, with an M7 (or other M-series) processor. In other words, what specifications should I look out for to determine whether a microcontroller can, if at all, run dlib. Note that Mbed devices run C++ code, so it would be possible to copy and paste the source code of dlib and compile it, but I want to know if this is possible before I purchase a board. Also, if the RAM and ROM of the board are not enough, I can always attach external RAM/ROM.
Alternatively, if anyone knows of a library that can perform face detection or recognition on an embedded device, I would be happy to hear it.
Thanks.

Although the F769 is a considerably powerful embedded device there is no chance that dlib will run on it. Machine learning algorithms, even if not run in real-time, typically require a vast amount of RAM memory, specially for online-learning (learning on the target). You can take a look at ARMs very own CMSIS NN library to see what's currently "state-of-the-art" for devices that size.

Take a look at Tensorflow Lite for Microcontrollers. You can put these on embedded devices. Wake words and object detection runs easily on various boards (Arduino Nano 33, SparkFun Edge). There's a compiler included for Mbed.

Microcontrollers are not suitable for video and image recognition even if you attach external ram. The chip you where suggestiong is top of the line in microcontroller world. But this means only 2Mb for ALL your software and only 512kb of ram onboard. Think of it this way the image you need with enough detail to recoginze someone would be atleast a few mb.
I would suggest that you look at to the application processors of ARM (A series) or NVIDA Jetson.

What are the conditions to make the embedded C code written for one processor to work on another processor (when architecture is same)?

I am reading a primer text on embedded C programming (it is: Barr & Massa, 2007). For companion hardware board to run examples, they recommend Arcom VIPER-Lite. But I do already have Beaglebone Black (BBB) board and I don't want to buy a new board.
The two boards have same architecture, namely, ARM but BBB uses TI AM3358BZCZ100 processor, clocked at 1GHz, whereas VIPER-Lite uses Intel's PXA255 processor, clocked at 200MHz. The BBB board has more memory and basically more of everything.
My question is, can I follow and execute embedded C code examples given in this book on my BBB board? Does embedded C code depend on processors or architecture or something else? I understand that very specific examples addressing particular peripherals/drivers may not be portable from one platform to next but is entire embedded code like this? I am hope I am making sense.

Intel X-Scale is not the same as Cortex-A8 - ARM architecture has been through a number of versions since then, and Intel implemented some proprietary features too. Moreover ARM licencees are free to implement proprietary peripheral sets and subsets of the core architecture.
In particular for board bring-up the PLL and SDRAM controller will be entirely different between different vendor's devices and even between different generations of device from the same vendor.
If you are running code on an already implemented OS (BeagleBone is delivered with Linux already installed), then you will not need to worry about board bring up and peripheral support; but you will also miss out in learning a great deal about embedded systems (other than perhaps embedded systems that run pre-installed or vendor supplied Linux distros, which is a small subset or all embedded systems).
Beyond board bring up the boards will have entirely different peripheral sets, different on-board devices, and differing I/O at different addresses and with different register sets - no code that directly accesses the I/O is will work. Code accessing devices through a standard Linus device driver interface may well work because an abstraction to a common interface is provided by the OS and board vendor or third party device drivers.
If you are not running the code on Linux - or are implementing low-level device drivers, then the programming environment in terms of memory map, MMU, PLL, I/O control, peripherals, and even instruction set will be different and any code will require adaptation, and you will need to get familiar with the corresponding data sheets or reference manuals and also the ARM technical reference.
So the answer is that it depends largely on where you are starting from; bare-metal or Linux.
There are resources related to "bare-metal" development on BeagleBone Black in particular TI's own bare-metal StarterWare library.

The concept you learn from most good text books can be applied any microprocessor or micro controller at a very high level. But if you want learn embedded system programming using Beagle Bone Black I suggest the following youtube links from Prof Derek Molloy. Prof Molloy does a fantastic job of teaching embedded System programming using BBB. Here are few links for you to get started.
The Beaglebone - Unboxing, Introduction Tutorial and First Example
Beaglebone: C/C++ Programming Introduction for ARM Embedded Linux Development using Eclipse CDT
Beaglebone: GPIO Programming on ARM Embedded Linux
The one problem you might want to be aware is that the video were based on Angstrom Distribution. The current BBB is shipped with Debian Distribution.
Also if you want to learn bare-metal embedded system program you might want check out
Embedded Systems - Shape The World
You might also want to take look at the following link for more material.
Beginning with programming microcontrollers

The xscale although ARM instruction set derived is not ARM in the sense that you want to use it. For some reason the native mode is big endian and normal ARM native mode is little endian. But more important the core processor is not insignificant, but not the bulk if the porting effort, most of not all of the peripherals are expected to differ between those two chips, most of the code would need a re-write unless it is a purely portable C program that runs on any say linux, then arm, xscale, x86 are completely irrelevant to the discussion. I suspect you are not in that situation. Even compiled as a generic command line linux app would still have problems in this situation with the endianness.
Basically you are saying I have two fords and I want to take the wheels off of one and put them on the other, without understanding that one is a ford festiva and the other is lets say an F350 pickup. Just because they have the same looking tiny ford badge on them, doesnt mean that the entirety of their components are identical.
If you are desperate to re-use these binaries, you are better off finding or making a simulator for the prior platform and then you can run that on anything.

Run executable on MINI2440 with NO OS

I have Fedora installed on my PC and I have a Friendly ARM Mini2440 board. I have successfully installed Linux kernel and everything is working. Now I have some image processing program, which I want to run on the board without OS. The only process running on board should be my program. And in that program how can I access the on board camera to take image from, and serial port to send output to the PC.

You're talking about what is often called a bare-metal environment. Google can help you, for example here. In a bare-metal environment you have to have a good understanding of your hardware because you have to take care of a lot of things that the OS normally handles.
I've been working (off and on) on bare-metal support for my ELLCC cross development tool-chain. I have the ARM implementation pretty far along but there is still quite a bit of work to do. I have written about some of my experiences on my blog.
First off, you have to get your program started. You'll need to write some start-up code, usually in assembly, to handle the initialization of the processor as it comes out of reset (or is powered on). The start-up code then typically passes control to code written in C that ultimately directly or indirectly calls your main() function. Getting to main() is a huge step in your bare-metal adventure!
Next, you need to decide how to support your hardware's I/O devices which in your case include the camera and serial port. How much of the standard C (or C++) library does your image processing require? You might need to add some support for functions like printf() or malloc() that normally need some kind of OS support. A simple "hello world" would be a good thing to try next.
ELLCC has examples of various levels of ARM bare-metal in the examples directory. They range from a simple main() up to and including MMU and TCP/IP support. The source for all of it can be browsed here.
I started writing this before I left for work this morning and didn't have time to finish. Both dwelch and Clifford had good suggestions. A bootloader might make your job a lot simpler and documentation on your hardware is crucial.

First you must realise that without an OS, you are responsible for bringing the board up from reset including configuring the PLL and SDRAM, and also for the driver code for every device on the board you wish to use. To do that required adequate documentation of the board and it devices.
It is possible that you can use the existing bootloader to configure the core and SDRAM, but that may not meet your requirement for the only process running on the board should be your image processing program.
Additionally you will need some means of loading and bootstrapping; again the existing Linux bootstrapper may suit.
It is by no means straightforward and cannot really be described in detail here.

ARM development environment for newbies

I am looking for some information on programming ARM devices, in a particular non-particular way [1]. Assume that I am writing code for an ARM processor that is used a machine similar to a Apple II/Atari "**" XL/Commodore 64/DOS-PC, or even something that runs a multitasking OS like VMS or SUNOs. Assume further that any peripherals/OS specific stuff has already been abstracted into subroutines.Examples of this type of programing might be: a text/curses based game like rogue or moria; a curses based word processor ( or rather something based on a curses like library ) ; or a modem/terminal program.
I'm looking for two things. Materials to help learn ARM programming, though the ARM System Developers Guide may be enough, other resources would be helpful I'm looking in particular for something which explains the software ( and relative hardware ie registers ) differences of various generations of processor.
The other thing I'm looking for is a development environment which inculdes, emulation, a decent macro assembler, and a debugger. Along with any thing else that will help me see what is going on inside my programs.
[1] OK. Sorry I just couldn't resist that particular pun.

You have the choice of using ARM Cortex M or A series. If you are going to develop high end applications such as those which run on smartphones / tablets, then learning about ARM A is your choice. If you are going for an emphasis with hardware/low level stuff such as controllers then you should go for ARM Cortex-M. If you are into real time applications (which I doubt is your case, them use the R series).
Most of these new ARM generations are based on ARMv7 architecture and ISA, so reading the manuals on that could get you started. Most recently, a new ARMv8 architecture and ISA have been announced, it supports 64 bit processing.
Download the reference and technical manuals from ARM site to learn about the HW/peripherals.
I would go with auslen's suggestion of buying a board, you could go with TI's Stellaris Launch pad which has an ARM-M4F processor (supports floating point and SIMD), it sells for 12.99$
http://www.ti.com/ww/en/launchpad/stellaris_head.html?DCMP=stellaris-launchpad&HQS=stellaris-launchpad-b
or you could go with ST's discovery board (based on the same processor as above), but has audio, accelerometer and usb on board. it sells for 14.99$ http://www.st.com/internet/evalboard/product/252419.jsp
or the STM F3 board (10.99$)
http://www.st.com/internet/evalboard/product/254044.jsp
In any case, you need to check the examples which come with the board, without which you could go nowhere easily. The board comes with its own drivers, all is abstracted in a way, so you could get started from there!
As for OS, if your interest is an RTOS, ARM provides the CMSIS RTOS for it's M series processors
http://www.arm.com/products/processors/cortex-m/cortex-microcontroller-software-interface-standard.php
This book offers an introduction to the generations of ARM processors. Then focuses on cortex M3. It covers its ISA with lots of assembly code. It also addresses the built-in peripherals and how to start-up with C.
http://www.amazon.ca/Definitive-Guide-ARM-Cortex-M3/dp/185617963X/ref=sr_1_1?ie=UTF8&qid=1352506616&sr=8-1
good luck

infocenter.arm.com and look at the various ARM ARMs (architectural reference manuals) and TRM's (technical reference manuals) for the various architectures and cores. these manuals are better than most other companies documentation. except for the new 64 bit stuff, the difference from one architecture to the next is somewhat subtle as far as the instruction set goes. the major differences have to do with the peripherals, the mmu is a slow changing thing, the interrupt manager has taken big steps and the fpu has been replaced at least once if not twice wholesale (if you even have an fpu which, having one is the exception not the rule it consumes a huge real estate for such little return).
I am confused with your question. I think it is important to draw the line between learning the architecture/instruction set and learning the operating system calls, these are two separate things. Operating system stuff you rarely need to look beyond the source code (C/C++), and the limited asm is for hand tuned C libraries or boostrap code, and interrupt wrappers. Likewise the architecture, registers, instructions, etc vs the peripherals (the cores from arm generally have very very few peripherals, the bulk are in the vendor specific stuff) which I would separate as a separate learning curve, has little to do with asm and the instruction set so no different than learning a peripheral on any other platform, just some addresses you read and write.
If you are looking for non-operating system bare metal the stm32f0 discovery is $10, I highly recommend it. Looks like ti has a stellaris launchpad for just a little more (waiting for mine to arrive so I cant talk much about them, and shipping is free from ti so the cost is basically the same as the stm32 boards) the stm32f4 discovery is about $20 and I would barely call a microcontroller with all the stuff the cortex-m4 has.
Moving up to linux capable or designed for linux systems there is the raspberry pi, beaglebone and open-rd and on up (pandaboard). Again though you are just writing just another linux C/C++ program so there isnt much excitement there (related to a specific platform, the entertainment is the same for all platforms) and very little arm knowledge required if any. It is very easy to use any of these platforms for bare metal programming giving you race car like performance compared to the ARM based microcontrollers.
I have a thumb simulator which you are probably not interested in. gdb has the armulator which was the cornerstone of the company back in the day. skyeye or something like that has an arm instruction set simulator as does qemu, none of them will give you great visibility other than what gdb can provide. opencores has the amber project an armv2 clone, which you can see the close relationship to the armv4 and newer that you will not find rtl for without a box full of cash. with my arm and chip experience (No I do not work for arm) I do find the amber project worth looking at, but many folks wont know what to do with it and really are not interested in that level of visibility. (it is instruction compatible, a good design, but dont think you are looking at an arm design, no secrets there). you can learn the basic arm architecture from it and then move on to hardware for example...
With the microcontrollers being cortex-m based, you might find the older microcontrollers a better stepping stone to the upper end arm cores. ARM7tdmi based stuff like the sam7s and others from nxp, st, atmel, etc which you can still find at sparkfun and microcontroller pros and other places for arduino like prices.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight