Can someone help me out please! I do not know if the answer is general, or specific to the board and software versions I am working with. I am out of my previous areas here, and do not even know the right question to ask.
EDITs added at the bottom
What I currently want is to create a program that will run standalone (bare metal; no OS) on an A20-OLinuXino-Micro-4GB board, and that needs to use (at least) some standard math library calls. Eventually, I will want to load it into NAND and run it on powerup, but for now I am trying to manually load it (loady) from the U-Boot (github.com/linux-sunxi/u-boot-sunxi/wiki) serial 'console', after booting from an SD card. Standalone is needed because the Linux distro level access to the hardware GPIO ports is quite slow and not very flexible when working with more than one bit (port in a port group) at a time. It is too slow for the target application, and I did not really want to try modifying / adding a kernel module just to see if that would be fast enough.
Are there some standard gcc / ld flags needed to create a bare metal standalone program, and include some library routines? Beyond -ffreestanding and -static? Is there some special glue code needed? Is there something else I have not even thought of?
I found and looked over Beagleboard bare metal programming (stackoverflow.com/questions/6870712/beagleboard-bare-metal-programming). The answer there is good info, but it is assembler and does not reference any library. Application hangs when calling printf to uart with bare metal raspberry pi might show a cause for the problem. The (currently) bottom answer there points to problems with VFP, and I already ran across problems with soft/hard floating point options. That shows some assembler code, but I am missing details about how to add a wrapper/glue to combine it with C code. My assembler coding is rusty, but would adding equivalent code at the start of hello_world (at least before the reference to the sin() function) likely get things working? Maybe by adding it into the libstubs code.
I am using another A20 board for the main development environment.
$ gcc --version
gcc (Debian 4.6.3-14) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ld.bfd --version
GNU ld (GNU Binutils for Debian) 2.22
Copyright 2011 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
$ uname -a
Linux a20-OLinuXino 3.4.67+ #6 SMP PREEMPT Fri Nov 1 17:32:40 EET 2013 armv7l GNU/Linux
I have been able to create bootable U-Boot images for the board on SD cards from the repo, either building directly from the linux-sunxi distro that was supplied with the board, or by cross-compiling from a Fedora 21 machine. Same for the standalone hello_world program that came in the examples for U-boot, which can be loaded and run from the U-Boot console.
However, reducing the sample program to bare minimum, then adding code that needs math.h, -lm and -lc fails (in various iterations) with 'software interrupt' or 'undefined operation' type errors. The original sample program was being linked with -lgcc, but a little checking showed that nothing was actually being included from the library. The identical binary was created without the library, so the question might be 'what does it take to use any library with a bare metal program?'
sun7i# go 0x48000000
## Starting application at 0x48000000 ...
Hello math World
undefined instruction
pc : [<48000010>] lr : [<4800000c>]
sp : 7fb66da0 ip : 7fb672c0 fp : 00000000
r10: 00000002 r9 : 7fb66f0c r8 : 7fb67778
r7 : 7ffbbaf8 r6 : 00000001 r5 : 7fb6777c r4 : 48000000
r3 : 00000083 r2 : 7ffbc7fc r1 : 0000000a r0 : 00000011
Flags: nZCv IRQs off FIQs off Mode SVC_32
Resetting CPU ...
To get that far, I had to tweak build options, to specify hardware floating point, since that is how the base libraries were compiled.
Here are the corresponding source and build script files
hello_world.c
#include <common.h>
#include <math.h>
int hello_world (void)
{
double tst;
tst = 0.33333333333;
printf ("Hello math World\n");
tst = sin(0.5);
// printf ("sin test %d : %d\n", (int)tst, (int)(1000 * tst));
return (0);
}
build script
#! /bin/bash
UBOOT="/home/olimex/u-boot-sunxi"
SRC="$UBOOT/examples/standalone"
#INCLS="-nostdinc -isystem /usr/lib/gcc/arm-linux-gnueabihf/4.6/include -I$UBOOT/include -I$UBOOT/arch/arm/include"
INCLS="-I$UBOOT/include -I$UBOOT/arch/arm/include"
#-v
GCCOPTS="\
 -D__KERNEL__ -DCONFIG_SYS_TEXT_BASE=0x4a000000 \
 -Wall -Wstrict-prototypes -Wno-format-security \
 -fno-builtin -ffreestanding -Os -fno-stack-protector \
 -g -fstack-usage -Wno-format-nonliteral -fno-toplevel-reorder \
 -DCONFIG_ARM -D__ARM__ -marm -mno-thumb-interwork \
 -mabi=aapcs-linux -mword-relocations -march=armv7-a \
 -ffunction-sections -fdata-sections -fno-common -ffixed-r9 \
 -mhard-float -pipe"
# -msoft-float -pipe
OBJS="hello_world.o libstubs.o"
LDOPTS="--verbose -g -Ttext 0x48000000"
#--verbose
#LIBS="-static -L/usr/lib/gcc/arm-linux-gnueabihf/4.6 -lm -lc"
LIBS="-static -lm -lc"
#-lgcc
gcc -Wp,-MD,stubs.o.d $INCLS $GCCOPTS -D"KBUILD_STR(s)=#s" \
 -D"KBUILD_BASENAME=KBUILD_STR(stubs)" \
 -D"KBUILD_MODNAME=KBUILD_STR(stubs)" \
 -c -o stubs.o $SRC/stubs.c
ld.bfd -r -o libstubs.o stubs.o
gcc -Wp,-MD,hello_world.o.d $INCLS $GCCOPTS -D"KBUILD_STR(s)=#s" \
 -D"KBUILD_BASENAME=KBUILD_STR(hello_world)" \
 -D"KBUILD_MODNAME=KBUILD_STR(hello_world)" \
 -c -o hello_world.o hello_world.c
ld.bfd $LDOPTS -o hello_world -e hello_world $OBJS $LIBS
objcopy -O binary hello_world hello_world.bin
EDITS added:
The application that this is to be part of needs both some fairly high speed GPIO and some math functions. It should only need sin() and maybe sqrt(). My previous testing for the GPIO got the toggling of a single pin (port in a port group) up to 8 MHz. The constraints for the application need the full cycle time to be in the 10 µs (100 kHz) range, which includes reading all pins from a single port and writing a few pins on other ports, synchronized with the timing limitations of the attached ADC chip (3 ADC reads). I have bare metal code that is doing (simulating) that process in about 2.1 µs. Now I need to add in the math to process the values, the output of which will set some more outputs. Future planned improvements include using SIMD for the math, and dedicating the second core to the math, while the first does the GPIO and 'feeds' the calculations.
The needed math code / logic has already been written into a simulation program using very standard (c99) code. I just need to port it into the bare metal program. Need to get 'math' to work first.
First, I suggest reading this excellent paper on bare-metal programming with ARM and GNU: http://www.state-machine.com/arm/Building_bare-metal_ARM_with_GNU.pdf.
Then, I would make sure you avoid any syscall to the Linux kernel (which you don't have, but which code built against a Linux C library will try to make), e.g. by never returning from main(), which should never return on bare metal anyway.
Finally, I would either use newlib or, if you only need a small subset of what the libraries offer, write a custom implementation.
Keep in mind you are using an Allwinner SoC, which is not the best for bare-metal documentation, but you can find the TRM here: http://www.soselectronic.com/a_info/resource/c/20_UM-V1.020130322.pdf. I would check whether the libraries (if you decide to use them) or your code need some special hardware to be initialized first (interconnect fabric, clock and power domains, etc.).
I strongly suggest, if you just need to use sin() and similar, to just deploy your own.
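If you do roll your own, a minimal sketch could look like the one below. It is only an illustration under stated assumptions: the names my_sin and sin_poly are made up here, it uses a plain Taylor polynomial after range reduction (not what any particular libm does), and it still relies on the compiler's floating-point support, so with hard float the VFP has to be enabled before the first floating-point instruction runs.

```c
/* Freestanding sine sketch: no libm, no OS. All names are illustrative. */

#define MY_PI      3.14159265358979323846
#define MY_TWO_PI  6.28318530717958647692
#define MY_HALF_PI 1.57079632679489661923

/* Odd Taylor polynomial up to x^15/15!, intended for [-pi/2, pi/2]. */
static double sin_poly(double x)
{
    double x2 = x * x;

    /* Horner form of x - x^3/3! + x^5/5! - ... - x^15/15! */
    return x * (1.0 + x2 * (-1.0 / 6.0
             + x2 * (1.0 / 120.0
             + x2 * (-1.0 / 5040.0
             + x2 * (1.0 / 362880.0
             + x2 * (-1.0 / 39916800.0
             + x2 * (1.0 / 6227020800.0
             + x2 * (-1.0 / 1307674368000.0))))))));
}

double my_sin(double x)
{
    /* Reduce to [-pi, pi] by removing the nearest whole multiple of 2*pi.
     * Assumption: |x| is small enough that the quotient fits in a long. */
    long k = (long)(x / MY_TWO_PI + (x >= 0.0 ? 0.5 : -0.5));
    x -= (double)k * MY_TWO_PI;

    /* Fold into [-pi/2, pi/2] using sin(pi - x) == sin(x). */
    if (x > MY_HALF_PI)
        x = MY_PI - x;
    else if (x < -MY_HALF_PI)
        x = -MY_PI - x;

    return sin_poly(x);
}
```

On the reduced range the truncation error of this polynomial stays below about 1e-10, which may or may not be enough for your signal processing; very large arguments lose precision in the range reduction step.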
Often a question leads me into another question.
While trying to debug an inline assembly code, I met with another basic problem.
To make long story short, I want to run arm64 baremetal hello world program on qemu.
#include <stdio.h>
int main()
{
printf("Hello World!\n");
}
I compile it like this :
aarch64-none-elf-gcc -g test.c
I get undefined reference errors for _exit, _sbrk, _write, _close, _lseek, _read, _fstat and _isatty. I learned in the past that the -specs=rdimon.specs compile option removes these errors.
So I ran
aarch64-none-elf-gcc -g test.c -specs=rdimon.specs
and it compiles ok with a.out file.
Now I run qemu baremetal program to debug the code.
qemu-system-aarch64 -machine virt,gic-version=max,secure=true,virtualization=true -cpu cortex-a72 -kernel a.out -m 2048M -nographic -s -S
and here is the gdb run result.
ckim#ckim-ubuntu:~/testdir/testinlinedebugprint$ aarch64-none-elf-gdb a.out
GNU gdb (GNU Toolchain for the A-profile Architecture 10.2-2020.11 (arm-10.16)) 10.1.90.20201028-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=aarch64-none-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.linaro.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...
(gdb) set architecture aarch64
The target architecture is set to "aarch64".
(gdb) set serial baud 115200
(gdb) target remote :1234
Remote debugging using :1234
_start ()
at /tmp/dgboter/bbs/build02--cen7x86_64/buildbot/cen7x86_64--aarch64-none-elf/build/src/newlib-cygwin/libgloss/aarch64/crt0.S:90
90 /tmp/dgboter/bbs/build02--cen7x86_64/buildbot/cen7x86_64--aarch64-none-elf/build/src/newlib-cygwin/libgloss/aarch64/crt0.S: No such file or directory.
(gdb) b main
Breakpoint 1 at 0x4002f8: file test.c, line 26.
(gdb)
(gdb) r
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) c
Continuing.
It doesn't break and hangs.
What am I doing wrong? and how can I solve the /tmp/dgboter/bbs/build02--cen7x86_64/buildbot/cen7x86_64--aarch64-none-elf/build/src/newlib-cygwin/libgloss/aarch64/crt0.S: No such file or directory. problem?
Any help will be really appreciated. Thanks!
ADD :
I realized I have asked the same question (How to compile baremetal hello_world.c and run it on qemu-system-aarch64?) before (Ah! my memory..). I realized I need all the stuff like start.S, crt0.S and the linker script. I stupidly thought the bare-metal compiler would take care of it automatically, when actually I have to fill in the really low-level things myself. I've worked on bare-metal programs in some cases, but that was after someone else had already set up the initial environment (sometimes I even modified it many times!). In bare metal, you have to provide all the things. There isn't anything you can take for granted, because it's "bare metal". I realized this basic thing so late..
When you build a program for "bare metal" that means that you need to configure your toolchain to produce a binary that works on the specific piece of bare metal that you try to run it on. For instance, the binary must:
put its code somewhere in the machine's memory map where there is either ROM or RAM
put its data where there is RAM
make sure that on startup the stack pointer is correctly initialized to point into RAM
if it wants to print output, include routines which access a suitable device on that machine. This is likely a serial port, and serial ports are often entirely different devices, located at different addresses, on different machines
If any of these things are wrong or don't match the actual machine you run on, the result is typically exactly what you see -- the program crashes without output.
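As a rough sketch of the last point in that list, an output routine for a memory-mapped serial port might look like this in C. The names uart_putc and uart_puts are hypothetical, the register address has to come from the documentation of the machine you actually run on, and a real UART would normally also need initialization and a "transmitter ready" poll that this sketch leaves out.

```c
#include <stdint.h>

/* Write one character to a memory-mapped transmit register.
 * Assumption: a plain store is enough to send a byte, with no status poll. */
static void uart_putc(volatile uint32_t *tx_reg, char c)
{
    *tx_reg = (uint32_t)(unsigned char)c;
}

/* Write a NUL-terminated string, expanding '\n' to "\r\n".
 * Returns the number of characters written to the register. */
unsigned uart_puts(volatile uint32_t *tx_reg, const char *s)
{
    unsigned n = 0;

    while (*s) {
        if (*s == '\n') {
            uart_putc(tx_reg, '\r');
            n++;
        }
        uart_putc(tx_reg, *s++);
        n++;
    }
    return n;
}
```

Passing the register as a pointer keeps the routine independent of any one board; on the target you would call it with the UART data register address cast to volatile uint32_t *.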
More specifically, rdimon.specs tells the compiler to build in C library functions which do some of this via the "semihosting" debugger ABI (which has support for "print string" and some other things). Your QEMU command line doesn't enable implementation of semihosting (you can turn it on with the -semihosting option), so that won't work at all. But there are probably other problems you're also hitting.
I'm running OS X 10.12 and I'm developing a basic text-based operating system. I have developed a boot loader and that seems to be running fine. My only problem is that when I attempt to compile my kernel into pure binary, the linker won't work. I have done some research and I think that this is because of the fact OS X runs the Darwin linker and not the GNU linker. Because of this, I have downloaded and installed the GNU binutils. However, it still won't work...
Here is my kernel:
void main() {
// Create pointer to a character and point it to the first cell of video
// memory (i.e. the top-left)
char* video_memory = (char*) 0xb8000;
// At that address, put an x
*video_memory = 'x';
}
And this is when I attempt to compile it:
Hazims-MacBook-Pro:32 bit root# gcc -ffreestanding -c kernel.c -o kernel.o
Hazims-MacBook-Pro:32 bit root# ld -o kernel.bin -T text 0x1000 kernel.o --oformat binary
ld: unknown option: -T
Hazims-MacBook-Pro:32 bit root#
I would love to know how to solve this issue. Thank you for your time.
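-T is a GNU ld option, and gcc passes it through when it drives the link; the "unknown option" error comes from the ld that OS X ships, which is Apple's linker and does not accept it. (With GNU ld, the text address is given as -Ttext 0x1000, with no space after -T; a bare -T expects a linker script file.) Have a look at this: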
With these components you can now actually build the final kernel. We use the compiler as the linker as it allows it greater control over the link process. Note that if your kernel is written in C++, you should use the C++ compiler instead.
You can then link your kernel using:
i686-elf-gcc -T linker.ld -o myos.bin -ffreestanding -O2 -nostdlib boot.o kernel.o -lgcc
Note: Some tutorials suggest linking with i686-elf-ld rather than the compiler, however this prevents the compiler from performing various tasks during linking.
The file myos.bin is now your kernel (all other files are no longer needed). Note that we are linking against libgcc, which implements various runtime routines that your cross-compiler depends on. Leaving it out will give you problems in the future. If you did not build and install libgcc as part of your cross-compiler, you should go back now and build a cross-compiler with libgcc. The compiler depends on this library and will use it regardless of whether you provide it or not.
This is all taken directly from OSDev, which documents the entire process, including a bare-bones kernel, very clearly.
You're correct that you probably want binutils for this, especially if you're coding bare metal; while clang purports to be a cross compiler as is, it's far from optimal or usable here, for various reasons. I infer you're developing for ARM, in which case you want this:
https://developer.arm.com/open-source/gnu-toolchain/gnu-rm
Aside from the fact that gcc does this thing better than clang markedly, there's also the issue that ld does not build on OS X from the binutils package; it in some configurations silently fails so you may in fact never have actually installed it despite watching libiberty etc build, it will even go through the motions of compiling the source of that target sometimes and just refuse to link it... to the fellow with the lousy tone blaming OP, if you had relevant experience ie ever had built this under this condition you would know that is patently obnoxious. it'd be nice if you'd refrain from discouraging people from asking legitimate questions.
In the CXXfilt package they mumble about apple-darwin not being a target; try changing FAKE_TARGET to instead of mn10003000-whatever or whatever they used, to apple-rhapsody some time.
You're still in way better shape just building them from current if you say need to strip relocations from something or want to work on restoring static linkage to the system. which is missing by default from that clang installation as well...anyhow it's not really that ld couldn't work with macho, it's all there, codewise in fact...that i am sure of
Regarding locating things in memory, you may want to refer to a linker script
http://svn.screwjackllc.com/?p=noid.git;a=blob_plain;f=new_mbed_bs.link_script.ld
I have some code in there that directly places things in memory; going with the linker script is more reproducible than doing it on the command line. It's a little complex, but what it does is set up a couple of regions of memory to be used with my memory allocators. You can use malloc, but you should prefer not to use actual malloc; dynamic memory is fine when it isn't dynamic...heh...
The script also sets flags for the stack and heap locations, although they are just markers, not loaded til go time, they actually get placed, stack and heap, by the startup code, which is in assembly and rather readable and well commented (hard to believe, i know)... neat trick, you have some persistence to volatile memory, so i set aside a very tiny bit to flip and you can do things like have it control what bootloader to run on the next power cycle. again you are 100% correct regarding the linker; seems to be you are headed the right direction. incidentally another way you can modify objects prior to loading them , and preload things in memory, similar to this method, well there are a ton of ways, but, check out objcopy and objdump...you can use gdb to dump srecs of structures in memory, note the address, and then before linking but after assembly use dd to insert the records you extracted with gdb back in to extracted sections..is one of my favorite ways just because is smartass route :D also, if you are tight on memory ever and need to precalculate constants it's one way to optimize things...that way is actually closer to what ld is doing, just doing it by hand... probably path of least resistance on this now though is linker script.
Well, I've searched the whole internet for code that will run using arm-linux-gnueabi-as and qemu
to print an integer value from a string. A routine would help.
obviously you have not searched the whole internet...because, if nothing else, the qemu source code contains all the answers to your questions...
QEMU emulates systems. Are you trying to do something bare metal on an emulated arm system? Or are you trying to run an arm linux operating system and within the operating system create a program in assembly that runs on the operating system which is running on qemu? if it is the latter it has nothing to do with qemu, it is an operating system question not a qemu question. and it is not a language question (asm) but an operating system question. asm and low level are two different things. asm does not imply low level access and is definitely not required (and rarely used) for low level stuff.
If you are not interested in an operating system but just bare metal, here is one of many ways to get serial output on the qemu console. strings and integers and such are a language-less problem (same solution can apply to any programming language, asm, c, python, etc solve the problem THEN apply the language to the problem).
startup.s
.globl _start
_start:
ldr r0,=0x101f1000
mov r1,#0
loop:
add r1,r1,#1
and r1,r1,#7
add r1,r1,#0x30
str r1,[r0]
mov r2,#0x0D
str r2,[r0]
mov r2,#0x0A
str r2,[r0]
b loop
memmap
MEMORY
{
rom : ORIGIN = 0x00010000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > rom
}
Makefile
CROSS_COMPILE ?= arm-none-linux-gnueabi
AOPS = --warn --fatal-warnings
COPS = -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding
hello_world.bin : startup.s memmap
	$(CROSS_COMPILE)-as $(AOPS) startup.s -o startup.o
	$(CROSS_COMPILE)-ld startup.o -T memmap -o hello_world.elf
	$(CROSS_COMPILE)-objdump -D hello_world.elf > hello_world.list
	$(CROSS_COMPILE)-objcopy hello_world.elf -O binary hello_world.bin
run with
qemu-system-arm -M versatilepb -m 128M -nographic -kernel hello_world.bin
but I dont know how to get out of the console
Instead if you do this:
qemu-system-arm -M versatilepb -m 128M -kernel hello_world.bin
and then ctrl-alt-3 (not F3 but 3) will switch to the serial console
and you can see the output, and can close out of qemu by closing the console window.
The uart tx register in the versatilepb qemu target is at address 0x101f1000. Because this is a simulation, you can "just try" and find out that writing to this address without doing any real-world uart setup "just works", and being an emulated system it probably instantly transmits the character so you dont have to wait for it to complete or poll for an empty tx buffer slot or anything like that. Just blast away. This will get you started, then you can try to do real-world stuff later if that is of interest. (other uarts in other targets may be closer to real-world like and require some initialization, and waiting for an empty tx buf).
asm makes the above more painful, just use C instead for your low level/bare metal programs.
Also if doing arm bare metal you can use arm-none-linux-gnueabi, if you know what you are doing, but will eventually find arm-none-eabi a better fit since you are not using an operating system much less linux.
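Since the question asked for a routine to print an integer, here is a minimal sketch of the conversion half in C. The name u32_to_dec is made up for this example; it only fills a buffer, and sending the characters out is left to whatever UART write you already have. One assumption to be aware of: on ARM cores without a hardware divider, the / and % operators are implemented by libgcc helper routines, so a -nostdlib build may need -lgcc.

```c
#include <stdint.h>

/* Convert an unsigned 32-bit value to a decimal string.
 * buf must hold at least 11 bytes (10 digits plus the terminating NUL).
 * Returns the number of digits written. */
unsigned u32_to_dec(uint32_t value, char *buf)
{
    char tmp[10];
    unsigned n = 0;
    unsigned i;

    /* Produce digits least significant first. */
    do {
        tmp[n++] = (char)('0' + value % 10u);
        value /= 10u;
    } while (value != 0);

    /* Reverse them into the caller's buffer. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';

    return n;
}
```

The do/while form makes sure that 0 still produces one digit instead of an empty string.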
I will ask my question by giving an example. Now I have a function called do_something().
It has three versions: do_something(), do_something_sse3(), and do_something_sse4(). When my program runs, it will detect the CPU feature (see if it supports SSE3 or SSE4) and call one of the three versions accordingly.
The problem is: When I build my program with GCC, I have to set -msse4 for do_something_sse4() to compile (e.g. for the header file <smmintrin.h> to be included).
However, if I set -msse4, then gcc is allowed to use SSE4 instructions everywhere, and some intrinsics in do_something_sse3() are also translated to SSE4 instructions. So if my program runs on a CPU that has only SSE3 (but no SSE4) support, it causes an "illegal instruction" error when it calls do_something_sse3().
Maybe I have some bad practice. Could you give some suggestions? Thanks.
I think that the Mystical's tip is fine, but if you really want to do it in the one file, you can use proper pragmas, for instance:
#pragma GCC target("sse4.1")
GCC 4.4 is needed, AFAIR.
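A sketch of how the one-file approach could look, using the per-function attribute form of the same target mechanism instead of the pragma. Everything here is illustrative: sum_generic, sum_sse41, pick_sum and sum_dispatch are made-up names, the loop body is plain C rather than intrinsics, and __builtin_cpu_supports is a GCC built-in that I believe only exists in releases newer than the 4.4 mentioned above.

```c
#include <stddef.h>

/* Baseline version: compiled with whatever -m flags the file is built with. */
static long sum_generic(const int *v, size_t n)
{
    long s = 0;
    size_t i;

    for (i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Only this function is allowed to contain SSE4.1 instructions. */
__attribute__((target("sse4.1")))
static long sum_sse41(const int *v, size_t n)
{
    long s = 0;
    size_t i;

    for (i = 0; i < n; i++)
        s += v[i];
    return s;
}
#endif

typedef long (*sum_fn)(const int *, size_t);

/* Choose an implementation from what the running CPU reports. */
static sum_fn pick_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        return sum_sse41;
#endif
    return sum_generic;
}

long sum_dispatch(const int *v, size_t n)
{
    static sum_fn impl;   /* resolved on the first call */

    if (!impl)
        impl = pick_sum();
    return impl(v, n);
}
```

Because the SSE4.1 code is confined to one attributed function, the rest of the file can be built without -msse4 and stays safe on older CPUs.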
I think you want to build what's called a "CPU dispatcher". I got one working (as far as I know) for GCC but have not got it to work with Visual Studio.
cpu dispatcher for visual studio for AVX and SSE
I would check out Agner Fog's vectorclass and the file dispatch_example.cpp
http://www.agner.org/optimize/#vectorclass
g++ -O3 -msse2 -c dispatch_example.cpp -od2.o
g++ -O3 -msse4.1 -c dispatch_example.cpp -od5.o
g++ -O3 -mavx -c dispatch_example.cpp -od8.o
g++ -O3 -msse2 instrset_detect.cpp d2.o d5.o d8.o
Here is an example of compiling a separate object file for each optimization setting:
http://notabs.org/lfsr/software/index.htm
But even this method fails when gcc link time optimization (-flto) is used. So how can a single executable be built with full optimization for different processors? The only solution I can find is to use include directives to make the C files behave as a single compilation unit so that -flto is not needed. Here is an example using that method:
http://notabs.org/blcutil/index.htm
If you are using GCC 4.9 or above on an i686 or x86_64 machine, then you are supposed to be able to use intrinsics regardless of your -march=XXX and -mXXX options. You could write your do_something() accordingly:
void do_something()
{
byte temp[18];
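// Test the more capable instruction set first: every CPU with SSSE3 also
// has SSE2, so the SSSE3 branch would never be reached the other way round.
if (HasSSSE3())
{
const __m128i MASK = _mm_set_epi8(12,13,14,15, 8,9,10,11, 4,5,6,7, 0,1,2,3);
_mm_storeu_si128(reinterpret_cast<__m128i*>(temp),
_mm_shuffle_epi8(_mm_loadu_si128((const __m128i*)(ptr)), MASK));
}
else if (HasSSE2())
{
const __m128i i = _mm_loadu_si128((const __m128i*)(ptr));
...
}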
else
{
// Do the byte swap/endian reversal manually
...
}
}
You have to supply HasSSE2(), HasSSSE3() and friends. Also see Intrinsics for CPUID like informations?.
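One possible way to supply those two helpers with GCC or clang is the __get_cpuid wrapper from <cpuid.h>, shown here as a sketch rather than as the code this answer actually uses. The helper cpuid_leaf1 is a made-up name; the bit positions are the CPUID leaf 1 feature flags (SSE2 is EDX bit 26, SSSE3 is ECX bit 9). On non-x86 targets the sketch simply reports "not available".

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Fetch the CPUID leaf 1 feature words; both are zero if the leaf is missing. */
static void cpuid_leaf1(unsigned *ecx, unsigned *edx)
{
    unsigned eax = 0, ebx = 0;

    *ecx = 0;
    *edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, ecx, edx)) {
        *ecx = 0;
        *edx = 0;
    }
}

int HasSSE2(void)
{
    unsigned c, d;

    cpuid_leaf1(&c, &d);
    return (int)((d >> 26) & 1u);   /* EDX bit 26: SSE2 */
}

int HasSSSE3(void)
{
    unsigned c, d;

    cpuid_leaf1(&c, &d);
    return (int)((c >> 9) & 1u);    /* ECX bit 9: SSSE3 */
}
#else
/* Not an x86 build: report that neither extension is available. */
int HasSSE2(void)  { return 0; }
int HasSSSE3(void) { return 0; }
#endif
```

For hot paths you would probably cache the results instead of executing CPUID on every call.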
Also see GCC Issue 57202 - Please make the intrinsics headers like immintrin.h be usable without compiler flags. But I don't believe the feature works. I regularly encounter compile failures because GCC does not make intrinsics available.
I have an embedded hardware system which contains a bootloader based on ARMboot (which is very similar to Uboot and PPCboot).
This bootloader normally serves to load uClinux image from the flash. However, now I am trying to use this bootloader to run a standalone helloworld application, which does not require any linked library. Actually, it contains only while(1){} code in the main function.
My problem is that I cannot find out what GCC settings should I use in order to build a standalone properly formatted binary.
I do use following build command:
cr16-elf-gcc -o helloworld helloworld.c -nostdlib
which produces warning message:
warning: cannot find entry symbol _start; defaulting to 00000004
Thereafter, within the bootloader, I upload a produced application and start it at some address:
tftpboot 0xa00000 helloworld
go 0xa00004
But it doesn't work :(
The system reboots.
Normally it should just hang (because of while(1)).
I don't know that loader, but I think you should use objcopy like this to dump your executable data to a raw binary file. Don't jump to ELF headers, people :)
objcopy -O binary ./a.out o.bin
Also try to compile position independent code and to read ld and gcc manuals.
The linker is complaining about missing startup code.
You need to provide two things: startup code and a linker command file that defines the address map of your target processor.
In your case the startup code can be as simple as "bl main", but usually the startup code will at least initialize the stack pointer before branching to main.
If you know you are loading your example into RAM, you can start your program at main directly. You'll need to determine main()'s address and use that for your "go" command.
I operate on the ARM non-os non-lib all day every day. These are my current gcc options:
arm-whatever-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -c hello.c -o hello.o
then I use the linker to combine the C code with the vector tables and such, even if it is not an image that needs a vector table using a vector table makes it easy to put your entry point on the first instruction.
Any reason you can't statically link at least the standard libraries in? You should have a working program and the benefits of the standard libraries without external dependencies.
Also, does your toolchain/IDE differentiate between "standalone application" and "linux application"? The IDE for the AVR32 has that distinction and is able to generate either a program that runs within the embedded linux environment or a standalone program that basically becomes the OS.