How to run arm64 baremetal hello world program on qemu? - arm

Often a question leads me into another question.
While trying to debug an inline assembly code, I met with another basic problem.
To make long story short, I want to run arm64 baremetal hello world program on qemu.
#include <stdio.h>
int main()
{
printf("Hello World!\n");
}
I compile it like this :
aarch64-none-elf-gcc -g test.c
I get undefined reference errors for _exit, _sbrk, _write, _close, _lseek, _read, _fstat and _isatty. I learned in the past that the -specs=rdimon.specs compile option removes these errors.
So I ran
aarch64-none-elf-gcc -g test.c -specs=rdimon.specs
and it compiles ok with a.out file.
Now I run qemu baremetal program to debug the code.
qemu-system-aarch64 -machine virt,gic-version=max,secure=true,virtualization=true -cpu cortex-a72 -kernel a.out -m 2048M -nographic -s -S
and here is the gdb run result.
ckim#ckim-ubuntu:~/testdir/testinlinedebugprint$ aarch64-none-elf-gdb a.out
GNU gdb (GNU Toolchain for the A-profile Architecture 10.2-2020.11 (arm-10.16)) 10.1.90.20201028-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=aarch64-none-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.linaro.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...
(gdb) set architecture aarch64
The target architecture is set to "aarch64".
(gdb) set serial baud 115200
(gdb) target remote :1234
Remote debugging using :1234
_start ()
at /tmp/dgboter/bbs/build02--cen7x86_64/buildbot/cen7x86_64--aarch64-none-elf/build/src/newlib-cygwin/libgloss/aarch64/crt0.S:90
90 /tmp/dgboter/bbs/build02--cen7x86_64/buildbot/cen7x86_64--aarch64-none-elf/build/src/newlib-cygwin/libgloss/aarch64/crt0.S: No such file or directory.
(gdb) b main
Breakpoint 1 at 0x4002f8: file test.c, line 26.
(gdb)
(gdb) r
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) c
Continuing.
It doesn't break and hangs.
What am I doing wrong? And how can I solve the "/tmp/dgboter/bbs/build02--cen7x86_64/buildbot/cen7x86_64--aarch64-none-elf/build/src/newlib-cygwin/libgloss/aarch64/crt0.S: No such file or directory." problem?
Any help will be really appreciated. Thanks!
ADD :
I realized I have asked the same question (How to compile baremetal hello_world.c and run it on qemu-system-aarch64?) before (Ah! my memory..). I realized I need all the stuff like start.S, crt0.S and the linker script. I stupidly thought the baremetal compiler would take care of it automatically, when actually I have to fill in the really low-level things myself. I've worked on baremetal programs in some cases, but that was after someone else had already set up the initial environment (sometimes I even modified it many times!). In baremetal, you have to provide all the things. There isn't anything you can take for granted, because it's "bare metal". I realized this basic thing so late..
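As a rough illustration of what "providing all the things" can look like, the undefined references listed earlier (_sbrk, _write, _isatty and so on) are the low-level hooks that newlib expects the platform to supply. The sketch below shows two of them. The 64 KiB static heap and its name are assumptions made for illustration; a real port would more likely take the heap bounds from symbols defined in its linker script.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical fixed-size heap, chosen only for this sketch. */
#define HEAP_SIZE (64 * 1024)
static char heap[HEAP_SIZE];
static size_t heap_used;

/* Move the program break by incr bytes and return the previous break,
   which is what newlib's malloc expects from this hook. */
void *_sbrk(ptrdiff_t incr)
{
    size_t old_used = heap_used;

    if (incr >= 0) {
        if ((size_t)incr > HEAP_SIZE - old_used) {
            errno = ENOMEM;          /* request does not fit in the heap */
            return (void *)-1;
        }
    } else if ((size_t)(-incr) > old_used) {
        errno = ENOMEM;              /* cannot shrink below the heap start */
        return (void *)-1;
    }

    heap_used = (size_t)((ptrdiff_t)old_used + incr);
    return heap + old_used;
}

/* Report every descriptor as a terminal, so stdout is treated as a console. */
int _isatty(int fd)
{
    (void)fd;
    return 1;
}
```

The remaining hooks (_write, _read, _close, _lseek, _fstat, _exit) follow the same pattern: small functions that map the C library's request onto whatever the board actually has.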

When you build a program for "bare metal" that means that you need to configure your toolchain to produce a binary that works on the specific piece of bare metal that you try to run it on. For instance, the binary must:
put its code somewhere in the machine's memory map where there is either ROM or RAM
put its data where there is RAM
make sure that on startup the stack pointer is correctly initialized to point into RAM
if it wants to print output, include routines which access a suitable device on that machine. This is likely a serial port, and serial ports are often entirely different devices, located at different addresses, on different machines
If any of these things are wrong or don't match the actual machine you run on, the result is typically exactly what you see -- the program crashes without output.
More specifically, rdimon.specs tells the compiler to build in C library functions which do some of this via the "semihosting" debugger ABI (which has support for "print string" and some other things). Your QEMU command line doesn't enable implementation of semihosting (you can turn it on with the -semihosting option), so that won't work at all. But there are probably other problems you're also hitting.
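To make the "routines which access a suitable device" point concrete, here is a minimal sketch of a string-output routine for a memory-mapped UART. The function name uart_puts is made up for this example, and the address constant assumes QEMU's "virt" machine, which maps a PL011 UART at 0x09000000; any other board will need a different driver and address.

```c
#include <stdint.h>

/* Assumption: PL011 UART0 base address on QEMU's "virt" machine. */
#define QEMU_VIRT_UART0_BASE 0x09000000UL

/* Write a NUL-terminated string to a UART data register, one byte at a
   time, and return the number of bytes written. This sketch does not poll
   the UART's status flags before each write, which real hardware would
   normally require. */
int uart_puts(volatile uint32_t *data_reg, const char *s)
{
    int n = 0;

    while (*s != '\0') {
        *data_reg = (uint32_t)(unsigned char)*s++;
        n++;
    }
    return n;
}
```

On the target this would be called as uart_puts((volatile uint32_t *)QEMU_VIRT_UART0_BASE, "Hello World!\n"). Passing the register address as a parameter keeps the routine independent of any one machine's memory map, which is exactly the part that differs between boards.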

Related

How do I use the GNU linker instead of the Darwin Linker?

I'm running OS X 10.12 and I'm developing a basic text-based operating system. I have developed a boot loader and that seems to be running fine. My only problem is that when I attempt to compile my kernel into pure binary, the linker won't work. I have done some research and I think that this is because of the fact OS X runs the Darwin linker and not the GNU linker. Because of this, I have downloaded and installed the GNU binutils. However, it still won't work...
Here is my kernel:
void main() {
// Create pointer to a character and point it to the first cell of video
// memory (i.e. the top-left)
char* video_memory = (char*) 0xb8000;
// At that address, put an x
*video_memory = 'x';
}
And this is when I attempt to compile it:
Hazims-MacBook-Pro:32 bit root# gcc -ffreestanding -c kernel.c -o kernel.o
Hazims-MacBook-Pro:32 bit root# ld -o kernel.bin -T text 0x1000 kernel.o --oformat binary
ld: unknown option: -T
Hazims-MacBook-Pro:32 bit root#
I would love to know how to solve this issue. Thank you for your time.
-T is not an option of the Darwin linker that OS X ships as ld; it is a GNU ld option, which gcc also accepts and passes through to the linker. Have a look at this:
With these components you can now actually build the final kernel. We use the compiler as the linker as it allows it greater control over the link process. Note that if your kernel is written in C++, you should use the C++ compiler instead.
You can then link your kernel using:
i686-elf-gcc -T linker.ld -o myos.bin -ffreestanding -O2 -nostdlib boot.o kernel.o -lgcc
Note: Some tutorials suggest linking with i686-elf-ld rather than the compiler, however this prevents the compiler from performing various tasks during linking.
The file myos.bin is now your kernel (all other files are no longer needed). Note that we are linking against libgcc, which implements various runtime routines that your cross-compiler depends on. Leaving it out will give you problems in the future. If you did not build and install libgcc as part of your cross-compiler, you should go back now and build a cross-compiler with libgcc. The compiler depends on this library and will use it regardless of whether you provide it or not.
This is all taken directly from OSDev, which documents the entire process, including a bare-bones kernel, very clearly.
You're correct that you probably want binutils for this, especially if you're coding baremetal; while clang as is purports to be a cross compiler, it's far from optimal or usable here, for various reasons. I infer that you're developing on ARM, in which case you want this:
https://developer.arm.com/open-source/gnu-toolchain/gnu-rm
Aside from the fact that gcc does this markedly better than clang, there's also the issue that ld from the binutils package does not build on OS X; in some configurations it fails silently, so you may in fact never have actually installed it despite watching libiberty etc. build. It will sometimes even go through the motions of compiling the source for that target and just refuse to link it. To the fellow with the lousy tone blaming OP: if you had relevant experience, i.e. had ever built this under these conditions, you would know that this is patently obnoxious. It'd be nice if you'd refrain from discouraging people from asking legitimate questions.
In the CXXfilt package they mumble about apple-darwin not being a target; try changing FAKE_TARGET from mn10003000-whatever, or whatever they used, to apple-rhapsody some time.
You're still in way better shape just building them from current if you, say, need to strip relocations from something or want to work on restoring static linkage to the system, which is missing by default from that clang installation as well. Anyhow, it's not really that ld couldn't work with Mach-O; it's all there, codewise, in fact. That I am sure of.
Regarding locating things in memory, you may want to refer to a linker script
http://svn.screwjackllc.com/?p=noid.git;a=blob_plain;f=new_mbed_bs.link_script.ld
I have some code in there that directly places things in memory; rather than doing it on the command line, it is more reproducible to go with the linker script. It's a little complex, but what it is doing is setting up a couple of regions of memory to be used with my memory allocators. You can use malloc, but you should prefer not to use actual malloc; dynamic memory is fine when it isn't dynamic...heh...
The script also sets markers for the stack and heap locations. They are just markers, not loaded until go time; the stack and heap actually get placed by the startup code, which is in assembly and rather readable and well commented (hard to believe, I know). Neat trick: volatile memory has some persistence, so I set aside a very tiny bit to flip, and you can do things like have it control which bootloader to run on the next power cycle. Again, you are 100% correct regarding the linker; it seems you are headed in the right direction. Incidentally, there are a ton of other ways to modify objects prior to loading them and to preload things in memory; check out objcopy and objdump. You can use gdb to dump srecs of structures in memory, note the address, and then, after assembly but before linking, use dd to insert the records you extracted with gdb back into the extracted sections. That is one of my favorite ways, just because it is the smartass route :D Also, if you are ever tight on memory and need to precalculate constants, it's one way to optimize things. That way is actually closer to what ld is doing, just done by hand. The path of least resistance on this now, though, is probably the linker script.

How to debug an experimental toolchain producing malformed executables

I am working on cross compiling an experimental GNU free Linux toolchain using clang (instead of gcc), compiler-rt (instead of libgcc), libunwind (available at http://llvm.org/git/libunwind.git) (instead of libgcc_s), lld (instead of GNU ld), libcxx (instead of libstdc++), libcxxabi (instead of not sure, I'm unclear on the GNU distinction between libstdc++ and its ABI) and musl (instead of glibc).
Using a musl-based gcc cross compiler and a few patches, I've managed to successfully compile all of the above and successfully compile and link a simple hello world C program with it. Something seems to have gone wrong, however, as running the hello world program results in a segmentation fault:
$ ./hello
Segmentation fault
$
Normally I would simply debug it with gdb, but herein lies the problem:
$ gdb ./hello
Reading symbols from ./hello...Dwarf Error: Could not find abbrev number 5 in CU at offset 0x52 [in module /home/main/code/main/asm/hello]
(no debugging symbols found)...done.
(gdb) start
Temporary breakpoint 1 at 0x206
Starting program: /hello
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb)
I can't seem to step through the program in any way, I'm guessing because the error is occurring somewhere in early C runtime startup. I can't even step through the assembly using layout asm and stepi, so I really don't know how to find out where exactly the error is occurring (to debug my toolchain).
I have confirmed that the problem resides with lld by using a GNU binutils ld to successfully link the hello world object (statically) using the cross compiled libraries and object files, which results in a functional hello world program. Since lld links without reporting any error, however, I can't pinpoint where the failure is occurring.
Note I compiled hello as a static executable and used the -v gcc/clang option to verify that all the correct libraries and object files were linked in.
Note online GDB documentation has the following to say about the above error:
On Unix systems, by default, if a shell is available on your target, GDB uses it to start your program. Arguments of the run command are passed to the shell, which does variable substitution, expands wildcard characters and performs redirection of I/O. In some circumstances, it may be useful to disable such use of a shell, for example, when debugging the shell itself or diagnosing startup failures such as:
(gdb) run
Starting program: ./a.out
During startup program terminated with signal SIGSEGV, Segmentation fault.
which indicates the shell or the wrapper specified with ‘exec-wrapper’ crashed, not your program.
I don't think this is true, considering what I'm working with and that the problem doesn't happen when I use GNU ld, and because the suggested solution (set startup-with-shell off) doesn't work.
Cross-compiling means that the compilation is done on a host machine and the output is a binary that runs on a target machine. The compiled binary is therefore not compatible with your host CPU. Instead, if your target supports it, you could run the binary there and use the debugger from your toolchain to connect to the running binary remotely. Alternatively, a debugger may also be available on the target, so you can debug the binary in place.
To get a better feel for this, try running the file command on the compiled binary and on some other binaries on your host to see the differences.
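For readers curious what the file command is looking at here, the target architecture is recorded in the e_machine field of the ELF header, 18 bytes into the file. The following is only a sketch of that one check, not how file itself is implemented; the function name elf_machine is invented for the example.

```c
#include <stddef.h>

/* Return the e_machine field of an ELF header held in buf, or -1 if buf
   does not start with the ELF magic. Some well-known values:
   40 = ARM, 62 = x86-64, 183 = AArch64. */
int elf_machine(const unsigned char *buf, size_t len)
{
    if (len < 20 || buf[0] != 0x7f || buf[1] != 'E' ||
        buf[2] != 'L' || buf[3] != 'F')
        return -1;

    /* buf[5] is EI_DATA: 1 means little-endian, 2 means big-endian. */
    if (buf[5] == 2)
        return (buf[18] << 8) | buf[19];
    return buf[18] | (buf[19] << 8);
}
```

Reading the first 20 bytes of ./hello and of a known-good host binary into a buffer and comparing the two results shows quickly whether both were built for the same architecture.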

using library in bare metal program for arm

Can someone help me out please! I do not know if the answer is general, or specific to the board and software versions I am working with. I am out of my previous areas here, and do not even know the right question to ask.
EDITs added at the bottom
What I currently want, is to create a program that will run standalone (bare metal; no OS) on a A20-OLinuXino-Micro-4GB board, that needs to use (at least) some standard math library calls. Eventually, I will want to load it into NAND, and run it on powerup, but for now I am trying to manually load it (loady) from the U-Boot (github.com/linux-sunxi/u-boot-sunxi/wiki) serial 'console', after booting from an SD card. Standalone is needed, because the linux distro level access to the hardware GPIO ports is not very flexible, when working with more than one bit (port in a port group) at a time, and quite slow. Too slow for the target application, and I did not really want to try modifying / adding a kernel module just to see if that would be fast enough.
Are there some standard gcc / ld flags needed to create a bare metal standalone program, and include some library routines? Beyond -ffreestanding and -static? Is there some special glue code needed? Is there something else I have not even thought of?
I found and looked over Beagleboard bare metal programming (stackoverflow.com/questions/6870712/beagleboard-bare-metal-programming). The answer there is good info, but it is in assembler and does not reference any library. "Application hangs when calling printf to uart with bare metal raspberry pi" might show a cause for the problem. The (currently) bottom answer there points to problems with VFP, and I already ran across problems with the soft/hard floating point options. It shows some assembler code, but I am missing details about how to add a wrapper/glue to combine it with C code. My assembler coding is rusty, but would adding equivalent code at the start of hello_world (or at least before the reference to the sin() function) likely get things working? Maybe by adding it into the libstubs code.
I am using another A20 board for the main development environment.
$ gcc --version
gcc (Debian 4.6.3-14) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ld.bfd --version
GNU ld (GNU Binutils for Debian) 2.22
Copyright 2011 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
$ uname -a
Linux a20-OLinuXino 3.4.67+ #6 SMP PREEMPT Fri Nov 1 17:32:40 EET 2013 armv7l GNU/Linux
I have been able to create bootable U-Boot images for the board on SD cards from the repo, either building directly from the linux-sunxi distro that was supplied with the board, or by cross-compiling from a Fedora 21 machine. Same for the standalone hello_world program that came in the examples for U-boot, which can be loaded and run from the U-Boot console.
However, reducing the sample program to bare minimum, then adding code that needs math.h, -lm and -lc fails (in various iterations) with 'software interrupt' or 'undefined operation' type errors. The original sample program was being linked with -lgcc, but a little checking showed that nothing was actually being included from the library. The identical binary was created without the library, so the question might be 'what does it take to use any library with a bare metal program?'
sun7i# go 0x48000000
## Starting application at 0x48000000 ...
Hello math World
undefined instruction
pc : [<48000010>] lr : [<4800000c>]
sp : 7fb66da0 ip : 7fb672c0 fp : 00000000
r10: 00000002 r9 : 7fb66f0c r8 : 7fb67778
r7 : 7ffbbaf8 r6 : 00000001 r5 : 7fb6777c r4 : 48000000
r3 : 00000083 r2 : 7ffbc7fc r1 : 0000000a r0 : 00000011
Flags: nZCv IRQs off FIQs off Mode SVC_32
Resetting CPU ...
To get that far, I had to tweak build options, to specify hardware floating point, since that is how the base libraries were compiled.
Here are the corresponding source and build script files
hello_world.c
#include <common.h>
#include <math.h>
int hello_world (void)
{
double tst;
tst = 0.33333333333;
printf ("Hello math World\n");
tst = sin(0.5);
// printf ("sin test %d : %d\n", (int)tst, (int)(1000 * tst));
return (0);
}
build script
#! /bin/bash
UBOOT="/home/olimex/u-boot-sunxi"
SRC="$UBOOT/examples/standalone"
#INCLS="-nostdinc -isystem /usr/lib/gcc/arm-linux-gnueabihf/4.6/include -I$UBOOT/include -I$UBOOT/arch/arm/include"
INCLS="-I$UBOOT/include -I$UBOOT/arch/arm/include"
#-v
GCCOPTS="\
-D__KERNEL__ -DCONFIG_SYS_TEXT_BASE=0x4a000000\
-Wall -Wstrict-prototypes -Wno-format-security\
-fno-builtin -ffreestanding -Os -fno-stack-protector\
-g -fstack-usage -Wno-format-nonliteral -fno-toplevel-reorder\
-DCONFIG_ARM -D__ARM__ -marm -mno-thumb-interwork\
-mabi=aapcs-linux -mword-relocations -march=armv7-a\
-ffunction-sections -fdata-sections -fno-common -ffixed-r9\
-mhard-float -pipe"
# -msoft-float -pipe
OBJS="hello_world.o libstubs.o"
LDOPTS="--verbose -g -Ttext 0x48000000"
#--verbose
#LIBS="-static -L/usr/lib/gcc/arm-linux-gnueabihf/4.6 -lm -lc"
LIBS="-static -lm -lc"
#-lgcc
gcc -Wp,-MD,stubs.o.d $INCLS $GCCOPTS -D"KBUILD_STR(s)=#s"\
-D"KBUILD_BASENAME=KBUILD_STR(stubs)"\
-D"KBUILD_MODNAME=KBUILD_STR(stubs)"\
-c -o stubs.o $SRC/stubs.c
ld.bfd -r -o libstubs.o stubs.o
gcc -Wp,-MD,hello_world.o.d $INCLS $GCCOPTS -D"KBUILD_STR(s)=#s"\
-D"KBUILD_BASENAME=KBUILD_STR(hello_world)"\
-D"KBUILD_MODNAME=KBUILD_STR(hello_world)"\
-c -o hello_world.o hello_world.c
ld.bfd $LDOPTS -o hello_world -e hello_world $OBJS $LIBS
objcopy -O binary hello_world hello_world.bin
EDITS added:
The application that this is to be part of needs both some fairly high speed GPIO and some math functions. It should only need sin() and maybe sqrt(). My previous testing for the GPIO got the toggling of a single pin (port in a port group) up to 8MHz. The constraints for the application need to get the full cycle time in the 10µs (100 kHz) range, which includes reading all pins from a single port, and writing a few pins on other ports, synchronized with the timing limitations of the attached ADC chip (3 ADC reads). I have bare metal code that is doing (simulating) that process in about 2.1µs. Now I need to add in the math to process the values, the output of which will set some more outputs. Future planned improvements include using SIMD for the math, and dedicating the second core to the math, while the first does the GPIO and 'feeds' the calculations.
The needed math code / logic has already been written into a simulation program using very standard (c99) code. I just need to port it into the bare metal program. Need to get 'math' to work first.
As a first step, I suggest reading this excellent paper on bare metal programming with ARM and GNU: http://www.state-machine.com/arm/Building_bare-metal_ARM_with_GNU.pdf.
Then, I would make sure you avoid any syscall to the Linux kernel (which you don't have, and which your compiler will try to make), e.g. by not returning values from main(), which should never return anyway.
Finally, I would either use newlib or, if you only need a small subset of what the libraries have to offer, write a custom implementation.
Keep in mind you are using an Allwinner SoC, which is not the best documented for bare metal work, but you can find the TRM here: http://www.soselectronic.com/a_info/resource/c/20_UM-V1.020130322.pdf. I would check whether the libraries (if you decide to use them) or your code need some special silicon hardware to be initialized (some interconnect fabric, clock and power domains, etc.).
If you just need sin() and similar, I strongly suggest writing your own.
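As one possible shape for a home-grown sin(), here is a sketch that needs no libm at all: it reduces the argument to [-pi/2, pi/2] and evaluates the Taylor series through the x^15 term. The names my_sin and MY_PI are invented for this example, and the accuracy (roughly 1e-11 for moderate arguments, degrading as |x| grows very large) is an assumption about what the application can tolerate. It still uses double arithmetic, so the floating point unit has to be enabled, or soft-float code generated, before it is called.

```c
/* Hypothetical helper: sine approximation without any library calls. */
#define MY_PI 3.14159265358979323846

double my_sin(double x)
{
    /* Range reduction, step 1: subtract the nearest multiple of 2*pi. */
    double turns = x / (2.0 * MY_PI);
    long k = (long)(turns + (turns >= 0.0 ? 0.5 : -0.5));
    x -= (double)k * 2.0 * MY_PI;

    /* Step 2: fold into [-pi/2, pi/2] using sin(pi - x) = sin(x). */
    if (x > MY_PI / 2.0)
        x = MY_PI - x;
    else if (x < -MY_PI / 2.0)
        x = -MY_PI - x;

    /* Taylor series through x^15, written in nested form so that each
       term reuses the previous one. */
    double x2 = x * x;
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0 *
               (1.0 - x2 / 72.0 * (1.0 - x2 / 110.0 * (1.0 - x2 / 156.0 *
               (1.0 - x2 / 210.0)))))));
}
```

With this in place, the tst = sin(0.5) line in hello_world.c could call my_sin(0.5) instead, and the -lm -lc link step is no longer needed for that call. A lookup table with interpolation is another common choice when speed matters more than precision.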

Script/Tool predicate for ARM ELF compiled for Thumb OR Arm

I have rootfs and klibc file systems. I am creating make rules and some developers have an older compiler without interworking.note1 I am trying to verify that all the files get built with ARM only when a certain version of the compiler is detected. I have rebuilt the trees several times. I was using readelf -A and looking for Tag_THUMB_ISA_use: Thumb-1, but this seems to be in ARM-only code (built with the interworking compiler) as well as in Thumb code. I can manually run objdump -S and examine the assembler to determine what instruction set is in use.
However, it would be much easier if I had a script/tool predicate so that find, etc can be used to search through the shadow file systems to look for binaries that may have been missed. I thought that some of this information would be in the ELF header and accessible via objdump or readelf, but I haven't found anything reliable.
Specifically I am looking for,
Compiled 'C' that wouldn't run without a CONFIG_ARM_THUMB Linux system.
make rules that use 'C' compiler flags that choke non-Thumb compilers.
note1: Interworking allows easy switching between Thumb and ARM modes, and the compiler will automatically generate code to support calling from either mode.
The readelf -A output doesn't describe the ELF contents. It just describes the capabilities of the processor and/or system that is expected or fed to the compiler. As I have an ARM926 CPU, which is an ARMv5TEJ processor, gcc/ld will always set Tag_THUMB_ISA_use: Thumb-1, as it just means that ARMv5TEJ is recognized as being Thumb-1 capable. It says nothing about the code itself.
Examining the Linux arch/arm/kernel/elf.c routine elf_check_arch() shows a check for x->e_entry & 1. This leads to the following script,
readelf -h "$1" | grep -q 'Entry.*[13579bdf]$'
I.e., just look at the initial ELF entry value and see if the low bit is set. This is a fast check that fits the spirit of what I am looking for. unixsmurf has a good point that the code inside any ELF can mix and match ARM and Thumb. This may be OK if the program dynamically identifies the CPU and selects an appropriate routine. I.e., just the presence of a Thumb instruction doesn't mean that code will execute.
Just looking at the entry value does determine which gcc compiler flags were used, at least for gcc versions 4.6 to 4.7.
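The same low-bit test can also be done directly on the header bytes, which may be handy where readelf is not available. This is a sketch under the assumption of a 32-bit little-endian ELF file, where e_entry sits at byte offset 24; the function name elf32_entry_is_thumb is made up for the example.

```c
#include <stddef.h>

/* Return 1 if the entry point of a 32-bit little-endian ELF header has its
   low bit set (Thumb entry), 0 if it is clear (ARM entry), and -1 if buf
   is not such a header. */
int elf32_entry_is_thumb(const unsigned char *buf, size_t len)
{
    if (len < 28 || buf[0] != 0x7f || buf[1] != 'E' ||
        buf[2] != 'L' || buf[3] != 'F')
        return -1;
    if (buf[4] != 1 || buf[5] != 1)   /* ELFCLASS32 and little-endian only */
        return -1;

    /* e_entry occupies bytes 24..27; its least significant byte comes first. */
    return buf[24] & 1;
}
```

Like the readelf one-liner, this only inspects the entry point, so it shares the limitation raised in the answers: it says nothing about Thumb code elsewhere in the file.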
Since Thumb and ARM sequences can be freely interchanged within an object file, even within the same section, plain ELF header inspection is not going to tell you whether a file includes Thumb instructions or not.
A slightly roundabout and still not 100% foolproof way would be to use readelf -r and check if the output contains "R_ARM_THM", indicating a relocation for thumb.

How to crack intermittent bug in C Linux?

I'm running out of good ideas on how to crack this bug. I have 1000 lines of code that crashes every 2 or 3 runs. It is currently a prototype command line application written in C. An issue is that it's proprietary and I cannot give you the source, but I'd be happy to send a debug compiled executable to any brave soul on a Debian Squeeze x86_64 machine.
Here is what I got so far:
When I run it in GDB, it always completes successfully.
When I run it in Valgrind, it always completes successfully.
The issue seems to emanate from a recursive function call that is very basic. In an effort to pin point the error in this recursive function I wrote the same function in a separate application. It always completes successfully.
I built my own gcc 4.7.1 compiler, compiled my code with it and I'm still getting the same behavior.
FTPed my application to another machine to eliminate the risk of HW issues and I still get the same behavior.
FTPed my source code to another machine to eliminate the risk of a corrupt build environment and I still get the same behavior.
The application is single threaded and does no signal handling that might cause race conditions. I memset(,0,) all large objects.
There are no exotic dependencies, the ldd follows below.
ldd gives me this:
ldd tst
linux-vdso.so.1 => (0x00007fff08bf0000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007fe8c65cd000)
libm.so.6 => /lib/libm.so.6 (0x00007fe8c634b000)
libc.so.6 => /lib/libc.so.6 (0x00007fe8c5fe8000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe8c67fc000)
Are there any tools out there that could help me?
What would be your next step if you were in my position?
Thanks!
This is what got me in the right direction: -Wextra. I already used -Wall.
THANKS!!! This was really driving me crazy.
I suggested in comments :
to compile with -Wall -Wextra and improve the source code till no warnings are given;
to compile with both -g and -O; this is helpful to inspect dumped core files with gdb (you may want to set a big enough coredump size limit with e.g. ulimit bash builtin)
to show your code to a colleague and explain the issue?
to use ltrace or strace
Apparently -Wextra was helpful. It would be nice to understand why and how.
BTW, for larger programs, you could even add your own warnings to GCC by extending it with MELT; this may take days and is worthwhile mostly in big projects.
In this case, I think that you have some memory problems (look at the output of valgrind carefully), because GDB and valgrind change the original program by adding some memory tracking functions (so your original addresses are changed). You can compile with the -ggdb option, enable core dumps (ulimit -c unlimited), and then try to analyze what's going on. This link may help you:
http://en.wikipedia.org/wiki/Unusual_software_bug
Regards.
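Both answers suggest raising the core dump size limit with ulimit before reproducing the crash. If changing the shell environment is inconvenient, roughly the same effect can be had from inside the program with the POSIX getrlimit/setrlimit calls. The sketch below raises the soft limit as far as the hard limit allows; the function name enable_core_dumps is invented for the example, and whether a core file is actually written still depends on system settings outside the program.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the current hard limit.
   Returns 0 on success and -1 on failure, like setrlimit itself. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    lim.rlim_cur = lim.rlim_max;   /* soft limit may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Calling this first thing in main() means that the runs which crash outside GDB and Valgrind can leave a core file behind, which can then be opened with gdb together with the -g build of the executable.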

Resources