ARM assembly using QEMU [closed]

Well, I've searched the whole internet for code that will assemble with arm-linux-gnueabi-as and run under QEMU, to print an integer value as a string. A routine would help.

Obviously you have not searched the whole internet, because, if nothing else, the QEMU source code contains the answers to your questions...
QEMU emulates systems. Are you trying to do something bare metal on an emulated ARM system? Or are you trying to run an ARM Linux operating system and, within that operating system, write an assembly program that runs on top of it? If it is the latter, it has nothing to do with QEMU: it is an operating-system question, not a QEMU question. It is also not a language question (asm) but an operating-system question. Asm and low-level are two different things; assembly does not imply low-level access and is definitely not required (and rarely used) for low-level work.
If you are not interested in an operating system but just bare metal, here is one of many ways to get serial output on the QEMU console. Strings and integers and such are a language-independent problem (the same solution can apply to any programming language: asm, C, Python, etc. Solve the problem first, THEN apply the language to it).
startup.s
.globl _start
_start:
ldr r0,=0x101f1000   @ PL011 UART0 data register on the versatilepb
mov r1,#0
loop:
add r1,r1,#1         @ count 1,2,...,7,0,1,...
and r1,r1,#7
add r1,r1,#0x30      @ convert to an ASCII digit
str r1,[r0]          @ write the character to the UART
mov r2,#0x0D
str r2,[r0]          @ carriage return
mov r2,#0x0A
str r2,[r0]          @ line feed
b loop
memmap
MEMORY
{
rom : ORIGIN = 0x00010000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > rom
}
Makefile
CROSS_COMPILE ?= arm-none-linux-gnueabi
AOPS = --warn --fatal-warnings
COPS = -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding
hello_world.bin : startup.s memmap
$(CROSS_COMPILE)-as $(AOPS) startup.s -o startup.o
$(CROSS_COMPILE)-ld startup.o -T memmap -o hello_world.elf
$(CROSS_COMPILE)-objdump -D hello_world.elf > hello_world.list
$(CROSS_COMPILE)-objcopy hello_world.elf -O binary hello_world.bin
run with
qemu-system-arm -M versatilepb -m 128M -nographic -kernel hello_world.bin
but I don't know a clean way to get out of the console from there (Ctrl-A followed by X terminates QEMU when it is run with -nographic).
Instead if you do this:
qemu-system-arm -M versatilepb -m 128M -kernel hello_world.bin
and then Ctrl-Alt-3 (not F3 but the digit 3) will switch to the serial console,
where you can see the output, and you can close out of QEMU by closing the console window.
The UART TX register in the versatilepb QEMU target is at address 0x101f1000. Because this is a simulation, you can just try it and find that writing to this address without doing any real-world UART setup simply works. Being an emulated system, it effectively transmits the character instantly, so you don't have to wait for it to complete, poll for an empty TX buffer slot, or anything like that. Just blast away. This will get you started; you can try real-world-style programming later if that is of interest. (UARTs in other targets may be closer to real hardware and require some initialization and waiting for an empty TX buffer.)
Assembly makes the above more painful than it needs to be; just use C instead for your low-level/bare-metal programs.
Also, if doing ARM bare metal you can use arm-none-linux-gnueabi if you know what you are doing, but you will eventually find arm-none-eabi a better fit, since you are not using an operating system, much less Linux.
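For reference, here is the same loop as a C sketch (not part of the original answer): it assumes the same versatilepb UART data register and a tiny asm _start that sets up a stack and calls the hypothetical notmain().
/* uart_hello.c - sketch of the same bare-metal output loop in C.
   Assumes the versatilepb PL011 UART0 data register at 0x101f1000
   and a small asm _start that sets the stack pointer and calls notmain(). */
#define UART0_DR (*(volatile unsigned int *)0x101f1000)

static void uart_putc(unsigned int c)
{
    UART0_DR = c;             /* emulated UART: no status polling needed */
}

void notmain(void)
{
    unsigned int i = 0;
    for (;;) {
        i = (i + 1) & 7;
        uart_putc(0x30 + i);  /* ASCII digit */
        uart_putc(0x0D);      /* CR */
        uart_putc(0x0A);      /* LF */
    }
}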

Related

is there any use of __attribute__ ((interrupt)) for riscv compilers?

We can read here that the interrupt attribute keyword is used for ARM, AVR, CR16, Epiphany, M32C, M32R/D, m68k, MeP, MIPS, RL78, RX and Xstormy16.
Does it have any impact on RISC-V compilation using riscv32-***-elf-gcc compilers?
There is a separate page for RISC-V which claims it works; you can find it here. You could also verify it by compiling code with and without the attribute set.
I don't have a riscv32 toolchain installed, but I managed to verify it using the riscv64 toolchain. You should reproduce the same steps with the riscv32 toolchain to make sure it works there too.
Using a simple test.c file:
__attribute__((interrupt))
void test() {}
Compiling it with riscv64-linux-gnu-gcc -c -o test.o test.c and disassembling with riscv64-linux-gnu-objdump -D -j.text test.o, we can see that it generates an mret instruction at the end of the function:
0: 1141 addi sp,sp,-16
2: e422 sd s0,8(sp)
4: 0800 addi s0,sp,16
6: 0001 nop
8: 6422 ld s0,8(sp)
a: 0141 addi sp,sp,16
c: 30200073 mret
After removing the interrupt attribute, the instruction changes to a regular ret. According to this SO answer, this seems like correct behaviour.
Normally, an interrupt handler requires a different entry/exit sequence than a normal function. The differences centre on saving all registers used by the interrupt handler (normally, only some registers are preserved across a function call), and the return instruction is different (e.g. on ARM it has to change the processor mode of operation; as seen above, RISC-V likewise returns with mret rather than ret).
The interrupt attribute informs the compiler of the routine properties, so it can generate the correct code for it.
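As an illustrative sketch (not from the answers above): in a bare-metal, machine-mode program, a handler marked with this attribute would typically be installed by writing its address into the mtvec CSR. The names here are hypothetical.
/* Hypothetical bare-metal sketch: install an attribute-marked handler
   as the machine-mode trap vector (direct mode). */
void __attribute__((interrupt)) trap_handler(void)
{
    /* handle/acknowledge the interrupt here; the compiler saves and
       restores the registers it uses and returns with mret */
}

static void install_trap_handler(void)
{
    /* mtvec direct mode: the handler address must be 4-byte aligned */
    unsigned long addr = (unsigned long)&trap_handler;
    __asm__ volatile ("csrw mtvec, %0" : : "r"(addr));
}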

How do you call C functions from Assembly and how do you link it Statically?

I am playing around and trying to understand the low-level operation of computers and programs. To that end, I am experimenting with linking Assembly and C.
I have 2 program files:
Some C code here in "callee.c":
#include <unistd.h>
void my_c_func() {
write(1, "Hello, World!\n", 14);
return;
}
I also have some GAS x86_64 Assembly here in "caller.asm":
.text
.globl my_entry_pt
my_entry_pt:
# call my c function
call my_c_func # this function has no parameters and no return data
# make the 'exit' system call
mov $60, %rax # set the syscall to the index of 'exit' (60)
mov $0, %rdi # set the single parameter, the exit code to 0 for normal exit
syscall
I can build and execute the program like this:
$ as ./caller.asm -o ./caller.obj
$ gcc -c ./callee.c -o ./callee.obj
$ ld -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out -dynamic-linker /lib64/ld-linux-x86-64.so.2
$ ldd ./prog.out
linux-vdso.so.1 (0x00007fffdb8fe000)
libc.so.6 => /lib64/libc.so.6 (0x00007f46c7756000)
/lib64/ld-linux-x86-64.so.2 (0x00007f46c7942000)
$ ./prog.out
Hello, World!
Along the way, I had some problems. If I don't set the -dynamic-linker option, it defaults to this:
$ ld -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out
$ ldd ./prog.out
linux-vdso.so.1 (0x00007ffc771c5000)
libc.so.6 => /lib64/libc.so.6 (0x00007f8f2abe2000)
/lib/ld64.so.1 => /lib64/ld-linux-x86-64.so.2 (0x00007f8f2adce000)
$ ./prog.out
bash: ./prog.out: No such file or directory
Why is this? Is there a problem with the linker defaults on my system? How can/should I fix it?
Also, static linking doesn't work.
$ ld -static -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out
ld: ./callee.obj: in function `my_c_func':
callee.c:(.text+0x16): undefined reference to `write'
Why is this? Shouldn't write() just be a c library wrapper for the syscall 'write'? How can I fix it?
Where can I find the documentation on the C function calling convention so I can read up on how parameters are passed back and forth, etc...?
Lastly, while this seems to work for this simple example, am I doing something wrong in my initialization of the C stack? I mean, right now, I'm doing nothing. Should I be allocing memory from the kernel for the stack, setting bounds, and setting %rsp and %rbp before I start trying to call functions. Or is the kernel loader taking care of all this for me? If so, will all architectures under a Linux kernel take care of it for me?
While the Linux kernel provides a syscall named write, that does not mean you automatically get a wrapper function of the same name that you can call from C as write(). In fact, if you're not using libc, you need inline assembly (or your own asm stubs) to make any syscall from C, because it is libc that defines those wrapper functions.
Instead of explicitly linking your binaries with ld, let gcc do it for you. It can even assemble assembly files (internally executing a suitable version of as), if the source ends with a .s suffix. It looks like your linking problems are simply a disagreement between what GCC assumes and how you do it via LD yourself.
No, it's not a bug; the ld default path for ld.so isn't the one used on modern x86-64 GNU/Linux systems. (/lib/ld64.so.1 might have been used on early x86-64 GNU/Linux ports before the dust settled on where multi-arch systems would put everything to support both i386 and x86-64 versions of libraries installed at the same time. Modern systems use /lib64/ld-linux-x86-64.so.2)
Linux uses the System V ABI. The AMD64 Architecture Processor Supplement (PDF) describes the initial execution environment (when _start gets invoked), and the calling convention. Essentially, you have an initialized stack, with environment and command-line arguments stored in it.
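As a quick illustration (a sketch under the ABI described above, not part of the original answer), an assembly _start that picks up argc/argv from that initial stack and hands them to a hypothetical C function could look like:
.globl _start
_start:
    mov   (%rsp), %rdi        # argc sits at the top of the initial stack
    lea   8(%rsp), %rsi       # argv[] starts right above it (NULL-terminated)
    and   $-16, %rsp          # re-align the stack to 16 bytes for the call
    call  c_main              # hypothetical: int c_main(long argc, char **argv);
    mov   %eax, %edi          # use the return value as the exit status
    mov   $60, %eax           # __NR_exit
    syscall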
Let's construct a fully working example, containing both C and assembly (AT&T syntax) sources, and a final static and dynamic binaries.
First, we need a Makefile to save typing long commands:
# SPDX-License-Identifier: CC0-1.0
CC := gcc
CFLAGS := -Wall -Wextra -O2 -march=x86-64 -mtune=generic -m64 \
-ffreestanding -nostdlib -nostartfiles
LDFLAGS :=
all: static-prog dynamic-prog
clean:
rm -f static-prog dynamic-prog *.o
%.o: %.c
$(CC) $(CFLAGS) $^ -c -o $@
%.o: %.s
$(CC) $(CFLAGS) $^ -c -o $@
dynamic-prog: main.o asm.o
$(CC) $(CFLAGS) $^ $(LDFLAGS) -o $@
static-prog: main.o asm.o
$(CC) -static $(CFLAGS) $^ $(LDFLAGS) -o $@
Makefiles are particular about their indentation, but SO converts tabs to spaces. So, after pasting the above, run sed -e 's|^ *|\t|' -i Makefile to fix the indentation back to tabs.
The SPDX License Identifier in the above Makefile and all following files tell you that these files are licensed under Creative Commons Zero license: that is, these are all dedicated to public domain.
Compilation flags used:
-Wall -Wextra: Enable all warnings. It is a good practice.
-O2: Optimize the code. This is a commonly used optimization level, usually considered sufficient and not too extreme.
-march=x86-64 -mtune=generic -m64: Compile to 64-bit x86-64 AKA AMD64 architecture. These are the defaults; you can use -march=native to optimize for your own system.
-ffreestanding: Compilation targets the freestanding C environment. Tells the compiler it can't assume that strlen or memcpy or other library functions are available, so don't optimize a loop, struct copy, or array initialization into calls to strlen, memcpy, or memset, for example. If you do provide asm implementations of any functions gcc might want to invent calls to, you can leave this out. (Especially if you're writing a program that will run under an OS)
-nostdlib -nostartfiles: Do not link in the standard C library or its startup files. (Actually, -nostdlib already "includes" -nostartfiles, so -nostdlib alone would suffice.)
Next, let's create a header file, nolib.h, that implements nolib_exit() and nolib_write() wrappers around the group_exit and write syscalls:
// SPDX-License-Identifier: CC0-1.0
/* Require Linux on x86-64 */
#if !defined(__linux__) || !defined(__x86_64__)
#error "This only works on Linux on x86-64."
#endif
/* Known syscall numbers, without depending on glibc or kernel headers */
#define SYS_write 1
#define SYS_exit_group 231
// Normally you'd use
// #include <asm/unistd.h> for __NR_write and __NR_exit_group
// or even #include <sys/syscall.h> for SYS_write
/* Inline assembly macro for a single-parameter no-return syscall */
#define SYSCALL1_NORET(nr, arg1) \
__asm__ volatile ( "syscall\n\t" : : "a" (nr), "D" (arg1) : "rcx", "r11", "memory")
/* Inline assembly macro for a three-parameter syscall */
#define SYSCALL3(retval, nr, arg1, arg2, arg3) \
__asm__ volatile ( "syscall\n\t" : "=a" (retval) : "a" (nr), "D" (arg1), "S" (arg2), "d" (arg3) : "rcx", "r11", "memory" )
/* exit() function */
static inline void nolib_exit(int retval)
{
SYSCALL1_NORET(SYS_exit_group, retval);
}
/* Some errno values */
#define EINTR 4 /* Interrupted system call */
#define EBADF 9 /* Bad file descriptor */
#define EINVAL 22 /* Invalid argument */
// or #include <asm/errno.h> to define these
/* write() syscall wrapper - returns negative errno if an error occurs */
static inline long nolib_write(int fd, const void *data, long len)
{
long retval;
if (fd == -1)
return -EBADF;
if (!data || len < 0)
return -EINVAL;
SYSCALL3(retval, SYS_write, fd, data, len);
return retval;
}
The reason the nolib_exit() uses the exit_group syscall instead of the exit syscall is that exit_group ends the entire process. If you run a program under strace, you'll see it too calls exit_group syscall at the very end. (Syscall implementation of exit())
Next, we need some C code. main.c:
// SPDX-License-Identifier: CC0-1.0
#include "nolib.h"
const char *c_function(void)
{
return "C function";
}
static inline long nolib_put(const char *msg)
{
if (!msg) {
return nolib_write(1, "(null)", 6);
} else {
const char *end = msg;
while (*end)
end++; // strlen
if (end > msg)
return nolib_write(1, msg, (unsigned long)(end - msg));
else
return 0;
}
}
extern const char *asm_function(int);
void _start(void)
{
nolib_put("asm_function(0) returns '");
nolib_put(asm_function(0));
nolib_put("', and asm_function(1) returns '");
nolib_put(asm_function(1));
nolib_put("'.\n");
nolib_exit(0);
}
nolib_put() is just a wrapper around nolib_write(), that finds the end of the string to be written, and calculates the number of characters to be written based on that. If the parameter is a NULL pointer, it prints (null).
Because this is a freestanding environment, and the default name for the entry point is _start, this defines _start as a C function that never returns. (It must not ever return, because the ABI does not provide any return address; it would just crash the process. Instead, an exit-type syscall must be called at end.)
The C source declares and calls a function asm_function, that takes an integer parameter, and returns a pointer to a string. Obviously, we'll implement this in assembly.
The C source also declares a function c_function, that we can call from assembly.
Here's the assembly part, asm.s:
# SPDX-License-Identifier: CC0-1.0
.text
.section .rodata
.one:
.string "One" # includes zero terminator
.text
.p2align 4,,15
.globl asm_function #### visible to the linker
.type asm_function, @function
asm_function:
cmpl $1, %edi
jne .else
leaq .one(%rip), %rax
ret
.else:
subq $8, %rsp # 16B stack alignment for a call to C
call c_function
addq $8, %rsp
ret
.size asm_function, .-asm_function
We don't need to declare c_function as an extern because GNU as treats all unknown symbols as external symbols anyway. We could add Call Frame Information directives, at least .cfi_startproc and .cfi_endproc, but I left them out so it wouldn't be so obvious I just wrote the original code in C and let GCC compile it to assembly, and then prettified it just a bit. (Did I write that out aloud? Oops! But seriously, compiler output is often a good starting point for a hand-written asm implementation of something, unless it does a very bad job of optimizing.)
The subq $8, %rsp adjusts the stack so that it will be a multiple of 16 for the c_function. (On x86-64, stacks grow down, so to reserve 8 bytes of stack, you subtract 8 from the stack pointer.) After the call returns, addq $8, %rsp reverts the stack back to original.
With these four files, we're ready. To build the example binaries, run e.g.
reset ; make clean all
Running either ./static-prog or ./dynamic-prog will output
asm_function(0) returns 'C function', and asm_function(1) returns 'One'.
The two binaries are just 2 kB (static) and 6 kB (dynamic) in size or so, although you can make them even smaller by stripping unneeded stuff,
strip --strip-unneeded static-prog dynamic-prog
which removes about 0.5 kB to 1 kB of unneeded stuff from them – the exact amount varies depending on the version of GCC and Binutils you use.
On some other architectures, we'd also need to link against libgcc (via -lgcc), because some C features rely on internal GCC helper functions; 64-bit integer division on 32-bit architectures (__udivdi3 and friends) is a typical example.
As mentioned in the comments, the first version of the above examples had a few issues that needed to be addressed. They did not stop the examples from executing or working as intended, and were overlooked because the examples were written from scratch for this answer (in the hope that others finding this question later via web searches might find it useful), and I'm not perfect. :)
memory clobber argument to the inline assembly, in the syscall preprocessor macros
Adding "memory" in the clobbered list tells the compiler that the inline assembly may access (read and/or write) memory other than those specified in the parameter lists. It is obviously needed for the write syscall, but it is actually important for all syscalls, because the kernel can deliver e.g. signals in the same thread before returning from the syscall, and signal delivery can/will access memory.
As the GCC documentation mentions, this clobber also behaves like a read/write memory barrier for the compiler (but NOT for the processor!). In other words, with the memory clobber, the compiler knows that it must write any changes in variables etc. in memory before the inline assembly, and that unrelated variables and other memory content (not explicitly listed in the inline assembly inputs, outputs, or clobbers) may also change, and will generate the code we actually want, without making incorrect assumptions.
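As a tiny illustration (a sketch, not from the original answer) of what the clobber buys you: without "memory", the compiler could in principle delay or even drop the store into a buffer that the syscall only reads through a pointer.
/* Sketch: why the "memory" clobber matters. The asm reads the buffer via a
   pointer passed in a register, so without "memory" the compiler is free to
   assume the store to buf[0] is unused and reorder or eliminate it. */
static inline long raw_write(int fd, const void *data, long len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (1 /* SYS_write */), "D" (fd), "S" (data), "d" (len)
                      : "rcx", "r11", "memory");  /* "memory": data is read via the pointer */
    return ret;
}

void demo(void)
{
    char buf[1];
    buf[0] = '!';          /* must be visible in memory before the syscall */
    raw_write(1, buf, 1);
}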
-fPIC -pie: Omitted for simplicity
Position independent code is usually only relevant for shared libraries. In real projects' Makefiles, you will need to use a different set of compilation flags for objects that will be compiled as a dynamic library, static library, dynamically linked executable, or a static executable, as the desired properties (and therefore compiler/linker flags) vary.
In an example such as this one, it is better to try and avoid such extraneous things, as it is a reasonable question to ask on its own ("Which compiler options to use to achieve X, when needing Y ?"), and the answers depend on the required features and context.
In most modern distros, PIE is the default and you might want -fno-pie -no-pie to simplify debugging / disassembling. 32-bit absolute addresses no longer allowed in x86-64 Linux?
-nostdlib does imply (or "include") -nostartfiles
There are quite a few overall options and link options we can use to control how the code is compiled and linked.
Many of the options GCC supports are grouped. For example, -O2 is actually shorthand for a collection of optimization features that you can explicitly specify.
Here, the reason for keeping both is to remind human programmers of the expectations for the code: no standard library, and no start files/objects.
-march=x86-64 -mtune=generic -m64 is the default on x86-64
Again, this is kept more as a reminder of what the code expects. Without a specific architecture definition, one might get the wrong impression that the code should be compilable in general, because C typically is not architecture specific!
The nolib.h header file does contain preprocessor checks (using pre-defined compiler macros to detect the operating system and hardware architecture), halting the compilation with an error for other OSes and hardware architectures.
Most Linux distributions provide the syscall numbers in <asm/unistd.h>, as __NR_name.
These are derived from the actual kernel sources. However, for any given architecture, these are the stable userspace ABI, and will not change. New ones may be added. Only in some extraordinary circumstances (unfixable security holes, perhaps?) can a syscall be deprecated and stop functioning.
It is always better to use the syscall numbers from the kernel, preferably via the aforementioned header, but it's possible to build this program with only GCC, no glibc or Linux kernel headers installed. For someone writing their own standard C library, they should include the file (from Linux kernel sources).
I do know that Debian derivatives (Ubuntu, Mint, et cetera) all do provide the <asm/unistd.h> file, but there are many, many other Linux distributions, and I just am not sure about all of them. I opted to only define the two (exit_group and write), to minimize the risk of problems.
(Editor's note: the file might be in a different place in the filesystem, but the <asm/unistd.h> include path should always work if the right header package is installed. It's part of the kernel's user-space C/asm API.)
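For example (a sketch of the alternative described above, assuming the kernel UAPI headers are installed), nolib.h could take the numbers from the header instead of hard-coding them:
#include <asm/unistd.h>            /* provides __NR_write, __NR_exit_group */

#define SYS_write      __NR_write
#define SYS_exit_group __NR_exit_group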
The compilation flag -g adds debug symbols, which helps greatly when debugging, for example when running and examining the binary in gdb.
I omitted this and all related flags, because I did not want to expand the topic any further, and because this example is easily debugged at the asm level and examined even without. See GDB asm tips like layout reg at the bottom of the x86 tag wiki
The System V ABI requires that before a call to a function, the stack is aligned to 16 bytes. So at the top of the function, RSP+8 is 16-byte aligned, and if there are any stack args, they'll be aligned.
The call instruction pushes the current instruction pointer to the stack, and because this is a 64-bit architecture, that too is 64 bits = 8 bytes. So, to conform to the ABI, we really need to adjust the stack pointer by 8 before calling the function, to ensure it too gets a properly aligned stack pointer. These were initially omitted, but are now included in the assembly (asm.s file).
This matters because, on x86-64, SSE/AVX SIMD vectors have different instructions for aligned-to-16-bytes and unaligned accesses, with the aligned accesses being significantly faster on certain processors. (Why does System V / AMD64 ABI mandate a 16 byte stack alignment?) Using aligned SIMD instructions like movaps with unaligned addresses will cause the process to crash. (e.g. glibc scanf Segmentation faults when called from a function that doesn't align RSP is a real-life example of what happens when you get this wrong.)
However, when we do such stack manipulations, we really should add CFI (Call Frame Information) directives to ensure debugging and stack unwinding etc. works correctly. In this case, for general CFI, we prepend .cfi_startproc before the first instruction in an assembly function, and .cfi_endproc after the last instruction in an assembly function. For the Canonical Frame Address, CFA, we add .cfi_def_cfa_offset N after any instruction that modifies the stack pointer. Essentially, N is 8 at the beginning of the function, and increases as much as %rsp is decremented, and vice versa. See this article for more.
Internally, these directives produce information (metadata) stored in the .eh_frame and .eh_frame_hdr sections in the ELF object files and binaries, depending on other compilation flags.
So, in this case, the subq $8, %rsp should be followed by .cfi_def_cfa_offset 16, and the addq $8, %rsp by .cfi_def_cfa_offset 8, plus .cfi_startproc at the beginning of asm_function and .cfi_endproc after the final ret.
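Applied to the asm.s above, the annotated function would look roughly like this (a sketch; only the CFI directives are new relative to the listing earlier):
.globl asm_function
.type asm_function, @function
asm_function:
    .cfi_startproc
    cmpl    $1, %edi
    jne     .else
    leaq    .one(%rip), %rax
    ret
.else:
    subq    $8, %rsp             # keep RSP 16-byte aligned for the call
    .cfi_def_cfa_offset 16       # CFA is now 16 bytes above RSP
    call    c_function
    addq    $8, %rsp
    .cfi_def_cfa_offset 8        # back to the at-entry state
    ret
    .cfi_endproc
.size asm_function, .-asm_function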
Note that you can often see rep ret instead of a plain ret in assembly sources. This is nothing but a workaround for certain processors having branch-prediction performance issues when jumping to, or falling through a JCC to, a ret instruction. The rep prefix does nothing, except that it fixes the issue those processors might otherwise have with such a jump. Recent GCC versions stopped doing this by default, as the affected AMD CPUs are very old and no longer relevant. What does `rep ret` mean?
The "key" option, -ffreestanding, is one that chooses a C "dialect"
The C programming language is actually separated into two different environments: hosted, and freestanding.
The hosted environment is one where the standard C library is available, and is used when you write programs, applications, or daemons in C.
The freestanding environment is one where the standard C library is not available. It is used when you write kernels, firmware for microcontrollers or embedded systems, implement (parts of) your own standard C library, or a "standard library" for some other C-derived language.
As an example, the Arduino programming environment is based on a subset of freestanding C++. The standard C++ library is not available, and many features of C++ like exceptions are not supported. In fact, it is very close to freestanding C with classes. The environment also uses a special pre-preprocessor, which for example automatically prepends declarations of functions without the user having to write them.
Probably the most well known example of freestanding C is the Linux kernel. Not only is the standard C library not available, but the kernel code must actually avoid floating-point operations as well, because of certain hardware considerations.
For a better understanding of what exactly does the freestanding C environment look like to a programmer, I think the best thing is to go look at the language standard itself. As of now (June 2020), the most recent standard is ISO C18. While the standard itself is not free, the final draft is; for C18, it is draft N2176(PDF).
The ld default path for ld.so (the ELF interpreter) isn't the one used on modern x86-64 GNU/Linux systems.
/lib/ld64.so.1 might have been used on early x86-64 GNU/Linux ports before the dust settled on where multi-arch systems would put everything to support both i386 and x86-64 versions of libraries installed at the same time. Modern systems use /lib64/ld-linux-x86-64.so.2.
There was never a good time to update the default in GNU binutils ld; when some systems were using the default, changing it would have broken them. Multi-arch systems had to configure their GCC to pass -dynamic-linker /some/path to ld, so they simply did that instead of asking and waiting for the ld default to change. So nobody ever needed the ld default to change to make anything work, except for people playing around with assembly and using ld by hand to create dynamically-linked executables.
Instead of doing that, you can link using gcc -nostartfiles to omit the CRT start code that defines _start, while still linking with the normal libraries, including -lc, and -lgcc for internal helper functions if needed.
See also Assembling 32-bit binaries on a 64-bit system (GNU toolchain) for more info on assembling with/without libc for asm that defines _start, or with libc + CRT for asm that defines main. (Leave out the -m32 from that answer for 64-bit; when using gcc to invoke as and ld for you, that's the only difference.)
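For the question's own files, that could look something like this (illustrative; rename caller.asm to caller.s so gcc treats it as assembly, and pass your entry point through to the linker):
gcc -nostartfiles -Wl,-e,my_entry_pt callee.c caller.s -o prog.out
This links dynamically against libc by default, so write works because the dynamic linker runs glibc's init functions, as explained below.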
ld -static -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out
doesn't link because you put -lc before the object files that reference symbols in libc.
Order matters in linker command lines, for static libraries.
However, ld -static -e my_entry_pt ./callee.o ./caller.o -lc -o ./prog.out will link, but makes a program that segfaults when it calls glibc functions like write without having called glibc's init functions.
Dynamic linking takes care of that for you (glibc has .init functions that get called by the dynamic linker, the same mechanism that allows C++ static initializers to run in a C++ shared library). CRT startup code also calls those functions in the right order, but you left that out, too, and wrote your own entry point.
The answer above avoids that problem by defining its own write wrapper instead of linking with -lc, so it can be truly freestanding.
I thought glibc's write wrapper function would be simple enough not to crash, but that's not the case. It checks if the program is multi-threaded or something by loading from %fs:0x18. The kernel doesn't init FS base for thread-local storage; that's something user-space (glibc's internal init functions) would have to do.
glibc's write() faults on mov %fs:0x18,%eax if you haven't called glibc's init functions. (In a statically-linked executable where glibc couldn't get the dynamic linker to run them for you.)
Dump of assembler code for function write:
=> 0x0000000000401040 <+0>: endbr64 # for CET, or NOP on CPUs without CET
0x0000000000401044 <+4>: mov %fs:0x18,%eax ### this faults with no TLS setup
0x000000000040104c <+12>: test %eax,%eax
0x000000000040104e <+14>: jne 0x401060 <write+32>
0x0000000000401050 <+16>: mov $0x1,%eax # simple case: EAX = __NR_write
0x0000000000401055 <+21>: syscall
0x0000000000401057 <+23>: cmp $0xfffffffffffff000,%rax
0x000000000040105d <+29>: ja 0x4010b0 <write+112> # update errno on error
0x000000000040105f <+31>: retq # else return
0x0000000000401060 <+32>: sub $0x28,%rsp # the non-simple case:
0x0000000000401064 <+36>: mov %rdx,0x18(%rsp) # write is an async cancellation point or something
0x0000000000401069 <+41>: mov %rsi,0x10(%rsp)
0x000000000040106e <+46>: mov %edi,0x8(%rsp)
0x0000000000401072 <+50>: callq 0x4010e0 <__libc_enable_asynccancel>
0x0000000000401077 <+55>: mov 0x18(%rsp),%rdx
0x000000000040107c <+60>: mov 0x10(%rsp),%rsi
0x0000000000401081 <+65>: mov %eax,%r8d
0x0000000000401084 <+68>: mov 0x8(%rsp),%edi
0x0000000000401088 <+72>: mov $0x1,%eax
0x000000000040108d <+77>: syscall
0x000000000040108f <+79>: cmp $0xfffffffffffff000,%rax
0x0000000000401095 <+85>: ja 0x4010c4 <write+132>
0x0000000000401097 <+87>: mov %r8d,%edi
0x000000000040109a <+90>: mov %rax,0x8(%rsp)
0x000000000040109f <+95>: callq 0x401140 <__libc_disable_asynccancel>
0x00000000004010a4 <+100>: mov 0x8(%rsp),%rax
0x00000000004010a9 <+105>: add $0x28,%rsp
0x00000000004010ad <+109>: retq
0x00000000004010ae <+110>: xchg %ax,%ax
0x00000000004010b0 <+112>: mov $0xfffffffffffffffc,%rdx # errno update for the simple case
0x00000000004010b7 <+119>: neg %eax
0x00000000004010b9 <+121>: mov %eax,%fs:(%rdx) # thread-local errno?
0x00000000004010bc <+124>: mov $0xffffffffffffffff,%rax
0x00000000004010c3 <+131>: retq
0x00000000004010c4 <+132>: mov $0xfffffffffffffffc,%rdx # same for the async case
0x00000000004010cb <+139>: neg %eax
0x00000000004010cd <+141>: mov %eax,%fs:(%rdx)
0x00000000004010d0 <+144>: mov $0xffffffffffffffff,%rax
0x00000000004010d7 <+151>: jmp 0x401097 <write+87>
I don't fully understand what exactly write is checking for or doing. It may have something to do with async I/O, and/or POSIX thread cancellation points.

How do I use the GNU linker instead of the Darwin Linker?

I'm running OS X 10.12 and I'm developing a basic text-based operating system. I have developed a boot loader and that seems to be running fine. My only problem is that when I attempt to compile my kernel into pure binary, the linker won't work. I have done some research and I think this is because OS X uses the Darwin linker and not the GNU linker. Because of this, I have downloaded and installed GNU binutils. However, it still won't work...
Here is my kernel:
void main() {
// Create pointer to a character and point it to the first cell of video
// memory (i.e. the top-left)
char* video_memory = (char*) 0xb8000;
// At that address, put an x
*video_memory = 'x';
}
And this is when I attempt to compile it:
Hazims-MacBook-Pro:32 bit root# gcc -ffreestanding -c kernel.c -o kernel.o
Hazims-MacBook-Pro:32 bit root# ld -o kernel.bin -T text 0x1000 kernel.o --oformat binary
ld: unknown option: -T
Hazims-MacBook-Pro:32 bit root#
I would love to know how to solve this issue. Thank you for your time.
The immediate problem is that you are invoking Apple's Darwin ld, which doesn't understand GNU ld options such as -T. The usual approach is to use a cross toolchain and let the compiler driver do the linking, passing the linker script through to the GNU linker. Have a look at this:
With these components you can now actually build the final kernel. We use the compiler to drive the link, as that gives it greater control over the link process. Note that if your kernel is written in C++, you should use the C++ compiler instead.
You can then link your kernel using:
i686-elf-gcc -T linker.ld -o myos.bin -ffreestanding -O2 -nostdlib boot.o kernel.o -lgcc
Note: Some tutorials suggest linking with i686-elf-ld rather than the compiler, however this prevents the compiler from performing various tasks during linking.
The file myos.bin is now your kernel (all other files are no longer needed). Note that we are linking against libgcc, which implements various runtime routines that your cross-compiler depends on. Leaving it out will give you problems in the future. If you did not build and install libgcc as part of your cross-compiler, you should go back now and build a cross-compiler with libgcc. The compiler depends on this library and will use it regardless of whether you provide it or not.
This is all taken directly from OSDev, which documents the entire process, including a bare-bones kernel, very clearly.
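For completeness, the linker.ld referenced above is an ordinary GNU linker script. A minimal sketch (the real barebones script from that tutorial has a few more alignment details) looks something like:
ENTRY(_start)
SECTIONS
{
    . = 1M;                           /* load the kernel at 1 MiB, a common choice */
    .text   : { *(.multiboot) *(.text*) }
    .rodata : { *(.rodata*) }
    .data   : { *(.data*) }
    .bss    : { *(COMMON) *(.bss*) }
}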
You're correct in that you probably want binutils for this, especially if you're coding bare metal; while clang as shipped purports to be a cross-compiler, it's far from optimal or even usable here, for various reasons. Since it appears (I infer) that you're developing for ARM, you want this:
https://developer.arm.com/open-source/gnu-toolchain/gnu-rm
Aside from the fact that gcc does this markedly better than clang, there's also the issue that ld does not build on OS X from the binutils package; in some configurations it silently fails, so you may in fact never have actually installed it despite watching libiberty etc. build. It will sometimes even go through the motions of compiling the source for that target and then simply refuse to link it. To the fellow with the lousy tone blaming the OP: if you had relevant experience, i.e. had ever built this under these conditions, you would know that is patently obnoxious; it would be nice if you'd refrain from discouraging people from asking legitimate questions.
In the CXXfilt package they mumble about apple-darwin not being a target; try changing FAKE_TARGET from mn10003000-whatever (or whatever they used) to apple-rhapsody some time.
You're still in much better shape just building them from current sources if, say, you need to strip relocations from something or want to work on restoring static linkage to the system, which is also missing by default from that clang installation. Anyhow, it's not really that ld couldn't work with Mach-O; it's all there, code-wise. Of that I am sure.
Regarding locating things in memory, you may want to refer to a linker script:
http://svn.screwjackllc.com/?p=noid.git;a=blob_plain;f=new_mbed_bs.link_script.ld
I have some code in there that places things directly in memory; rather than doing it on the command line, it is more reproducible to go with the linker script. It's a little complex, but what it does is set up a couple of regions of memory to be used with my memory allocators. You can use malloc, but you should prefer not to use actual malloc; dynamic memory is fine when it isn't dynamic...
The script also sets markers for the stack and heap locations. They are just markers, not loaded until run time; the stack and heap actually get placed by the startup code, which is in assembly and rather readable and well commented (hard to believe, I know). A neat trick: you have some persistence in otherwise volatile memory across resets, so I set aside a very tiny bit to flip, and you can do things like control which bootloader to run on the next power cycle. Again, you are 100% correct regarding the linker; it seems you are headed in the right direction. Incidentally, there are other ways to modify objects prior to loading them and to preload things in memory, similar to this method; check out objcopy and objdump. You can use gdb to dump S-records of structures in memory, note the address, and then, before linking but after assembly, use dd to insert the records you extracted with gdb back into the extracted sections. That is one of my favourite ways, just because it is the smart-ass route :D Also, if you are ever tight on memory and need to precalculate constants, it's one way to optimize things; that way is actually closer to what ld is doing, just done by hand. The path of least resistance now, though, is probably the linker script.

using library in bare metal program for arm

Can someone help me out please! I do not know if the answer is general, or specific to the board and software versions I am working with. I am out of my previous areas here, and do not even know the right question to ask.
EDITs added at the bottom
What I currently want is to create a program that will run standalone (bare metal; no OS) on an A20-OLinuXino-Micro-4GB board and that needs to use (at least) some standard math library calls. Eventually, I will want to load it into NAND and run it at power-up, but for now I am trying to load it manually (loady) from the U-Boot (github.com/linux-sunxi/u-boot-sunxi/wiki) serial 'console', after booting from an SD card. Standalone is needed because the Linux-distro-level access to the hardware GPIO ports is not very flexible when working with more than one bit (port in a port group) at a time, and quite slow (too slow for the target application), and I did not really want to try modifying or adding a kernel module just to see if that would be fast enough.
Are there some standard gcc / ld flags needed to create a bare metal standalone program, and include some library routines? Beyond -ffreestanding and -static? Is there some special glue code needed? Is there something else I have not even thought of?
I found and looked over Beagleboard bare metal programming (stackoverflow.com/questions/6870712/beagleboard-bare-metal-programming). The answer there is good info, but it is assembler and does not reference any library. Application hangs when calling printf to uart with bare metal raspberry pi might show a cause for the problem: the (currently) bottom answer points to problems with the VFP, and I already ran across problems with soft/hard floating-point options. That shows some assembler code, but I am missing details about how to add a wrapper/glue to combine it with C code. My assembler coding is rusty, but would adding equivalent code at the start of hello_world (at least before the reference to the sin() function) likely get things working? Maybe adding it into the libstubs code.
I am using another A20 board for the main development environment.
$ gcc --version
gcc (Debian 4.6.3-14) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ld.bfd --version
GNU ld (GNU Binutils for Debian) 2.22
Copyright 2011 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.

$ uname -a
Linux a20-OLinuXino 3.4.67+ #6 SMP PREEMPT Fri Nov 1 17:32:40 EET 2013 armv7l GNU/Linux
I have been able to create bootable U-Boot images for the board on SD cards from the repo, either building directly from the linux-sunxi distro that was supplied with the board, or by cross-compiling from a Fedora 21 machine. Same for the standalone hello_world program that came in the examples for U-boot, which can be loaded and run from the U-Boot console.
However, reducing the sample program to a bare minimum and then adding code that needs math.h, -lm and -lc fails (in various iterations) with 'software interrupt' or 'undefined instruction' type errors. The original sample program was being linked with -lgcc, but a little checking showed that nothing was actually being included from that library; an identical binary was created without it. So the question might really be 'what does it take to use any library with a bare-metal program?'
sun7i# go 0x48000000
## Starting application at 0x48000000 ...
Hello math World
undefined instruction
pc : [<48000010>] lr : [<4800000c>]
sp : 7fb66da0 ip : 7fb672c0 fp : 00000000
r10: 00000002 r9 : 7fb66f0c r8 : 7fb67778
r7 : 7ffbbaf8 r6 : 00000001 r5 : 7fb6777c r4 : 48000000
r3 : 00000083 r2 : 7ffbc7fc r1 : 0000000a r0 : 00000011
Flags: nZCv IRQs off FIQs off Mode SVC_32
Resetting CPU ...
To get that far, I had to tweak build options, to specify hardware floating point, since that is how the base libraries were compiled.
Here are the corresponding source and build script files
hello_world.c
#include <common.h>
#include <math.h>
int hello_world (void)
{
double tst;
tst = 0.33333333333;
printf ("Hello math World\n");
tst = sin(0.5);
// printf ("sin test %d : %d\n", (int)tst, (int)(1000 * tst));
return (0);
}
build script
#! /bin/bash
UBOOT="/home/olimex/u-boot-sunxi"
SRC="$UBOOT/examples/standalone"
#INCLS="-nostdinc -isystem /usr/lib/gcc/arm-linux-gnueabihf/4.6/include -I$UBOOT/include -I$UBOOT/arch/arm/include"
INCLS="-I$UBOOT/include -I$UBOOT/arch/arm/include"
#-v
GCCOPTS="\
-D__KERNEL__ -DCONFIG_SYS_TEXT_BASE=0x4a000000\
-Wall -Wstrict-prototypes -Wno-format-security\
-fno-builtin -ffreestanding -Os -fno-stack-protector\
-g -fstack-usage -Wno-format-nonliteral -fno-toplevel-reorder\
-DCONFIG_ARM -D__ARM__ -marm -mno-thumb-interwork\
-mabi=aapcs-linux -mword-relocations -march=armv7-a\
-ffunction-sections -fdata-sections -fno-common -ffixed-r9\
-mhard-float -pipe"
# -msoft-float -pipe
OBJS="hello_world.o libstubs.o"
LDOPTS="--verbose -g -Ttext 0x48000000"
#--verbose
#LIBS="-static -L/usr/lib/gcc/arm-linux-gnueabihf/4.6 -lm -lc"
LIBS="-static -lm -lc"
#-lgcc
gcc -Wp,-MD,stubs.o.d $INCLS $GCCOPTS -D"KBUILD_STR(s)=#s"\
-D"KBUILD_BASENAME=KBUILD_STR(stubs)"\
-D"KBUILD_MODNAME=KBUILD_STR(stubs)"\
-c -o stubs.o $SRC/stubs.c
ld.bfd -r -o libstubs.o stubs.o
gcc -Wp,-MD,hello_world.o.d $INCLS $GCCOPTS -D"KBUILD_STR(s)=#s"\
-D"KBUILD_BASENAME=KBUILD_STR(hello_world)"\
-D"KBUILD_MODNAME=KBUILD_STR(hello_world)"\
-c -o hello_world.o hello_world.c
ld.bfd $LDOPTS -o hello_world -e hello_world $OBJS $LIBS
objcopy -O binary hello_world hello_world.bin
EDITS added:
The application that this is to be part of needs both some fairly high-speed GPIO and some math functions; it should only need sin() and maybe sqrt(). My previous testing got the toggling of a single pin (port in a port group) up to 8MHz. The constraints of the application require the full cycle time to be in the 10µs (100kHz) range, which includes reading all pins from a single port and writing a few pins on other ports, synchronized with the timing limitations of the attached ADC chip (3 ADC reads). I have bare-metal code that is doing (simulating) that process in about 2.1µs. Now I need to add in the math to process the values, the output of which will set some more outputs. Future planned improvements include using SIMD for the math and dedicating the second core to the math, while the first does the GPIO and 'feeds' the calculations.
The needed math code / logic has already been written into a simulation program using very standard (c99) code. I just need to port it into the bare metal program. Need to get 'math' to work first.
As a first thing, I suggest reading this excellent paper on bare-metal programming with ARM and GNU: http://www.state-machine.com/arm/Building_bare-metal_ARM_with_GNU.pdf.
Then, I would make sure you avoid any syscalls into the Linux kernel (which you don't have, and which your compiler may try to emit), e.g. by not returning values from void main(), which should never return anyway.
Finally, I would either use newlib or, if you only need a small subset of what the libraries offer, write a custom implementation.
Keep in mind you are using an Allwinner SoC, which is not the best documented for bare metal, but you can find the TRM here: http://www.soselectronic.com/a_info/resource/c/20_UM-V1.020130322.pdf. I would check whether the libraries (if you decide to use them) or your code need some silicon hardware to be initialized first (some interconnect fabric, clock and power domains, etc.).
If you just need sin() and similar, I strongly suggest simply rolling your own.
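If you go the roll-your-own route, a minimal sketch (hypothetical my_sin(), using crude range reduction and a truncated Taylor series; accurate to a few decimal digits, not libm-grade) could be as simple as:
/* Sketch of a freestanding sin() replacement: range-reduce to [-pi, pi],
   then sum the Taylor series x - x^3/3! + x^5/5! - x^7/7! + x^9/9!. */
#define MY_PI 3.14159265358979323846

static double my_sin(double x)
{
    /* crude range reduction into [-pi, pi] */
    while (x > MY_PI)  x -= 2.0 * MY_PI;
    while (x < -MY_PI) x += 2.0 * MY_PI;

    double x2 = x * x;
    double term = x;
    double sum = x;
    term *= -x2 / (2.0 * 3.0); sum += term;   /* x^3 term */
    term *= -x2 / (4.0 * 5.0); sum += term;   /* x^5 term */
    term *= -x2 / (6.0 * 7.0); sum += term;   /* x^7 term */
    term *= -x2 / (8.0 * 9.0); sum += term;   /* x^9 term */
    return sum;
}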

Script/Tool predicate for ARM ELF compiled for Thumb OR Arm

I have rootfs and klibc file systems. I am creating make rules, and some developers have an older compiler without interworking.note1 I am trying to verify that all the files get built as ARM-only when a certain version of the compiler is detected. I have re-built the trees several times. I was using readelf -A and looking for Tag_THUMB_ISA_use: Thumb-1, but this appears in ARM-only code (that was nevertheless built with the interworking compiler) as well as in Thumb code. I can manually run objdump -S and examine the assembler to determine which instruction set is in use.
However, it would be much easier if I had a script/tool predicate so that find, etc can be used to search through the shadow file systems to look for binaries that may have been missed. I thought that some of this information would be in the ELF header and accessible via objdump or readelf, but I haven't found anything reliable.
Specifically I am looking for,
Compiled C code that wouldn't run on a Linux system without CONFIG_ARM_THUMB.
make rules that use C compiler flags that choke non-Thumb compilers.
note1: Interworking allow easy switching between thumb and arm modes, and the compiler will automatically generate code to support calling from either mode.
The readelf -A output doesn't describe the ELF contents; it just describes the capabilities of the processor and/or system that is expected, or that was fed to the compiler. As I have an ARM926 CPU, which is an ARMv5TEJ processor, gcc/ld will always set Tag_THUMB_ISA_use: Thumb-1, as it just means that ARMv5TEJ is recognized as being Thumb-1 capable. It says nothing about the code itself.
Examining the Linux arch/arm/kernel/elf.c routine elf_check_arch() shows a check for x->e_entry & 1. This leads to the following script,
readelf -h "$1" | grep -q 'Entry.*[13579bdf]$'
I.e., just look at the initial ELF entry value and see whether the low bit is set. This is a fast check that fits the spirit of what I am looking for. unixsmurf has a good point that the code inside any ELF can mix and match ARM and Thumb. This may be OK if the program dynamically identifies the CPU and selects an appropriate routine; i.e., just the presence of a Thumb instruction doesn't mean that code will execute.
Just looking at the entry value does determine which gcc compiler flags were used, at least for gcc versions 4.6 to 4.7.
Since Thumb and ARM sequences can be freely interchanged within an object file, even within the same section, plain ELF header inspection is not going to tell you whether a file includes Thumb instructions or not.
A slightly roundabout and still not 100% foolproof way would be to use readelf -r and check if the output contains "R_ARM_THM", indicating a relocation for thumb.
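Putting the two checks together (the entry-point heuristic above and the relocation check), a sketch of a predicate script usable from find might look like this; the file name and exact heuristics are illustrative, not from the answers above.
#! /bin/sh
# is-thumb.sh ELF-FILE
# Exit 0 if the file appears to contain or enter Thumb code, 1 otherwise.
f="$1"

# Thumb entry point: low bit of e_entry set
if readelf -h "$f" 2>/dev/null | grep -q 'Entry.*[13579bdf]$'; then
    exit 0
fi

# Thumb relocations present (not 100% foolproof, as noted above)
if readelf -r "$f" 2>/dev/null | grep -q 'R_ARM_THM'; then
    exit 0
fi
exit 1
It can then be used as a find predicate, e.g. find rootfs -type f -exec sh is-thumb.sh {} \; -print to list candidate files.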
