I am playing around and trying to understand the low-level operation of computers and programs. To that end, I am experimenting with linking Assembly and C.
I have 2 program files:
Some C code here in "callee.c":
#include <unistd.h>
void my_c_func() {
write(1, "Hello, World!\n", 14);
return;
}
I also have some GAS x86_64 Assembly here in "caller.asm":
.text
.globl my_entry_pt
my_entry_pt:
# call my c function
call my_c_func # this function has no parameters and no return data
# make the 'exit' system call
mov $60, %rax # set the syscall to the index of 'exit' (60)
mov $0, %rdi # set the single parameter, the exit code to 0 for normal exit
syscall
I can build and execute the program like this:
$ as ./caller.asm -o ./caller.obj
$ gcc -c ./callee.c -o ./callee.obj
$ ld -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out -dynamic-linker /lib64/ld-linux-x86-64.so.2
$ ldd ./prog.out
linux-vdso.so.1 (0x00007fffdb8fe000)
libc.so.6 => /lib64/libc.so.6 (0x00007f46c7756000)
/lib64/ld-linux-x86-64.so.2 (0x00007f46c7942000)
$ ./prog.out
Hello, World!
Along the way, I had some problems. If I don't set the -dynamic-linker option, it defaults to this:
$ ld -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out
$ ldd ./prog.out
linux-vdso.so.1 (0x00007ffc771c5000)
libc.so.6 => /lib64/libc.so.6 (0x00007f8f2abe2000)
/lib/ld64.so.1 => /lib64/ld-linux-x86-64.so.2 (0x00007f8f2adce000)
$ ./prog.out
bash: ./prog.out: No such file or directory
Why is this? Is there a problem with the linker defaults on my system? How can/should I fix it?
Also, static linking doesn't work.
$ ld -static -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out
ld: ./callee.obj: in function `my_c_func':
callee.c:(.text+0x16): undefined reference to `write'
Why is this? Shouldn't write() just be a c library wrapper for the syscall 'write'? How can I fix it?
Where can I find the documentation on the C function calling convention so I can read up on how parameters are passed back and forth, etc...?
Lastly, while this seems to work for this simple example, am I doing something wrong in my initialization of the C stack? I mean, right now, I'm doing nothing. Should I be allocing memory from the kernel for the stack, setting bounds, and setting %rsp and %rbp before I start trying to call functions. Or is the kernel loader taking care of all this for me? If so, will all architectures under a Linux kernel take care of it for me?
While the Linux kernel provides a syscall named write, it does not mean that you automatically get a wrapper function of the same name you can call from C as write(). In fact, you need inline assembly to call any syscalls from C, if you're not using libc, because libc defines those wrapper functions.
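To make the wrapper-versus-syscall distinction concrete, here is a minimal sketch that assumes libc is available. It calls the kernel's write by number through libc's generic syscall() function rather than through the dedicated write() wrapper. The name raw_write is hypothetical and only used for illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: invoke the write syscall by number via libc's
   generic syscall() function. Like write(), it returns -1 and sets
   errno on failure, because syscall() is itself a libc wrapper. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "Hello\n", 6) should behave like write(1, "Hello\n", 6). Without libc, neither function exists, and you are back to inline assembly.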
Instead of explicitly linking your binaries with ld, let gcc do it for you. It can even assemble assembly files (internally executing a suitable version of as), if the source ends with a .s suffix. It looks like your linking problems are simply a disagreement between what GCC assumes and how you do it via LD yourself.
No, it's not a bug; the ld default path for ld.so isn't the one used on modern x86-64 GNU/Linux systems. (/lib/ld64.so.1 might have been used on early x86-64 GNU/Linux ports before the dust settled on where multi-arch systems would put everything to support both i386 and x86-64 versions of libraries installed at the same time. Modern systems use /lib64/ld-linux-x86-64.so.2)
Linux uses the System V ABI. The AMD64 Architecture Processor Supplement (PDF) describes the initial execution environment (when _start gets invoked), and the calling convention. Essentially, you have an initialized stack, with environment and command-line arguments stored in it.
Let's construct a fully working example, containing both C and assembly (AT&T syntax) sources, and producing both a static and a dynamic binary.
First, we need a Makefile to save typing long commands:
# SPDX-License-Identifier: CC0-1.0
CC := gcc
CFLAGS := -Wall -Wextra -O2 -march=x86-64 -mtune=generic -m64 \
          -ffreestanding -nostdlib -nostartfiles
LDFLAGS :=

all: static-prog dynamic-prog

clean:
	rm -f static-prog dynamic-prog *.o

%.o: %.c
	$(CC) $(CFLAGS) $^ -c -o $@

%.o: %.s
	$(CC) $(CFLAGS) $^ -c -o $@

dynamic-prog: main.o asm.o
	$(CC) $(CFLAGS) $^ $(LDFLAGS) -o $@

static-prog: main.o asm.o
	$(CC) -static $(CFLAGS) $^ $(LDFLAGS) -o $@
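Makefiles are particular about their indentation: recipe lines must start with a tab, but SO converts tabs to spaces. So, after pasting the above, run sed -e 's|^  *|\t|' -i Makefile to turn the leading spaces back into tabs. (Note the two spaces before the asterisk, so that only lines starting with at least one space are changed.)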
The SPDX License Identifier in the above Makefile and all following files tell you that these files are licensed under Creative Commons Zero license: that is, these are all dedicated to public domain.
Compilation flags used:
-Wall -Wextra: Enable all warnings. It is a good practice.
-O2: Optimize the code. This is a commonly used optimization level, usually considered sufficient and not too extreme.
-march=x86-64 -mtune=generic -m64: Compile to 64-bit x86-64 AKA AMD64 architecture. These are the defaults; you can use -march=native to optimize for your own system.
-ffreestanding: Compilation targets the freestanding C environment. Tells the compiler it can't assume that strlen or memcpy or other library functions are available, so don't optimize a loop, struct copy, or array initialization into calls to strlen, memcpy, or memset, for example. If you do provide asm implementations of any functions gcc might want to invent calls to, you can leave this out. (Especially if you're writing a program that will run under an OS)
-nostdlib -nostartfiles: Do not link in the standard C library or its startup files. (Actually, -nostdlib already "includes" -nostartfiles, so -nostdlib alone would suffice.)
Next, let's create a header file, nolib.h, that implements nolib_exit() and nolib_write() wrappers around the exit_group and write syscalls:
// SPDX-License-Identifier: CC0-1.0
/* Require Linux on x86-64 */
#if !defined(__linux__) || !defined(__x86_64__)
#error "This only works on Linux on x86-64."
#endif
/* Known syscall numbers, without depending on glibc or kernel headers */
#define SYS_write 1
#define SYS_exit_group 231
// Normally you'd use
// #include <asm/unistd.h> for __NR_write and __NR_exit_group
// or even #include <sys/syscall.h> for SYS_write
/* Inline assembly macro for a single-parameter no-return syscall */
#define SYSCALL1_NORET(nr, arg1) \
__asm__ volatile ( "syscall\n\t" : : "a" (nr), "D" (arg1) : "rcx", "r11", "memory")
/* Inline assembly macro for a three-parameter syscall */
#define SYSCALL3(retval, nr, arg1, arg2, arg3) \
__asm__ volatile ( "syscall\n\t" : "=a" (retval) : "a" (nr), "D" (arg1), "S" (arg2), "d" (arg3) : "rcx", "r11", "memory" )
/* exit() function */
static inline void nolib_exit(int retval)
{
SYSCALL1_NORET(SYS_exit_group, retval);
}
/* Some errno values */
#define EINTR 4 /* Interrupted system call */
#define EBADF 9 /* Bad file descriptor */
#define EINVAL 22 /* Invalid argument */
// or #include <asm/errno.h> to define these
/* write() syscall wrapper - returns negative errno if an error occurs */
static inline long nolib_write(int fd, const void *data, long len)
{
long retval;
if (fd == -1)
return -EBADF;
if (!data || len < 0)
return -EINVAL;
SYSCALL3(retval, SYS_write, fd, data, len);
return retval;
}
The reason nolib_exit() uses the exit_group syscall instead of the exit syscall is that exit_group ends the entire process, whereas exit ends only the calling thread. If you run a program under strace, you'll see that it, too, calls the exit_group syscall at the very end. (See: Syscall implementation of exit())
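If you want to watch exit_group at work without writing a freestanding program, something like the following hosted sketch might help. It assumes libc's syscall(), fork() and waitpid() are available. The helper name child_exit_status is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_exit_group */
#include <sys/types.h>    /* pid_t */
#include <sys/wait.h>     /* waitpid(), WIFEXITED, WEXITSTATUS */
#include <unistd.h>       /* fork(), syscall(), _exit() */

/* Hypothetical helper: fork a child that terminates itself with the raw
   exit_group syscall, and return the exit status the parent observes
   (or -1 if something went wrong). */
int child_exit_status(int code)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        syscall(SYS_exit_group, code);  /* ends the whole child process */
        _exit(127);                     /* only reached if the syscall failed */
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent sees the code passed to exit_group as an ordinary exit status, exactly as if the child had called exit().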
Next, we need some C code. main.c:
// SPDX-License-Identifier: CC0-1.0
#include "nolib.h"
const char *c_function(void)
{
return "C function";
}
static inline long nolib_put(const char *msg)
{
if (!msg) {
return nolib_write(1, "(null)", 6);
} else {
const char *end = msg;
while (*end)
end++; // strlen
if (end > msg)
return nolib_write(1, msg, (unsigned long)(end - msg));
else
return 0;
}
}
extern const char *asm_function(int);
void _start(void)
{
nolib_put("asm_function(0) returns '");
nolib_put(asm_function(0));
nolib_put("', and asm_function(1) returns '");
nolib_put(asm_function(1));
nolib_put("'.\n");
nolib_exit(0);
}
nolib_put() is just a wrapper around nolib_write(), that finds the end of the string to be written, and calculates the number of characters to be written based on that. If the parameter is a NULL pointer, it prints (null).
Because this is a freestanding environment, and the default name for the entry point is _start, this defines _start as a C function that never returns. (It must not ever return, because the ABI does not provide any return address; it would just crash the process. Instead, an exit-type syscall must be called at end.)
The C source declares and calls a function asm_function, that takes an integer parameter, and returns a pointer to a string. Obviously, we'll implement this in assembly.
The C source also declares a function c_function, that we can call from assembly.
Here's the assembly part, asm.s:
# SPDX-License-Identifier: CC0-1.0
.text
.section .rodata
.one:
.string "One" # includes zero terminator
.text
.p2align 4,,15
.globl asm_function #### visible to the linker
.type asm_function, @function
asm_function:
cmpl $1, %edi
jne .else
leaq .one(%rip), %rax
ret
.else:
subq $8, %rsp # 16B stack alignment for a call to C
call c_function
addq $8, %rsp
ret
.size asm_function, .-asm_function
We don't need to declare c_function as an extern because GNU as treats all unknown symbols as external symbols anyway. We could add Call Frame Information directives, at least .cfi_startproc and .cfi_endproc, but I left them out so it wouldn't be so obvious I just wrote the original code in C and let GCC compile it to assembly, and then prettified it just a bit. (Did I write that out aloud? Oops! But seriously, compiler output is often a good starting point for a hand-written asm implementation of something, unless it does a very bad job of optimizing.)
The subq $8, %rsp adjusts the stack so that it will be a multiple of 16 for the c_function. (On x86-64, stacks grow down, so to reserve 8 bytes of stack, you subtract 8 from the stack pointer.) After the call returns, addq $8, %rsp reverts the stack back to original.
With these four files, we're ready. To build the example binaries, run e.g.
reset ; make clean all
Running either ./static-prog or ./dynamic-prog will output
asm_function(0) returns 'C function', and asm_function(1) returns 'One'.
The two binaries are just 2 kB (static) and 6 kB (dynamic) in size or so, although you can make them even smaller by stripping unneeded stuff,
strip --strip-unneeded static-prog dynamic-prog
which removes about 0.5 kB to 1 kB of unneeded stuff from them – the exact amount varies depending on the version of GCC and Binutils you use.
On some other architectures, we'd need to also link against libgcc (via -lgcc), because some C features rely on internal GCC functions. 64-bit integer division (named __udivdi3 or similar) on various 32-bit architectures is a typical example.
As mentioned in the comments, the first version of the above examples had a few issues that need to be addressed. They do not stop the example from executing or working as intended, and were overlooked because the examples were written from scratch for this answer (in the hopes that others finding this question later on via web searches might find this useful), and I'm not perfect. :)
memory clobber argument to the inline assembly, in the syscall preprocessor macros
Adding "memory" in the clobbered list tells the compiler that the inline assembly may access (read and/or write) memory other than those specified in the parameter lists. It is obviously needed for the write syscall, but it is actually important for all syscalls, because the kernel can deliver e.g. signals in the same thread before returning from the syscall, and signal delivery can/will access memory.
As the GCC documentation mentions, this clobber also behaves like a read/write memory barrier for the compiler (but NOT for the processor!). In other words, with the memory clobber, the compiler knows that it must write any changes in variables etc. in memory before the inline assembly, and that unrelated variables and other memory content (not explicitly listed in the inline assembly inputs, outputs, or clobbers) may also change, and will generate the code we actually want, without making incorrect assumptions.
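Here is a small sketch of why the clobber matters, assuming Linux on x86-64 and GCC-style inline assembly. The function names raw_write3 and write_greeting are made up for this example. The buffer is filled with plain C stores right before the syscall, and the "memory" clobber is what obliges the compiler to actually perform those stores before the syscall instruction executes.

```c
#if !defined(__linux__) || !defined(__x86_64__)
#error "This sketch assumes Linux on x86-64."
#endif

/* Raw write syscall (number 1 on x86-64 Linux), no libc involved. */
static long raw_write3(int fd, const void *buf, long len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (1L), "D" ((long)fd), "S" (buf), "d" (len)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Fill a local buffer, then hand it to the kernel. The inline assembly
   only receives the pointer, so without the "memory" clobber the
   compiler would be allowed to delay or drop the stores into buf. */
long write_greeting(int fd)
{
    char buf[3];

    buf[0] = 'h';
    buf[1] = 'i';
    buf[2] = '\n';
    return raw_write3(fd, buf, 3);
}
```

Whether a missing clobber actually breaks anything depends on the optimization level and compiler version, which is exactly why it is easy to overlook.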
-fPIC -pie: Omitted for simplicity
Position independent code is usually only relevant for shared libraries. In real projects' Makefiles, you will need to use a different set of compilation flags for objects that will be compiled as a dynamic library, static library, dynamically linked executable, or a static executable, as the desired properties (and therefore compiler/linker flags) vary.
In an example such as this one, it is better to try and avoid such extraneous things, as it is a reasonable question to ask on its own ("Which compiler options to use to achieve X, when needing Y ?"), and the answers depend on the required features and context.
In most modern distros, PIE is the default and you might want -fno-pie -no-pie to simplify debugging / disassembling. 32-bit absolute addresses no longer allowed in x86-64 Linux?
-nostdlib does imply (or "include") -nostartfiles
There are quite a few overall options and link options we can use to control how the code is compiled and linked.
Many of the options GCC supports are grouped. For example, -O2 is actually shorthand for a collection of optimization features that you can explicitly specify.
Here, the reason for keeping both is to remind human programmers of the expectations for the code: no standard library, and no start files/objects.
-march=x86-64 -mtune=generic -m64 is the default on x86-64
Again, this is kept more as a reminder of what the code expects. Without a specific architecture definition, one might get the wrong impression that the code should be compilable in general, because C typically is not architecture specific!
The nolib.h header file does contain preprocessor checks (using pre-defined compiler macros to detect the operating system and hardware architecture), halting the compilation with an error for other OSes and hardware architectures.
Most Linux distributions provide the syscall numbers in <asm/unistd.h>, as __NR_name.
These are derived from the actual kernel sources. However, for any given architecture, these are the stable userspace ABI, and will not change. New ones may be added. Only in some extraordinary circumstances (unfixable security holes, perhaps?) can a syscall be deprecated and stop functioning.
It is always better to use the syscall numbers from the kernel, preferably via the aforementioned header, but it's possible to build this program with only GCC, no glibc or Linux kernel headers installed. For someone writing their own standard C library, they should include the file (from Linux kernel sources).
I do know that Debian derivatives (Ubuntu, Mint, et cetera) all do provide the <asm/unistd.h> file, but there are many, many other Linux distributions, and I just am not sure about all of them. I opted to only define the two (exit_group and write), to minimize the risk of problems.
(Editor's note: the file might be in a different place in the filesystem, but the <asm/unistd.h> include path should always work if the right header package is installed. It's part of the kernel's user-space C/asm API.)
Compilation flag -g adds debug symbols, which helps greatly when debugging – for example, when running and examining the binary in gdb.
I omitted this and all related flags, because I did not want to expand the topic any further, and because this example is easily debugged at the asm level and examined even without. See GDB asm tips like layout reg at the bottom of the x86 tag wiki
The System V ABI requires that the stack is aligned to 16 bytes before a call to a function. Because call then pushes an 8-byte return address, RSP+8 (and equally RSP-8) is 16-byte aligned at the top of the called function, and if there are any stack args, they'll be aligned.
The call instruction pushes the current instruction pointer to the stack, and because this is a 64-bit architecture, that too is 64 bits = 8 bytes. So, to conform to the ABI, we really need to adjust the stack pointer by 8 before calling the function, to ensure it too gets a properly aligned stack pointer. These were initially omitted, but are now included in the assembly (asm.s file).
This matters, because on x86-64, SSE/AVX SIMD vectors have different instructions for aligned-to-16-bytes and unaligned accesses, with the aligned accesses being significantly faster on certain processors. (Why does System V / AMD64 ABI mandate a 16 byte stack alignment?). Using aligned SIMD instructions like movaps with unaligned addresses will cause the process to crash. (e.g. glibc scanf Segmentation faults when called from a function that doesn't align RSP is a real-life example of what happens when you get this wrong.)
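You can observe the alignment rule from C with a sketch like the one below. It assumes GCC or Clang on x86-64, where __builtin_frame_address(0) yields the function's frame pointer (the value of %rbp after the usual push %rbp; mov %rsp, %rbp prologue). The function name frame_is_16_byte_aligned is hypothetical.

```c
#include <stdint.h>

/* Hypothetical check: on entry RSP is 8 bytes off a 16-byte boundary
   (the return address was just pushed). Pushing the old %rbp brings it
   back to a multiple of 16, and that is the address the frame pointer
   holds. Returns 1 if the frame address is 16-byte aligned, 0 if not. */
__attribute__((noinline))
int frame_is_16_byte_aligned(void)
{
    uintptr_t frame = (uintptr_t)__builtin_frame_address(0);

    return (frame % 16) == 0;
}
```

Called from compiler-generated code this should return 1. Called from hand-written assembly that skipped the subq $8, %rsp adjustment, it would be expected to return 0.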
However, when we do such stack manipulations, we really should add CFI (Call Frame Information) directives to ensure debugging and stack unwinding etc. works correctly. In this case, for general CFI, we prepend .cfi_startproc before the first instruction in an assembly function, and .cfi_endproc after the last instruction in an assembly function. For the Canonical Frame Address, CFA, we add .cfi_def_cfa_offset N after any instruction that modifies the stack pointer. Essentially, N is 8 at the beginning of the function, and increases as much as %rsp is decremented, and vice versa. See this article for more.
Internally, these directives produce information (metadata) stored in the .eh_frame and .eh_frame_hdr sections in the ELF object files and binaries, depending on other compilation flags.
So, in this case, the subq $8, %rsp should be followed by .cfi_def_cfa_offset 16, and the addq $8, %rsp by .cfi_def_cfa_offset 8, plus .cfi_startproc at the beginning of asm_function and .cfi_endproc after the final ret.
Note that you can often see rep ret instead of just ret in assembly sources. This is nothing but a workaround to certain processors having branch-prediction performance issues when jumping to or falling through a JCC to a ret instruction. The rep prefix does nothing, except it does fix the issues those processors might otherwise have with such a jump. Recent GCC versions stopped doing this by default as the affected AMD CPUs are very old and not as relevant these days. See: What does `rep ret` mean?
The "key" option, -ffreestanding, is one that chooses a C "dialect"
The C programming language is actually separated into two different environments: hosted, and freestanding.
The hosted environment is one where the standard C library is available, and is used when you write programs, applications, or daemons in C.
The freestanding environment is one where the standard C library is not available. It is used when you write kernels, firmware for microcontrollers or embedded systems, implement (parts of) your own standard C library, or a "standard library" for some other C-derived language.
As an example, the Arduino programming environment is based on a subset of freestanding C++. The standard C++ library is not available, and many features of C++ like exceptions are not supported. In fact, it is very close to freestanding C with classes. The environment also uses a special pre-preprocessor, which for example automatically prepends declarations of functions without the user having to write them.
Probably the most well known example of freestanding C is the Linux kernel. Not only is the standard C library not available, but the kernel code must actually avoid floating-point operations as well, because of certain hardware considerations.
For a better understanding of what exactly the freestanding C environment looks like to a programmer, I think the best thing is to go look at the language standard itself. As of now (June 2020), the most recent standard is ISO C18. While the standard itself is not free, the final draft is; for C18, it is draft N2176 (PDF).
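The standard also gives you a way to ask the compiler which of the two environments it is targeting: the predefined macro __STDC_HOSTED__ is 1 for a hosted implementation and 0 for a freestanding one. A minimal sketch (the function name is_hosted is made up here):

```c
/* Hypothetical helper: report which C environment this translation
   unit was compiled for. With GCC, -ffreestanding makes
   __STDC_HOSTED__ expand to 0; the default is 1. */
int is_hosted(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__ == 1
    return 1;
#else
    return 0;
#endif
}
```

Compiling the same file with and without -ffreestanding and comparing the result is a quick way to confirm the flag is actually being applied.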
The ld default path for ld.so (the ELF interpreter) isn't the one used on modern x86-64 GNU/Linux systems.
/lib/ld64.so.1 might have been used on early x86-64 GNU/Linux ports before the dust settled on where multi-arch systems would put everything to support both i386 and x86-64 versions of libraries installed at the same time. Modern systems use /lib64/ld-linux-x86-64.so.2.
There was never a good time to update the default in GNU binutils ld; when some systems were using the default, changing it would have broken them. Multi-arch systems had to configure their GCC to pass -dynamic-linker /some/path to ld, so they simply did that instead of asking and waiting for the ld default to change. So nobody ever needed the ld default to change to make anything work, except for people playing around with assembly and using ld by hand to create dynamically-linked executables.
Instead of doing that, you can link using gcc -nostartfiles to omit CRT start code which defines a _start, but still link with the normal libraries including -lc, -lgcc internal helper functions if needed, etc.
See also Assembling 32-bit binaries on a 64-bit system (GNU toolchain) for more info on assembling with/without libc for asm that defines _start, or with libc + CRT for asm that defines main. (Leave out the -m32 from that answer for 64-bit; when using gcc to invoke as and ld for you, that's the only difference.)
ld -static -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out
doesn't link because you put -lc before the object files that reference symbols in libc.
Order matters in linker command lines, for static libraries.
However, ld -static -e my_entry_pt ./callee.o ./caller.o -lc -o ./prog.out will link, but makes a program that segfaults when it calls glibc functions like write without having called glibc's init functions.
Dynamic linking takes care of that for you (glibc has .init functions that get called by the dynamic linker, the same mechanism that allows C++ static initializers to run in a C++ shared library). CRT startup code also calls those functions in the right order, but you left that out, too, and wrote your own entry point.
@Example's answer avoids that problem by defining its own write wrapper instead of linking with -lc, so it can be truly freestanding.
I thought glibc's write wrapper function would be simple enough not to crash, but that's not the case. It checks if the program is multi-threaded or something by loading from %fs:0x18. The kernel doesn't init FS base for thread-local storage; that's something user-space (glibc's internal init functions) would have to do.
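To see that the FS base really is something user space sets up, you can query it from a normally started program. This is a sketch assuming Linux on x86-64 with kernel headers installed. It uses the arch_prctl syscall with ARCH_GET_FS through libc's syscall() function. The helper name current_fs_base is hypothetical.

```c
#define _GNU_SOURCE
#include <asm/prctl.h>    /* ARCH_GET_FS */
#include <sys/syscall.h>  /* SYS_arch_prctl */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: return the current FS segment base address,
   or 0 if the query fails. */
unsigned long current_fs_base(void)
{
    unsigned long base = 0;

    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &base) != 0)
        return 0;
    return base;
}
```

In a program started through the normal libc startup path this should be nonzero, because libc has pointed FS at its thread control block. In a static binary that jumps straight to a custom entry point, nothing has set it, which is why a load from %fs:0x18 faults.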
glibc's write() faults on mov %fs:0x18,%eax if you haven't called glibc's init functions. (In a statically-linked executable where glibc couldn't get the dynamic linker to run them for you.)
Dump of assembler code for function write:
=> 0x0000000000401040 <+0>: endbr64 # for CET, or NOP on CPUs without CET
0x0000000000401044 <+4>: mov %fs:0x18,%eax ### this faults with no TLS setup
0x000000000040104c <+12>: test %eax,%eax
0x000000000040104e <+14>: jne 0x401060 <write+32>
0x0000000000401050 <+16>: mov $0x1,%eax # simple case: EAX = __NR_write
0x0000000000401055 <+21>: syscall
0x0000000000401057 <+23>: cmp $0xfffffffffffff000,%rax
0x000000000040105d <+29>: ja 0x4010b0 <write+112> # update errno on error
0x000000000040105f <+31>: retq # else return
0x0000000000401060 <+32>: sub $0x28,%rsp # the non-simple case:
0x0000000000401064 <+36>: mov %rdx,0x18(%rsp) # write is an async cancellation point or something
0x0000000000401069 <+41>: mov %rsi,0x10(%rsp)
0x000000000040106e <+46>: mov %edi,0x8(%rsp)
0x0000000000401072 <+50>: callq 0x4010e0 <__libc_enable_asynccancel>
0x0000000000401077 <+55>: mov 0x18(%rsp),%rdx
0x000000000040107c <+60>: mov 0x10(%rsp),%rsi
0x0000000000401081 <+65>: mov %eax,%r8d
0x0000000000401084 <+68>: mov 0x8(%rsp),%edi
0x0000000000401088 <+72>: mov $0x1,%eax
0x000000000040108d <+77>: syscall
0x000000000040108f <+79>: cmp $0xfffffffffffff000,%rax
0x0000000000401095 <+85>: ja 0x4010c4 <write+132>
0x0000000000401097 <+87>: mov %r8d,%edi
0x000000000040109a <+90>: mov %rax,0x8(%rsp)
0x000000000040109f <+95>: callq 0x401140 <__libc_disable_asynccancel>
0x00000000004010a4 <+100>: mov 0x8(%rsp),%rax
0x00000000004010a9 <+105>: add $0x28,%rsp
0x00000000004010ad <+109>: retq
0x00000000004010ae <+110>: xchg %ax,%ax
0x00000000004010b0 <+112>: mov $0xfffffffffffffffc,%rdx # errno update for the simple case
0x00000000004010b7 <+119>: neg %eax
0x00000000004010b9 <+121>: mov %eax,%fs:(%rdx) # thread-local errno?
0x00000000004010bc <+124>: mov $0xffffffffffffffff,%rax
0x00000000004010c3 <+131>: retq
0x00000000004010c4 <+132>: mov $0xfffffffffffffffc,%rdx # same for the async case
0x00000000004010cb <+139>: neg %eax
0x00000000004010cd <+141>: mov %eax,%fs:(%rdx)
0x00000000004010d0 <+144>: mov $0xffffffffffffffff,%rax
0x00000000004010d7 <+151>: jmp 0x401097 <write+87>
I don't fully understand what exactly write is checking for or doing. It may have something to do with async I/O, and/or POSIX thread cancellation points.
Hey I have to call a function of glibc in assembly for an exercise. So I found this code to call printf.
section .rodata
format: db 'Hello %s', 10
name: db 'Conrad'
section .text
global main
extern printf
main:
; printf(format, name)
mov rdi, format
mov rsi, name
call printf
; return 0
mov rax, 0
ret
But I get the error:
Symbol `printf' causes overflow in R_X86_64_PC32 relocation
Compiled it with:
nasm -f elf64 -o test.o test.asm
gcc -o test test.o
The error occurs after doing
./test
Change call printf to call printf@PLT (in NASM syntax, that is written call printf wrt ..plt). The former only works if the actual definition of printf is within ±2GB of the call instruction, which can't be known if the definition is in a shared library (it would work if you static link, though). "Overflow" is telling you that the relative address, which would need to be up to 64-bit, overflows in a 32-bit call instruction offset.
By using printf@PLT, you'll instead get a relative address that resolves statically at link time to a thunk in the PLT, which loads and jumps to the address of the function definition, resolved at dynamic-linking time.
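The ±2GB limit is plain arithmetic, and a small sketch may make it clearer. It assumes the usual 5-byte call rel32 encoding, where the 32-bit displacement is relative to the address of the next instruction. The function name fits_rel32 and the addresses in the usage note are purely illustrative.

```c
#include <stdint.h>

/* Hypothetical helper: can a 5-byte `call rel32` located at call_addr
   reach target? The displacement is measured from the end of the call
   instruction and must fit in a signed 32-bit integer. */
int fits_rel32(uint64_t call_addr, uint64_t target)
{
    int64_t disp = (int64_t)(target - (call_addr + 5));

    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

For example, a call at 0x401000 can reach 0x401040 in the same executable, but not an address such as 0x7f46c7756000 where a shared libc typically gets mapped. A PLT entry lives inside the executable itself, so the call to it is always in range.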
As Maxime B. noted, the loads of the addresses of format and name are also not correct for position-independent code. They should be loaded with "rip-relative" form, but it looks like you're using the weird "Intel syntax" for asm and I'm not sure how to write it in that syntax. You could, as Maxime B. suggested, build with -fno-pie, but better would be figuring out the way to fix your code so it doesn't depend on being linked for a particular fixed address. (Editor's note: in NASM, the RIP-relative form is lea rdi, [rel format].)
You should compile your code with -no-pie
This error is explained here. Quoting the original post:
Debian switched to PIC/PIE binaries in 64-bits mode & GCC in your case is trying to link your object as PIC, but it will encounter absolute address in mov $str, %rdi.