Is there a way to use gcc to convert C to MIPS?

I completed a C to MIPS conversion for a class, and I want to check it against the assembly. I have heard that there is a way of configuring gcc so that it can convert C code to the MIPS architecture rather than the x86 architecture (my computer uses an Intel i5 processor) and prints the output.
Running the terminal in Ubuntu (which comes with gcc), what command do I use to configure gcc to convert to MIPS? Is there anything I need to install as well?
EDIT:
Let me clarify. Please read this.
I'm not looking for which compiler to use, or people saying "well you could cross-compile, but instead you should use this other thing that has no instructions on how to set up."
If you're going to post that, at least refer me to instructions. GCC came with Ubuntu. I don't have experience on how to install compilers and it's not easy finding online tutorials for anything other than GCC. Then there's the case of cross-compiling I need to know about as well. Thank you.

GCC can produce assembly code for a large number of architectures, including MIPS. But what architecture a given GCC instance targets is decided when GCC itself is compiled. The precompiled binary you will find in an Ubuntu system knows about x86 (possibly both 32-bit and 64-bit modes) but not MIPS.
Compiling GCC with a target architecture distinct from the architecture on which GCC itself will be running is known as preparing a cross-compilation toolchain. This is doable but requires quite a bit of documentation-reading and patience; you usually need to first build a cross-assembler and cross-linker (GNU binutils), then build the cross-GCC itself.
I recommend using buildroot. This is a set of scripts and makefiles designed to help with the production of a complete cross-compilation toolchain and utilities. At the end of the day, you will get a complete OS and development tools for a target system. This includes the cross-compiler you are after.
Another quite different solution is to use QEMU. This is an emulator for various processors and systems, including MIPS systems. You can use it to run a virtual machine with a MIPS processor, and, within that machine, install an operating system for MIPS, e.g. Debian, a Linux distribution. This way, you get a native GCC (a GCC running on a MIPS system and producing code for MIPS).
The QEMU way might be a tad simpler; using cross-compilation requires some understanding of some hairy details. Either way, you will need about 1 GB of free disk space.

It's not a configuration thing, you need a version of GCC that cross-compiles to MIPS. This requires a special GCC build and is quite hairy to set up (building GCC is not for the faint of heart).
I'd recommend using LCC for this. It's way easier to do cross-compilation with LCC than it is with GCC, and building LCC is a matter of seconds on current machines.

For a one-time use for a small program or couple functions, you don't need to install anything locally.
Use Matt Godbolt's compiler explorer site, https://godbolt.org/, which has GCC and clang for various ISAs including MIPS and x86-64, and some other compilers.
Note that the compiler explorer by default filters directives so you can just see the instructions, leaving out stuff like alignment, sections, .globl, and so on. (For a function with no global / static data, this is actually fine, especially when you just want to use a compiler to make an example for you. The default section is .text anyway, if you don't use any directives.)
Most people that want MIPS asm for homework are using SPIM or MARS, usually without branch-delay slots. (Unlike real MIPS, so you need to tweak the compiler to not take advantage of the next instruction after a branch running unconditionally, even when it's taken.) For GCC, the option is -fno-delayed-branch - that will fill every delay slot with a NOP, so the code will still run on a real MIPS. You can just manually remove all the NOPs.
There may be other tweaks needed; for example, MARS may require you to use jr $31 instead of j $31 (see Tweak mips-gcc output to work with MARS). And of course I/O code will have to be implemented using MARS's toy system calls, not jal calls to standard library functions like printf or std::ostream::operator<<. You can usefully compile (and hand-tweak) asm for manipulating data, like multiplying integers or summing or reversing an array, though.
Unfortunately GCC doesn't have an option to use register names like $a0 instead of $4. For PowerPC there's -mregnames to use r1 instead of 1, but no similar option for MIPS to use "more symbolic" reg names.
int maybe_square(int num) {
if (num>0)
return num;
return num * num;
}
On Godbolt with GCC 5.4 -xc -O3 -march=mips32r2 -Wall -fverbose-asm -fno-delayed-branch
-xc compiles as C, not C++, because I find that more convenient than flipping between the C and C++ languages in the dropdown and having the site erase my source code.
-fverbose-asm comments the asm with C variable names for the destination and sources. (In optimized code that's often an invented temporary, but not always.)
-O3 enables full optimization, because the default -O0 debug mode is a horrible mess for humans to read. Always use at least -Og if you want to look at the code by hand and see how it implements the source (see How to remove "noise" from GCC/clang assembly output?). You might also use -fno-unroll-loops, and -fno-tree-vectorize if compiling for an ISA with SIMD instructions.
This uses mul instead of the classic MIPS mult + mflo, thanks to the -march= option to tell GCC we're compiling for a later MIPS ISA, not whatever the default baseline is. (Perhaps MIPS I aka R2000, -march=mips1)
See also the GCC manual's section on MIPS target options.
# gcc 5.4 -O3
maybe_square:
blez $4,$L5
nop
move $2,$4 # D.1492, num # retval = num
j $31 # jr $ra = return
nop
$L5:
mul $2,$4,$4 # D.1492, num, num # retval = num * num
j $31 # jr $ra = return
nop
Or with clang, use -target mips to tell it to compile for MIPS. You can do this on your desktop; unlike GCC, clang is normally built with multiple back-ends enabled.
From the same Godbolt link, clang 10.1 -xc -O3 -target mips -Wall -fverbose-asm -fomit-frame-pointer. The default target is apparently MIPS32 or something like that for clang. Also, clang defaults to enabling frame pointers for MIPS, making the asm noisy.
Note that it chose to make branchless asm, doing if-conversion into a conditional-move to select between the original input and the mul result. Unfortunately clang doesn't support -fno-delayed-branch; maybe it has another name for the same option, or maybe there's no hope.
maybe_square:
slti $1, $4, 1
addiu $2, $zero, 1
movn $2, $4, $1 # conditional move based on $1
jr $ra
mul $2, $2, $4 # in the branch delay slot
In this case we can simply put the mul before the jr, but in other cases converting to no-branch-delay asm is not totally trivial. e.g. branch on a loop counter before decrementing it can't be undone by putting the decrement first; that would change the meaning.
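To compare the clang output against hand-written code, it can help to see roughly what the if-conversion amounts to at the C level. The sketch below is only an illustration: the name maybe_square_branchless and the local variable names are hypothetical, and the compiler does this transformation on its own internal representation, not on C source. Each line is annotated with the instruction it mirrors.

```c
/* Hypothetical C rendering of clang's branchless MIPS output above.
 * Pick a multiplier (num or 1), then always multiply. */
int maybe_square_branchless(int num)
{
    int is_nonpositive = (num < 1);   /* slti  $1, $4, 1    */
    int factor = 1;                   /* addiu $2, $zero, 1 */
    if (is_nonpositive)               /* movn  $2, $4, $1   */
        factor = num;
    return factor * num;              /* mul   $2, $2, $4   */
}
```

For positive inputs the multiplier is 1, so the result is num itself; otherwise it is num * num, the same as maybe_square.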
Register names:
Compilers use register numbers, not bothering with names. For human use, you will often want to translate back. Many places online have MIPS register tables that show how $4..$7 are $a0..$a3, $8 .. $15 are $t0 .. $t7, etc. For example this one.
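If you translate register numbers often, a small lookup table saves trips to a reference page. This is a minimal sketch assuming the common o32 ABI names; the function name mips_reg_name is made up for illustration.

```c
#include <stddef.h>

/* o32 ABI names for the MIPS general-purpose registers $0..$31.
 * $30 is known as both $fp and $s8; objdump prints it as s8. */
static const char *const mips_abi_names[32] = {
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",   "t9", "k0", "k1", "gp", "sp", "s8", "ra"
};

/* Returns the ABI name for register number regno, or NULL if out of range. */
const char *mips_reg_name(unsigned regno)
{
    return regno < 32 ? mips_abi_names[regno] : NULL;
}
```

With this table, $4 maps to a0, $2 to v0 and $31 to ra, which matches the comments in the GCC output above.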

You should install a cross-compiler from the Ubuntu repositories. GCC MIPS C cross-compilers are available in the repositories. Pick according to your needs:
gcc-mips-linux-gnu - 32-bit big-endian.
gcc-mipsel-linux-gnu - 32-bit little-endian.
gcc-mips64-linux-gnuabi64 - 64-bit big-endian.
gcc-mips64el-linux-gnuabi64 - 64-bit little-endian.
etc.
(Note for users of Ubuntu 20.10 (Groovy Gorilla) or later, and Debian users: if you usually like to install your regular compilers using the build-essential package, you would be interested to know of the existence of crossbuild-essential-mips, crossbuild-essential-mipsel, crossbuild-essential-mips64el, etc.)
In the following examples, I will assume that you chose the 32-bit little-endian version (sudo apt-get install gcc-mipsel-linux-gnu). The commands for other MIPS versions are similar.
To deal with MIPS instead of the native architecture of your system, use the mipsel-linux-gnu-gcc command instead of gcc. For example, mipsel-linux-gnu-gcc -fverbose-asm -S myprog.c produces a file myprog.s containing MIPS assembly.
Another way to see the MIPS assembly: run mipsel-linux-gnu-gcc -g -c myprog.c to produce an object file myprog.o that contains debugging information. Then view the disassembly of the object file using mipsel-linux-gnu-objdump -d -S myprog.o. For example, if myprog.c is this:
#include <stdio.h>
int main()
{
int a = 1;
int b = 2;
printf("The answer is: %d\n", a + b);
return 0;
}
And if it is compiled using mipsel-linux-gnu-gcc -g -c myprog.c, then mipsel-linux-gnu-objdump -d -S myprog.o will show something like this:
myprog.o: file format elf32-tradlittlemips
Disassembly of section .text:
00000000 <main>:
#include <stdio.h>
int main() {
0: 27bdffd8 addiu sp,sp,-40
4: afbf0024 sw ra,36(sp)
8: afbe0020 sw s8,32(sp)
c: 03a0f025 move s8,sp
10: 3c1c0000 lui gp,0x0
14: 279c0000 addiu gp,gp,0
18: afbc0010 sw gp,16(sp)
int a = 1;
1c: 24020001 li v0,1
20: afc20018 sw v0,24(s8)
int b = 2;
24: 24020002 li v0,2
28: afc2001c sw v0,28(s8)
printf("The answer is: %d\n", a + b);
2c: 8fc30018 lw v1,24(s8)
30: 8fc2001c lw v0,28(s8)
34: 00621021 addu v0,v1,v0
38: 00402825 move a1,v0
3c: 3c020000 lui v0,0x0
40: 24440000 addiu a0,v0,0
44: 8f820000 lw v0,0(gp)
48: 0040c825 move t9,v0
4c: 0320f809 jalr t9
50: 00000000 nop
54: 8fdc0010 lw gp,16(s8)
return 0;
58: 00001025 move v0,zero
}
5c: 03c0e825 move sp,s8
60: 8fbf0024 lw ra,36(sp)
64: 8fbe0020 lw s8,32(sp)
68: 27bd0028 addiu sp,sp,40
6c: 03e00008 jr ra
70: 00000000 nop
...

You would need to download the source to binutils and gcc-core and compile with something like ../configure --target=mips .... You may need to choose a specific MIPS target. Then you could use mips-gcc -S.

You can cross-compile the GCC so that it generates MIPS code instead of x86. That's a nice learning experience.
If you want quick results you can also get a prebuilt GCC with MIPS support. One is the CodeSourcery Lite Toolchain. It is free, comes for a lot of architectures (including MIPS) and they have ready to use binaries for Linux and Windows.
http://www.codesourcery.com/sgpp/lite/mips/portal/subscription?#template=lite

You should compile your own version of gcc which is able to cross-compile. Of course this ain't easy, so you could look for a different approach.. for example this SDK.

Related

is there any use of attribute ((interrupt)) for riscv compilers?

We can read here that the interrupt attribute keyword is used for ARM, AVR, CR16, Epiphany, M32C, M32R/D, m68k, MeP, MIPS, RL78, RX and Xstormy16.
Does it have any impact on riscv compilation using riscv32-***-elf-gcc compilers?

There is a separate page for RISC-V which claims it works. You can find it here. Also you could probably verify it by compiling code with and without the attribute set.
I don't have a riscv32 toolchain installed, but I managed to verify it using the riscv64 toolchain. You should reproduce the same steps using the riscv32 toolchain to make sure it works.
Using a simple test.c file:
__attribute__((interrupt))
void test() {}
Compiling it with riscv64-linux-gnu-gcc -c -o test.o test.c and disassembling with riscv64-linux-gnu-objdump -D -j.text test.o we can see it generates mret instruction at the end of the function:
0: 1141 addi sp,sp,-16
2: e422 sd s0,8(sp)
4: 0800 addi s0,sp,16
6: 0001 nop
8: 6422 ld s0,8(sp)
a: 0141 addi sp,sp,16
c: 30200073 mret
After removing the interrupt attribute the instruction changes to regular ret. According to this SO answer this seems like correct behaviour.

Normally, an interrupt handler requires a different entry/exit sequence than a normal function. The main differences are that an interrupt handler must save every register it uses (a normal function only has to preserve some of them), and that the return instruction is usually different (e.g. on ARM it has to change the processor's mode of operation; this is probably also true on RISC-V).
The interrupt attribute informs the compiler of the routine properties, so it can generate the correct code for it.
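As a rough sketch of how this might look in source code, the handler below applies the attribute only when compiling for RISC-V, so the same file still builds as ordinary C elsewhere. The names ISR, tick_count, timer_handler and ticks_seen are hypothetical, and registering the handler in a trap vector is target-specific and not shown.

```c
/* Use the attribute only on RISC-V targets; on other targets ISR expands
 * to nothing and timer_handler is an ordinary function. */
#if defined(__riscv)
#define ISR __attribute__((interrupt))
#else
#define ISR
#endif

static volatile unsigned tick_count;  /* hypothetical state updated by the handler */

/* Interrupt handlers take no arguments and return nothing. */
ISR void timer_handler(void)
{
    tick_count++;
}

/* Ordinary function that reads the counter from normal code. */
unsigned ticks_seen(void)
{
    return tick_count;
}
```

On a real RISC-V target the handler would be entered by the hardware, and the compiler-generated epilogue would end in mret as in the disassembly above.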

Meaning of # zero_extendqisi2

I was wondering what the actual meaning of # zero_extendqisi2 in gcc assembly output was and also the usage. I couldn't find what qisi stands for or anything along those lines.
For context, the line is ldrb r3, [fp, #-9] # zero_extendqisi2 and this is ARM on a Raspberry Pi Zero W, compiled with GCC. For example, when reloading an unsigned char with conversion to int, with optimization disabled, with GCC9.2 with no options. https://godbolt.org/z/7xnfqh. Older GCC all the way to the earliest on Godbolt (4.5) and presumably earlier print the same comment.

This is an RTL instruction name, included in the Standard Names list of the GCC internals manual under zero_extendmn2. Here m,n are the machine modes qi and si, which are respectively a byte and a 32-bit integer. So this is GCC's indication that it is generating an instruction which takes a byte (here loaded from memory) and zero-extends it into a 32-bit integer (here in the register r3). Which is exactly what the ARM ldrb instruction does.
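A small pair of functions makes the zero-extension visible at the C level. This is a sketch with made-up function names; the signed variant is included for contrast, where GCC would use the corresponding sign-extending pattern (extendqisi2, typically ldrsb on ARM).

```c
/* Widening an unsigned char to int zero-extends: the upper 24 bits become 0.
 * This is the kind of load that gets the zero_extendqisi2 comment. */
int load_unsigned_byte(const unsigned char *p)
{
    return *p;
}

/* Widening a signed char to int sign-extends: the upper bits copy bit 7. */
int load_signed_byte(const signed char *p)
{
    return *p;
}
```

For the same bit pattern 0xF0, the first function yields 240 and the second yields -16.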
I don't know what the 2 stands for, but it's apparently part of GCC's naming convention.
As Peter points out, it's a little odd that GCC would include such a comment in the assembly without -fverbose-asm. Indeed the comment is coded in as part of the template string in the machine description file, arm.md. It could have been a debugging aid that some GCC developer added and then forgot to take out.
(If you submit this for your assignment, please cite this post properly.)

How do you call C functions from Assembly and how do you link it Statically?

I am playing around and trying to understand the low-level operation of computers and programs. To that end, I am experimenting with linking Assembly and C.
I have 2 program files:
Some C code here in "callee.c":
#include <unistd.h>
void my_c_func() {
write(1, "Hello, World!\n", 14);
return;
}
I also have some GAS x86_64 Assembly here in "caller.asm":
.text
.globl my_entry_pt
my_entry_pt:
# call my c function
call my_c_func # this function has no parameters and no return data
# make the 'exit' system call
mov $60, %rax # set the syscall to the index of 'exit' (60)
mov $0, %rdi # set the single parameter, the exit code to 0 for normal exit
syscall
I can build and execute the program like this:
$ as ./caller.asm -o ./caller.obj
$ gcc -c ./callee.c -o ./callee.obj
$ ld -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out -dynamic-linker /lib64/ld-linux-x86-64.so.2
$ ldd ./prog.out
linux-vdso.so.1 (0x00007fffdb8fe000)
libc.so.6 => /lib64/libc.so.6 (0x00007f46c7756000)
/lib64/ld-linux-x86-64.so.2 (0x00007f46c7942000)
$ ./prog.out
Hello, World!
Along the way, I had some problems. If I don't set the -dynamic-linker option, it defaults to this:
$ ld -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out
$ ldd ./prog.out
linux-vdso.so.1 (0x00007ffc771c5000)
libc.so.6 => /lib64/libc.so.6 (0x00007f8f2abe2000)
/lib/ld64.so.1 => /lib64/ld-linux-x86-64.so.2 (0x00007f8f2adce000)
$ ./prog.out
bash: ./prog.out: No such file or directory
Why is this? Is there a problem with the linker defaults on my system? How can/should I fix it?
Also, static linking doesn't work.
$ ld -static -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out
ld: ./callee.obj: in function `my_c_func':
callee.c:(.text+0x16): undefined reference to `write'
Why is this? Shouldn't write() just be a c library wrapper for the syscall 'write'? How can I fix it?
Where can I find the documentation on the C function calling convention so I can read up on how parameters are passed back and forth, etc...?
Lastly, while this seems to work for this simple example, am I doing something wrong in my initialization of the C stack? I mean, right now, I'm doing nothing. Should I be allocating memory from the kernel for the stack, setting bounds, and setting %rsp and %rbp before I start trying to call functions? Or is the kernel loader taking care of all this for me? If so, will all architectures under a Linux kernel take care of it for me?

While the Linux kernel provides a syscall named write, it does not mean that you automatically get a wrapper function of the same name you can call from C as write(). In fact, you need inline assembly to call any syscalls from C, if you're not using libc, because libc defines those wrapper functions.
Instead of explicitly linking your binaries with ld, let gcc do it for you. It can even assemble assembly files (internally executing a suitable version of as), if the source ends with a .s suffix. It looks like your linking problems are simply a disagreement between what GCC assumes and how you do it via LD yourself.
No, it's not a bug; the ld default path for ld.so isn't the one used on modern x86-64 GNU/Linux systems. (/lib/ld64.so.1 might have been used on early x86-64 GNU/Linux ports before the dust settled on where multi-arch systems would put everything to support both i386 and x86-64 versions of libraries installed at the same time. Modern systems use /lib64/ld-linux-x86-64.so.2)
Linux uses the System V ABI. The AMD64 Architecture Processor Supplement (PDF) describes the initial execution environment (when _start gets invoked), and the calling convention. Essentially, you have an initialized stack, with environment and command-line arguments stored in it.
Let's construct a fully working example, containing both C and assembly (AT&T syntax) sources, and both static and dynamic binaries.
First, we need a Makefile to save typing long commands:
# SPDX-License-Identifier: CC0-1.0
CC := gcc
CFLAGS := -Wall -Wextra -O2 -march=x86-64 -mtune=generic -m64 \
    -ffreestanding -nostdlib -nostartfiles
LDFLAGS :=
all: static-prog dynamic-prog
clean:
    rm -f static-prog dynamic-prog *.o
%.o: %.c
    $(CC) $(CFLAGS) $^ -c -o $@
%.o: %.s
    $(CC) $(CFLAGS) $^ -c -o $@
dynamic-prog: main.o asm.o
    $(CC) $(CFLAGS) $^ $(LDFLAGS) -o $@
static-prog: main.o asm.o
    $(CC) -static $(CFLAGS) $^ $(LDFLAGS) -o $@
Makefiles are particular about their indentation, but SO converts tabs to spaces. So, after pasting the above, run sed -e 's|^  *|\t|' -i Makefile to fix the indentation back to tabs. (The pattern is "one or more leading spaces", so lines that are not indented are left alone.)
The SPDX License Identifier in the above Makefile and all following files tell you that these files are licensed under Creative Commons Zero license: that is, these are all dedicated to public domain.
Compilation flags used:
-Wall -Wextra: Enable all warnings. It is a good practice.
-O2: Optimize the code. This is a commonly used optimization level, usually considered sufficient and not too extreme.
-march=x86-64 -mtune=generic -m64: Compile to 64-bit x86-64 AKA AMD64 architecture. These are the defaults; you can use -march=native to optimize for your own system.
-ffreestanding: Compilation targets the freestanding C environment. Tells the compiler it can't assume that strlen or memcpy or other library functions are available, so don't optimize a loop, struct copy, or array initialization into calls to strlen, memcpy, or memset, for example. If you do provide asm implementations of any functions gcc might want to invent calls to, you can leave this out. (Especially if you're writing a program that will run under an OS)
-nostdlib -nostartfiles: Do not link in the standard C library or its startup files. (Actually, -nostdlib already "includes" -nostartfiles, so -nostdlib alone would suffice.)
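To make the -ffreestanding point concrete: a byte-fill loop like the one below is the kind of code GCC may turn into a call to memset when it assumes a hosted environment, which fails at link time if no memset exists. This is a minimal sketch; nolib_memset is a hypothetical name following the nolib_ prefix used in the rest of this answer, and it is not part of the original example files.

```c
#include <stddef.h>  /* size_t; <stddef.h> is available even in freestanding mode */

/* Hypothetical helper: fill n bytes starting at dst with the byte value c.
 * Returns dst, like the standard memset. */
void *nolib_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}
```

If you drop -ffreestanding, providing a function with the standard name memset (in C or asm) is one way to satisfy any calls the compiler invents.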
Next, let's create a header file, nolib.h, that implements nolib_exit() and nolib_write() wrappers around the exit_group and write syscalls:
// SPDX-License-Identifier: CC0-1.0
/* Require Linux on x86-64 */
#if !defined(__linux__) || !defined(__x86_64__)
#error "This only works on Linux on x86-64."
#endif
/* Known syscall numbers, without depending on glibc or kernel headers */
#define SYS_write 1
#define SYS_exit_group 231
// Normally you'd use
// #include <asm/unistd.h> for __NR_write and __NR_exit_group
// or even #include <sys/syscall.h> for SYS_write
/* Inline assembly macro for a single-parameter no-return syscall */
#define SYSCALL1_NORET(nr, arg1) \
__asm__ volatile ( "syscall\n\t" : : "a" (nr), "D" (arg1) : "rcx", "r11", "memory")
/* Inline assembly macro for a three-parameter syscall */
#define SYSCALL3(retval, nr, arg1, arg2, arg3) \
__asm__ volatile ( "syscall\n\t" : "=a" (retval) : "a" (nr), "D" (arg1), "S" (arg2), "d" (arg3) : "rcx", "r11", "memory" )
/* exit() function */
static inline void nolib_exit(int retval)
{
SYSCALL1_NORET(SYS_exit_group, retval);
}
/* Some errno values */
#define EINTR 4 /* Interrupted system call */
#define EBADF 9 /* Bad file descriptor */
#define EINVAL 22 /* Invalid argument */
// or #include <asm/errno.h> to define these
/* write() syscall wrapper - returns negative errno if an error occurs */
static inline long nolib_write(int fd, const void *data, long len)
{
long retval;
if (fd == -1)
return -EBADF;
if (!data || len < 0)
return -EINVAL;
SYSCALL3(retval, SYS_write, fd, data, len);
return retval;
}
The reason the nolib_exit() uses the exit_group syscall instead of the exit syscall is that exit_group ends the entire process. If you run a program under strace, you'll see it too calls exit_group syscall at the very end. (Syscall implementation of exit())
Next, we need some C code. main.c:
// SPDX-License-Identifier: CC0-1.0
#include "nolib.h"
const char *c_function(void)
{
return "C function";
}
static inline long nolib_put(const char *msg)
{
if (!msg) {
return nolib_write(1, "(null)", 6);
} else {
const char *end = msg;
while (*end)
end++; // strlen
if (end > msg)
return nolib_write(1, msg, (unsigned long)(end - msg));
else
return 0;
}
}
extern const char *asm_function(int);
void _start(void)
{
nolib_put("asm_function(0) returns '");
nolib_put(asm_function(0));
nolib_put("', and asm_function(1) returns '");
nolib_put(asm_function(1));
nolib_put("'.\n");
nolib_exit(0);
}
nolib_put() is just a wrapper around nolib_write(), that finds the end of the string to be written, and calculates the number of characters to be written based on that. If the parameter is a NULL pointer, it prints (null).
Because this is a freestanding environment, and the default name for the entry point is _start, this defines _start as a C function that never returns. (It must not ever return, because the ABI does not provide any return address; it would just crash the process. Instead, an exit-type syscall must be called at end.)
The C source declares and calls a function asm_function, that takes an integer parameter, and returns a pointer to a string. Obviously, we'll implement this in assembly.
The C source also declares a function c_function, that we can call from assembly.
Here's the assembly part, asm.s:
# SPDX-License-Identifier: CC0-1.0
.text
.section .rodata
.one:
.string "One" # includes zero terminator
.text
.p2align 4,,15
.globl asm_function #### visible to the linker
.type asm_function, @function
asm_function:
cmpl $1, %edi
jne .else
leaq .one(%rip), %rax
ret
.else:
subq $8, %rsp # 16B stack alignment for a call to C
call c_function
addq $8, %rsp
ret
.size asm_function, .-asm_function
We don't need to declare c_function as an extern because GNU as treats all unknown symbols as external symbols anyway. We could add Call Frame Information directives, at least .cfi_startproc and .cfi_endproc, but I left them out so it wouldn't be so obvious I just wrote the original code in C and let GCC compile it to assembly, and then prettified it just a bit. (Did I write that out aloud? Oops! But seriously, compiler output is often a good starting point for a hand-written asm implementation of something, unless it does a very bad job of optimizing.)
The subq $8, %rsp adjusts the stack so that it will be a multiple of 16 for the c_function. (On x86-64, stacks grow down, so to reserve 8 bytes of stack, you subtract 8 from the stack pointer.) After the call returns, addq $8, %rsp reverts the stack back to original.
With these four files, we're ready. To build the example binaries, run e.g.
reset ; make clean all
Running either ./static-prog or ./dynamic-prog will output
asm_function(0) returns 'C function', and asm_function(1) returns 'One'.
The two binaries are just 2 kB (static) and 6 kB (dynamic) in size or so, although you can make them even smaller by stripping unneeded stuff,
strip --strip-unneeded static-prog dynamic-prog
which removes about 0.5 kB to 1 kB of unneeded stuff from them – the exact amount varies depending on the version of GCC and Binutils you use.
On some other architectures, we'd need to also link against libgcc (via -lgcc), because some C features rely on internal GCC functions. 64-bit integer division (named udivdi or similar) on various architectures is a typical example.
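As an illustration of the libgcc dependency, the function below is plain C with no library calls in the source. On x86-64 it compiles to a single division instruction, but on many 32-bit targets GCC instead emits a call to a libgcc helper such as __udivdi3, which is why -lgcc is needed there. The function name div_u64 is made up for this sketch.

```c
/* 64-bit unsigned division. No function call appears in the source, but on
 * targets without a native 64-bit divide the compiler calls a libgcc helper. */
unsigned long long div_u64(unsigned long long num, unsigned long long den)
{
    return num / den;
}
```

Compiling this with -S for the target in question and searching the output for a call or bl to a __udivdi3-like symbol is a quick way to check whether libgcc is required.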
As mentioned in the comments, the first version of the above examples had a few issues that need to be addressed. They do not stop the example from executing or working as intended, and were overlooked because the examples were written from scratch for this answer (in the hopes that others finding this question later on via web searches might find this useful), and I'm not perfect. :)
memory clobber argument to the inline assembly, in the syscall preprocessor macros
Adding "memory" in the clobbered list tells the compiler that the inline assembly may access (read and/or write) memory other than those specified in the parameter lists. It is obviously needed for the write syscall, but it is actually important for all syscalls, because the kernel can deliver e.g. signals in the same thread before returning from the syscall, and signal delivery can/will access memory.
As the GCC documentation mentions, this clobber also behaves like a read/write memory barrier for the compiler (but NOT for the processor!). In other words, with the memory clobber, the compiler knows that it must write any changes in variables etc. in memory before the inline assembly, and that unrelated variables and other memory content (not explicitly listed in the inline assembly inputs, outputs, or clobbers) may also change, and will generate the code we actually want, without making incorrect assumptions.
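The compiler-barrier effect can be seen in isolation with an empty asm statement that has only the memory clobber. This is a sketch using the GCC/clang inline-asm extension; shared_value, compiler_barrier and store_then_reload are hypothetical names, not part of the example files.

```c
int shared_value;  /* hypothetical global that other code could also access */

/* Empty asm with a "memory" clobber: emits no instructions, but the compiler
 * must assume any memory may have been read or written here. */
static inline void compiler_barrier(void)
{
    __asm__ volatile ("" : : : "memory");
}

int store_then_reload(int v)
{
    shared_value = v;      /* the store must be emitted before the barrier */
    compiler_barrier();
    return shared_value;   /* the value is reloaded instead of reusing v */
}
```

The syscall macros get the same compiler-level guarantee from their "memory" clobber, with the syscall instruction in place of the empty string.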
-fPIC -pie: Omitted for simplicity
Position independent code is usually only relevant for shared libraries. In real projects' Makefiles, you will need to use a different set of compilation flags for objects that will be compiled as a dynamic library, static library, dynamically linked executable, or a static executable, as the desired properties (and therefore compiler/linker flags) vary.
In an example such as this one, it is better to try and avoid such extraneous things, as it is a reasonable question to ask on its own ("Which compiler options to use to achieve X, when needing Y ?"), and the answers depend on the required features and context.
In most modern distros, PIE is the default and you might want -fno-pie -no-pie to simplify debugging / disassembling. 32-bit absolute addresses no longer allowed in x86-64 Linux?
-nostdlib does imply (or "include") -nostartfiles
There are quite a few overall options and link options we can use to control how the code is compiled and linked.
Many of the options GCC supports are grouped. For example, -O2 is actually shorthand for a collection of optimization features that you can explicitly specify.
Here, the reason for keeping both is to remind human programmers of the expectations for the code: no standard library, and no start files/objects.
-march=x86-64 -mtune=generic -m64 is the default on x86-64
Again, this is kept more as a reminder of what the code expects. Without a specific architecture definition, one might get the wrong impression that the code should be compilable in general, because C typically is not architecture specific!
The nolib.h header file does contain preprocessor checks (using pre-defined compiler macros to detect the operating system and hardware architecture), halting the compilation with an error for other OSes and hardware architectures.
Most Linux distributions provide the syscall numbers in <asm/unistd.h>, as __NR_name.
These are derived from the actual kernel sources. However, for any given architecture, these are the stable userspace ABI, and will not change. New ones may be added. Only in some extraordinary circumstances (unfixable security holes, perhaps?) can a syscall be deprecated and stop functioning.
It is always better to use the syscall numbers from the kernel, preferably via the aforementioned header, but it's possible to build this program with only GCC, no glibc or Linux kernel headers installed. For someone writing their own standard C library, they should include the file (from Linux kernel sources).
I do know that Debian derivatives (Ubuntu, Mint, et cetera) all do provide the <asm/unistd.h> file, but there are many, many other Linux distributions, and I just am not sure about all of them. I opted to only define the two (exit_group and write), to minimize the risk of problems.
(Editor's note: the file might be in a different place in the filesystem, but the <asm/unistd.h> include path should always work if the right header package is installed. It's part of the kernel's user-space C/asm API.)
Compilation flag -g adds debug symbols, which helps greatly when debugging – for example, when running and examining the binary in gdb.
I omitted this and all related flags, because I did not want to expand the topic any further, and because this example is easily debugged at the asm level and examined even without. See GDB asm tips like layout reg at the bottom of the x86 tag wiki
The System V ABI requires that before a call to a function, the stack is aligned to 16 bytes. So at the top of the function, RSP+-8 is 16-byte aligned, and if there are any stack args, they'll be aligned.
The call instruction pushes the current instruction pointer to the stack, and because this is a 64-bit architecture, that too is 64 bits = 8 bytes. So, to conform to the ABI, we really need to adjust the stack pointer by 8 before calling the function, to ensure it too gets a properly aligned stack pointer. These were initially omitted, but are now included in the assembly (asm.s file).
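The arithmetic behind that adjustment can be written down as a small helper. This is a sketch under the assumption stated above (the stack pointer is 16-byte aligned just before each call, so it is 8 bytes off on function entry); call_padding is a hypothetical name.

```c
/* Padding (in bytes) a function must subtract from %rsp before making a call,
 * given how many bytes it has already pushed or reserved since entry.
 * On entry, %rsp is 8 bytes past a 16-byte boundary because the caller's
 * call instruction pushed the 8-byte return address. */
unsigned call_padding(unsigned bytes_pushed)
{
    unsigned used = 8u + bytes_pushed;   /* return address + own stack usage */
    return (16u - used % 16u) % 16u;
}
```

For asm_function, which pushes nothing itself, this gives 8, matching the subq $8, %rsp before call c_function; a function that has already pushed one 8-byte register needs no extra padding.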
This matters, because on x86-64, SSE/AVX SIMD vectors have different instructions for aligned-to-16-bytes and unaligned accesses, with the aligned accesses being significantly faster on certain processors. (Why does System V / AMD64 ABI mandate a 16 byte stack alignment?). Using aligned SIMD instructions like movaps with unaligned addresses will cause the process to crash. (e.g. glibc scanf Segmentation faults when called from a function that doesn't align RSP is a real-life example of what happens when you get this wrong.)
However, when we do such stack manipulations, we really should add CFI (Call Frame Information) directives to ensure debugging and stack unwinding etc. works correctly. In this case, for general CFI, we prepend .cfi_startproc before the first instruction in an assembly function, and .cfi_endproc after the last instruction in an assembly function. For the Canonical Frame Address, CFA, we add .cfi_def_cfa_offset N after any instruction that modifies the stack pointer. Essentially, N is 8 at the beginning of the function, and increases as much as %rsp is decremented, and vice versa. See this article for more.
Internally, these directives produce information (metadata) stored in the .eh_frame and .eh_frame_hdr sections in the ELF object files and binaries, depending on other compilation flags.
So, in this case, the subq $8, %rsp should be followed by .cfi_def_cfa_offset 16, and the addq $8, %rsp by .cfi_def_cfa_offset 8, plus .cfi_startproc at the beginning of asm_function and .cfi_endproc after the final ret.
Note that you can often see rep ret instead of just ret in assembly sources. This is nothing but a workaround to certain processors having branch-prediction performance issues when jumping to or falling through a JCC to a ret instruction. The rep prefix does nothing, except it does fix the issues those processors might otherwise have with such a jump. Recent GCC versions stopped doing this by default as the affected AMD CPUs are very old and not as relevant these days. What does `rep ret` mean?
The "key" option, -ffreestanding, is one that chooses a C "dialect".
The C programming language is actually separated into two different environments: hosted, and freestanding.
The hosted environment is one where the standard C library is available, and is used when you write programs, applications, or daemons in C.
The freestanding environment is one where the standard C library is not available. It is used when you write kernels, firmware for microcontrollers or embedded systems, implement (parts of) your own standard C library, or a "standard library" for some other C-derived language.
As an example, the Arduino programming environment is based on a subset of freestanding C++. The standard C++ library is not available, and many features of C++ like exceptions are not supported. In fact, it is very close to freestanding C with classes. The environment also uses a special pre-preprocessor, which for example automatically prepends declarations of functions without the user having to write them.
Probably the most well known example of freestanding C is the Linux kernel. Not only is the standard C library not available, but the kernel code must actually avoid floating-point operations as well, because of certain hardware considerations.
For a better understanding of what exactly the freestanding C environment looks like to a programmer, I think the best thing is to go look at the language standard itself. As of now (June 2020), the most recent standard is ISO C18. While the standard itself is not free, the final draft is; for C18, it is draft N2176 (PDF).
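As a rough illustration of the freestanding style described above, the sketch below includes only <stddef.h>, one of the headers a freestanding implementation must provide, and supplies its own string helpers instead of calling the C library. The names fs_strlen and fs_memcpy are made up for this example.

```c
/* Freestanding-style helpers: no standard library functions are called.
 * fs_strlen and fs_memcpy are hypothetical names for this sketch. */
#include <stddef.h>   /* size_t; available even in a freestanding environment */

/* Count bytes up to, but not including, the terminating NUL. */
size_t fs_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Copy n bytes from src to dst; the two regions must not overlap. */
void *fs_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Note that even with -ffreestanding, GCC may still emit calls to memcpy, memmove, memset and memcmp, so a real freestanding project has to provide those four under their standard names.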

The ld default path for ld.so (the ELF interpreter) isn't the one used on modern x86-64 GNU/Linux systems.
/lib/ld64.so.1 might have been used on early x86-64 GNU/Linux ports before the dust settled on where multi-arch systems would put everything to support both i386 and x86-64 versions of libraries installed at the same time. Modern systems use /lib64/ld-linux-x86-64.so.2.
There was never a good time to update the default in GNU binutils ld; when some systems were using the default, changing it would have broken them. Multi-arch systems had to configure their GCC to pass -dynamic-linker /some/path to ld, so they simply did that instead of asking and waiting for the ld default to change. So nobody ever needed the ld default to change to make anything work, except for people playing around with assembly and using ld by hand to create dynamically-linked executables.
Instead of doing that, you can link using gcc -nostartfiles to omit CRT start code which defines a _start, but still link with the normal libraries including -lc, -lgcc internal helper functions if needed, etc.
See also Assembling 32-bit binaries on a 64-bit system (GNU toolchain) for more info on assembling with/without libc for asm that defines _start, or with libc + CRT for asm that defines main. (Leave out the -m32 from that answer for 64-bit; when using gcc to invoke as and ld for you, that's the only difference.)
ld -static -e my_entry_pt -lc ./callee.obj ./caller.obj -o ./prog.out
doesn't link because you put -lc before the object files that reference symbols in libc.
Order matters in linker command lines, for static libraries.
However, ld -static -e my_entry_pt ./callee.o ./caller.o -lc -o ./prog.out will link, but makes a program that segfaults when it calls glibc functions like write without having called glibc's init functions.
Dynamic linking takes care of that for you (glibc has .init functions that get called by the dynamic linker, the same mechanism that allows C++ static initializers to run in a C++ shared library). CRT startup code also calls those functions in the right order, but you left that out, too, and wrote your own entry point.
#Example's answer avoids that problem by defining its own write wrapper instead of linking with -lc, so it can be truly freestanding.
I thought glibc's write wrapper function would be simple enough not to crash, but that's not the case. It checks if the program is multi-threaded or something by loading from %fs:0x18. The kernel doesn't init FS base for thread-local storage; that's something user-space (glibc's internal init functions) would have to do.
glibc's write() faults on mov %fs:0x18,%eax if you haven't called glibc's init functions. (In a statically-linked executable where glibc couldn't get the dynamic linker to run them for you.)
Dump of assembler code for function write:
=> 0x0000000000401040 <+0>: endbr64 # for CET, or NOP on CPUs without CET
0x0000000000401044 <+4>: mov %fs:0x18,%eax ### this faults with no TLS setup
0x000000000040104c <+12>: test %eax,%eax
0x000000000040104e <+14>: jne 0x401060 <write+32>
0x0000000000401050 <+16>: mov $0x1,%eax # simple case: EAX = __NR_write
0x0000000000401055 <+21>: syscall
0x0000000000401057 <+23>: cmp $0xfffffffffffff000,%rax
0x000000000040105d <+29>: ja 0x4010b0 <write+112> # update errno on error
0x000000000040105f <+31>: retq # else return
0x0000000000401060 <+32>: sub $0x28,%rsp # the non-simple case:
0x0000000000401064 <+36>: mov %rdx,0x18(%rsp) # write is an async cancellation point or something
0x0000000000401069 <+41>: mov %rsi,0x10(%rsp)
0x000000000040106e <+46>: mov %edi,0x8(%rsp)
0x0000000000401072 <+50>: callq 0x4010e0 <__libc_enable_asynccancel>
0x0000000000401077 <+55>: mov 0x18(%rsp),%rdx
0x000000000040107c <+60>: mov 0x10(%rsp),%rsi
0x0000000000401081 <+65>: mov %eax,%r8d
0x0000000000401084 <+68>: mov 0x8(%rsp),%edi
0x0000000000401088 <+72>: mov $0x1,%eax
0x000000000040108d <+77>: syscall
0x000000000040108f <+79>: cmp $0xfffffffffffff000,%rax
0x0000000000401095 <+85>: ja 0x4010c4 <write+132>
0x0000000000401097 <+87>: mov %r8d,%edi
0x000000000040109a <+90>: mov %rax,0x8(%rsp)
0x000000000040109f <+95>: callq 0x401140 <__libc_disable_asynccancel>
0x00000000004010a4 <+100>: mov 0x8(%rsp),%rax
0x00000000004010a9 <+105>: add $0x28,%rsp
0x00000000004010ad <+109>: retq
0x00000000004010ae <+110>: xchg %ax,%ax
0x00000000004010b0 <+112>: mov $0xfffffffffffffffc,%rdx # errno update for the simple case
0x00000000004010b7 <+119>: neg %eax
0x00000000004010b9 <+121>: mov %eax,%fs:(%rdx) # thread-local errno?
0x00000000004010bc <+124>: mov $0xffffffffffffffff,%rax
0x00000000004010c3 <+131>: retq
0x00000000004010c4 <+132>: mov $0xfffffffffffffffc,%rdx # same for the async case
0x00000000004010cb <+139>: neg %eax
0x00000000004010cd <+141>: mov %eax,%fs:(%rdx)
0x00000000004010d0 <+144>: mov $0xffffffffffffffff,%rax
0x00000000004010d7 <+151>: jmp 0x401097 <write+87>
I don't fully understand what exactly write is checking for or doing. It may have something to do with async I/O, and/or POSIX thread cancellation points.
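For reference, a write wrapper that doesn't depend on glibc's initialization can be sketched in C with inline asm. This is only an illustration of the idea mentioned above, not the code from that answer: raw_write is a hypothetical name, and the asm path assumes x86-64 Linux, where write is system call number 1 (as in the disassembly) and the kernel returns a negative errno value in RAX on failure. The #else branch is just a portability fallback that does use libc.

```c
#if defined(__x86_64__) && defined(__linux__)
/* Invoke the Linux write system call directly, without touching TLS or errno.
 * Returns the byte count on success, or a negative errno value on failure. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a" (ret)                 /* RAX: return value              */
        : "a" (1L),                  /* RAX: __NR_write                */
          "D" ((long)fd),            /* RDI: file descriptor           */
          "S" (buf),                 /* RSI: buffer                    */
          "d" (len)                  /* RDX: byte count                */
        : "rcx", "r11", "memory");   /* syscall clobbers RCX and R11   */
    return ret;
}
#else
#include <errno.h>
#include <unistd.h>
/* Fallback for other targets so the sketch still builds; this one uses libc. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    long r = (long)write(fd, buf, len);
    return r < 0 ? -(long)errno : r;
}
#endif
```

Because the asm version never reads %fs, it works from a hand-written entry point where glibc's init functions have not run. For example, raw_write(1, "hi\n", 3) would print to standard output, and a bad descriptor such as -1 yields -9 (EBADF).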

Which C compiler should I use for creating my own simple OS?

This tutorial shows how to write my own simple Operating System:
Write Your Own Operating System Tutorial: http://joelgompert.com/OS/TableOfContents.htm
Everything is OK, but the development language is assembly. I want to develop my simple OS in the C programming language.
I want a C compiler that converts my C source code to assembly source, which I then assemble with NASM.
The compiler must produce assembly files that NASM can assemble.
C source files converted to assembly with GCC (gcc -S OS.c -masm=intel) fail when I assemble them with NASM.
I don't use any C standard library headers, not even stdio.h or math.h.
I want a compiler that either:
converts C source code to assembly source, or
creates flat binaries (like NASM or FASM output) from C sources.
Which C compiler should I use for creating my own simple OS?

I don't have any actual experience with this, but you might want to take a look at :
Is there a way to get gcc to output raw binary?
http://www.embeddedrelated.com/usenet/embedded/show/99290-1.php
Looking for 16-bit c compiler for x86

A compiler by definition will compile your high-level code down to machine language, usually by way of assembly. If you want to be able to look at the assembly, there's a GCC switch to generate it. Just use the -S switch when compiling your C code with GCC.

If you want to write your own operating system in C, look up Pritam Zope on YouTube. He's brilliant and has many videos on writing your own kernels and boot-loaders in C and asm. I'm also interested in writing my own operating system; I've done some research and have decided it's best to work in asm. If you're interested in working in asm instead of C, also check out theMike97. Here is a simple boot-loader I wrote in asm, in case you're interested. It just prints hello world.
org 0x7c00
mov si, message ;The message location *you can change this*
call print ;CALL tells the pc to jump back here when done
jmp $
print:
mov ah, 0Eh ;Set function
.run:
lodsb ;Get the char
; cmp al, 0x00 ;Equivalent spelling of the comparison below
cmp al, 0 ;0 and 0x00 are the same value: the NUL terminator
je .done ;Jump to done if ending code is found
int 10h ;Else print
jmp .run ; and jump back to .run
.done:
ret ;Return
message db 'Hello, world', 0 ;NUL-terminated string (0 is the same as 0x00)
;message db 'Hello, world', 0x00
times 510-($-$$) db 0
dw 0xaa55
Save this code in a file called main.asm.
Assemble it with nasm -fbin main.asm -o main.bin
Run it with qemu-system-x86_64 main.bin

68000, portable JIT library

There are several JIT libraries, but is there any which emits Motorola 68000 style instructions, such as for instance 68000, 68040, 68060 or any of the Coldfire CPUs?
Bonus points if it could emit for other platforms too, but 68k is most important.
Something easily integrated with C is preferred, but other languages are interesting too.
Ideally something like libjit, but with a 68k backend.

Although this doesn't really answer your question, you could consider generating the 68k machine code yourself. It shouldn't be too terribly difficult if you are already familiar with 68k assembly.
The Motorola M68000 Family Programmer's Reference Manual documents the syntax, availability, and bit configuration of every 680x0 instruction. However, a less tedious way to figure out the machine code for instructions is to use a 68k assembler that can generate a listing of the hex codes for each instruction produced. If you're on Windows, Easy68K should be able to generate such a listing, but I haven't tried it myself.
If you're not on Windows, you could try this assembler (only supports 68000, I think). You'll have to blow the dust off of it, but it works (at least in Linux). The command-line assembler (assembler/asm) has a -l flag that tells the assembler to generate a listing. Example:
$ asmlab/assembler/asm -ln test.asm
68000 Assembler by PGM
No errors detected
No warnings generated
test.asm
Leading space is required before each instruction, and the assembler doesn't handle whitespace between tokens well.
move.l #$12345678,-(a6)
jmp ($12345678)
rts
test.LIS
00000000 2D3C 12345678 1 move.l #$12345678,-(a6)
00000006 4EF9 12345678 2 jmp ($12345678)
0000000C 4E75 3 rts
No errors detected
No warnings generated
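If you go the do-it-yourself route, the emitter can be very small. Here is a minimal sketch in C that writes the three instructions from the listing above into a byte buffer. The opcode words (2D3C, 4EF9, 4E75) are taken from that listing; the function names and the buffer handling are made up for the example, and there is no bounds checking. The 68k is big-endian, so each word is stored high byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Append a 16-bit word in big-endian order; returns the new offset. */
static size_t emit16(uint8_t *buf, size_t off, uint16_t w)
{
    buf[off]     = (uint8_t)(w >> 8);
    buf[off + 1] = (uint8_t)(w & 0xFF);
    return off + 2;
}

/* Append a 32-bit value as two big-endian words. */
static size_t emit32(uint8_t *buf, size_t off, uint32_t v)
{
    off = emit16(buf, off, (uint16_t)(v >> 16));
    return emit16(buf, off, (uint16_t)(v & 0xFFFF));
}

/* move.l #imm,-(a6) : opcode word 2D3C followed by the immediate */
static size_t emit_push_imm_a6(uint8_t *buf, size_t off, uint32_t imm)
{
    off = emit16(buf, off, 0x2D3C);
    return emit32(buf, off, imm);
}

/* jmp (addr) with a 32-bit absolute address : opcode word 4EF9 */
static size_t emit_jmp_abs(uint8_t *buf, size_t off, uint32_t addr)
{
    off = emit16(buf, off, 0x4EF9);
    return emit32(buf, off, addr);
}

/* rts : opcode word 4E75 */
static size_t emit_rts(uint8_t *buf, size_t off)
{
    return emit16(buf, off, 0x4E75);
}
```

Emitting the three instructions in order with $12345678 as the operand produces 14 bytes, with the instructions starting at offsets 0, 6 and 0xC, matching the listing. A JIT would then copy the buffer into executable memory on the 68k target; that part is not shown here.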
