So I'm trying to compile a C file to .bin and then add it to an .img file after my first stage bootloader.
I have found these bash commands in this answer by user Michael Petch:
gcc -g -m32 -c -ffreestanding -o kernel.o kernel.c -lgcc
ld -melf_i386 -Tlinker.ld -nostdlib --nmagic -o kernel.elf kernel.o
objcopy -O binary kernel.elf kernel.bin
and used this C code (taken from the same answer, saved as kernel.c):
/* This code will be placed at the beginning of the object by the linker script */
__asm__ ("jmp _main\r\n");
int main(){
/* Do Stuff Here*/
return 0; /* return back to bootloader */
}
I executed those commands in cygwin and it produced the following result:
ld: kernel.o: in function `main':
/cygdrive/d/Work/asm/kernel.c:4: undefined reference to `___main'
objcopy: 'kernel.elf': No such file
The linker.ld file is here:
OUTPUT_FORMAT(elf32-i386)
ENTRY(_main)
SECTIONS
{
. = 0x9000;
.text : { *(.text.start) *(.text) }
.data : { *(.data) }
.bss : { *(.bss) *(COMMON) }
}
I have disassembled the kernel.o file using objdump, the result of which is here:
> objdump -d -j .text kernel.o
kernel.o: file format pe-i386
Disassembly of section .text:
00000000 <.text>:
0: eb 00 jmp 2 <_main>
00000002 <_main>:
2: 55 push %ebp
3: 89 e5 mov %esp,%ebp
5: 83 e4 f0 and $0xfffffff0,%esp
8: e8 00 00 00 00 call d <_main+0xb>
d: b8 00 00 00 00 mov $0x0,%eax
12: c9 leave
13: c3 ret
Here is the result of gcc -v if that helps also:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/10/lto-wrapper.exe
Target: x86_64-pc-cygwin
Configured with: /mnt/share/cygpkgs/gcc/gcc.x86_64/src/gcc-10.2.0/configure --srcdir=/mnt/share/cygpkgs/gcc/gcc.x86_64/src/gcc-10.2.0 --prefix=/usr --exec-prefix=/usr --localstatedir=/var --sysconfdir=/etc --docdir=/usr/share/doc/gcc --htmldir=/usr/share/doc/gcc/html -C --build=x86_64-pc-cygwin --host=x86_64-pc-cygwin --target=x86_64-pc-cygwin --without-libiconv-prefix --without-libintl-prefix --libexecdir=/usr/lib --with-gcc-major-version-only --enable-shared --enable-shared-libgcc --enable-static --enable-version-specific-runtime-libs --enable-bootstrap --enable-__cxa_atexit --with-dwarf2 --with-tune=generic --enable-languages=c,c++,fortran,lto,objc,obj-c++ --enable-graphite --enable-threads=posix --enable-libatomic --enable-libgomp --enable-libquadmath --enable-libquadmath-support --disable-libssp --enable-libada --disable-symvers --with-gnu-ld --with-gnu-as --with-cloog-include=/usr/include/cloog-isl --without-libiconv-prefix --without-libintl-prefix --with-system-zlib --enable-linker-build-id --with-default-libstdcxx-abi=gcc4-compatible --enable-libstdcxx-filesystem-ts
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.0 (GCC)
What am I doing wrong? Is this caused by Cygwin? If yes, is there any other option I could use on Windows? (I tried MSVC but that is just plainly horrible)
Also, my bootloader is not using any .section pseudo-ops (I have no idea how to work with them correctly). Will this cause any problems in the future, and will it work correctly with the compiled C program?
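Searching further shows that __main (which appears as ___main at link time, because an underscore is prepended to C symbols on this target) is a routine that GCC calls at the start of main on targets such as Cygwin; it is not the program's entry point itself.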
The same problem is mentioned in the following two answers:
https://stackoverflow.com/a/32164910/14320958
https://stackoverflow.com/a/45442576/14320958
Both of them point to some form of connection to the -lgcc option and the libgcc library.
Renaming main to __main works, but is not recommended; by convention the entry point for kernels is apparently kmain, as seen in other questions and answers.
As far as I can tell, __main is normally supplied by libgcc and runs initialization (such as global constructors) before the body of main executes. The details are target specific, and more research would need to be done here.
GCC emits the call to __main in any function named main on such targets, even in a freestanding build, which is why the linker reports the undefined reference when libgcc is not linked in.
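A minimal sketch of the rename workaround, assuming the call to __main is only emitted inside a function literally named main. The name kmain and the STR/STR2 helper macros are illustrative and do not come from the linked answers. __USER_LABEL_PREFIX__ is a GCC predefined macro that should expand to _ for the 32-bit PE output shown above and to nothing on ELF targets, so the jmp names the right symbol in either case.

```c
/* kernel.c (sketch): same layout as the original, with the C entry renamed. */
#define STR2(x) #x
#define STR(x) STR2(x)

int kmain(void);

#if defined(__i386__) || defined(__x86_64__)
/* Jump to kmain; the symbol prefix ("_" on 32-bit PE, empty on ELF)
   is taken from the compiler instead of being hard-coded. */
__asm__ ("jmp " STR(__USER_LABEL_PREFIX__) "kmain\n");
#endif

/* Not named main, so GCC has no reason to insert a call to __main here. */
int kmain(void)
{
    /* Do stuff here */
    return 0; /* return to the bootloader */
}
```

With this change, ENTRY(_main) in linker.ld would presumably have to name the renamed symbol as well.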
Hi, I'm learning about C compilers with this book: https://www.sigbus.info/compilerbook
I want to get the same result as the book shows. What should I do? I think I need to change the version of gcc or objdump, or the options.
The book says that the following expected assembly output can itself be compiled.
expected
.intel_syntax noprefix
.global main
main:
mov rax, 42
ret
actual
00000000000005fa <main>:
5fa: 55 push rbp
5fb: 48 89 e5 mov rbp,rsp
5fe: b8 2a 00 00 00 mov eax,0x2a
603: 5d pop rbp
604: c3 ret
605: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
60c: 00 00 00
60f: 90 nop
what I did
root@686394c78009:/zcc# uname -a
Linux 686394c78009 4.9.125-linuxkit #1 SMP Fri Sep 7 08:20:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
root@686394c78009:/zcc# objdump -v
GNU objdump (GNU Binutils for Ubuntu) 2.30
Copyright (C) 2018 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.
root@686394c78009:/zcc# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.4.0-1ubuntu1~18.04.1' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
root@686394c78009:/zcc# cat test1.c
int main() {
return 42;
}
root@686394c78009:/zcc# gcc -o test1 test1.c
root@686394c78009:/zcc# ./test1
root@686394c78009:/zcc# echo $?
42
root@686394c78009:/zcc# objdump -d -M intel ./test1
Update 1
I generated assembly code with the -S option. Compiling from the generated assembly code worked.
There are still some differences from my reference book, but I will learn more.
One more curious thing is that a different register name is used in each case. I will look into that too. (I have realized I need to learn the basics.)
// expected
mov rax, 42
// actual
mov eax, 42
root@686394c78009:/zcc# gcc -S -masm=intel test1.c
root@686394c78009:/zcc# cat test1.s
.file "test1.c"
.intel_syntax noprefix
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
mov eax, 42
pop rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0"
.section .note.GNU-stack,"",@progbits
root@686394c78009:/zcc# gcc -o test1 test1.s
root@686394c78009:/zcc# ./test1
root@686394c78009:/zcc# echo $?
42
Instead of dumping with objdump, try to directly generate assembly code with the -S option for the compiler. With -masm=intel, the output should look similar to what you expect.
Do not expect the compiler to generate the exact same code though. Different compilers and different compiler versions or even the same compiler with different flags may make different choices and generate different assembly for the same code. That's normal.
I have an assembly file with a _start label as the first thing in the .text segment. I would like this label to be the entry point of my application.
Whenever I pass this file together with another file that has a function called main, that main function ends up being the entry point of my application no matter what.
I am using the GNU linker and have tried the -e _start flag, along with changing the input file order. As long as a main function exists, it becomes the entry point. If I rename the main function, it works fine and my _start label becomes the entry point.
EDIT: Seems like it is because of -O2 flag to the compiler.
as.s
.text
.global _start
_start:
jmp main
main.c
int main(){
return 0;
}
Compile
gcc -O2 -c as.s -o as.o
gcc -O2 -c main.c -o main.o
ld -e _start as.o main.o -o test
Output
00000000004000b0 <main>:
4000b0: 31 c0 xor %eax,%eax
4000b2: c3 retq
00000000004000b3 <_start>:
4000b3: e9 f8 ff ff ff jmpq 4000b0 <main>
Any ideas?
It appears your question really is: "How can I place a particular function before all others in the generated executable?"
First, doing this only has value in certain circumstances. An ELF executable has the entry point encoded in the ELF header. The placement of the entry point in the executable isn't relevant.
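To illustrate the point about the header, here is a small sketch that reads the entry address out of the first bytes of an ELF image held in memory. The function name elf_entry_point is made up for this example, and only little-endian files are handled. The field layout used (e_entry at byte offset 24, 4 bytes wide for ELF32 and 8 bytes wide for ELF64) follows the ELF specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Read e_entry from an ELF header in memory.
   Returns 0 and stores the entry address in *entry on success,
   or -1 if the buffer does not hold a little-endian ELF header. */
int elf_entry_point(const unsigned char *buf, size_t len, uint64_t *entry)
{
    size_t width;
    uint64_t value = 0;
    size_t i;

    if (len < 32 || buf[0] != 0x7f || buf[1] != 'E' ||
        buf[2] != 'L' || buf[3] != 'F')
        return -1;
    if (buf[5] != 1)          /* EI_DATA: 1 means little-endian */
        return -1;

    if (buf[4] == 1)          /* EI_CLASS: 1 means ELF32 */
        width = 4;
    else if (buf[4] == 2)     /* EI_CLASS: 2 means ELF64 */
        width = 8;
    else
        return -1;

    /* e_entry follows e_ident (16 bytes), e_type (2),
       e_machine (2) and e_version (4), so it starts at offset 24. */
    for (i = 0; i < width; i++)
        value |= (uint64_t)buf[24 + i] << (8 * i);

    *entry = value;
    return 0;
}
```

For the test executable in the question, the value read this way would be expected to match the address of _start, wherever the linker happened to place it relative to main.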
One special circumstance is a non-multiboot compatible kernel where a custom bootloader loads a kernel that was generated by GCC and converted to binary output. Looking through your question history suggests that bootloader / kernel development might be a possibility for your requirement.
When using GCC you can't assume that the generated code will be in the order you want. As you have found out, options (like optimizations) may reorder the functions relative to each other or eliminate some altogether.
One way to put a function first in an ELF executable is to place it into its own section and then create a linker script to position that section first. An example linker script link.ld that should work with C would be:
/*OUTPUT_FORMAT("elf32-i386");*/
OUTPUT_FORMAT("elf64-x86-64");
ENTRY(_start);
SECTIONS
{
/* This should be your memory offset (VMA) where the code and data
* will be loaded. In Linux this is 0x400000, multiboot loader is
* 0x100000 etc */
. = 0x400000;
/* Place special section .text.prologue before everything else */
.text : {
*(.text.prologue);
*(.text*);
}
/* Output the data sections */
.data : {
*(.data*);
}
.rodata : {
*(.rodata*);
}
/* The BSS section for uninitialized data */
.bss : {
__bss_start = .;
*(COMMON);
*(.bss);
. = ALIGN(4);
__bss_end = .;
}
/* Size of the BSS section in case it is needed */
__bss_size = ((__bss_end)-(__bss_start));
/* Remove the note that may be placed before the code by LD */
/DISCARD/ : {
*(.note.gnu.build-id);
}
}
This script explicitly places whatever is in the section .text.prologue before any other code. We just need to place _start into that section. Your as.s file could be modified to do this:
.global _start
# Start a special section called .text.prologue making it
# allocatable and executable
.section .text.prologue, "ax"
_start:
jmp main
.text
# All other regular code in the normal .text section
You'd compile, assemble and link them like this:
gcc -O2 -c main.c -o main.o
gcc -O2 -c as.s -o as.o
ld -Tlink.ld main.o as.o -o test
An objdump -D test should show the function _start before main:
test: file format elf32-i386
Disassembly of section .text:
00400000 <_start>:
400000: e9 0b 00 00 00 jmp 400010 <main>
400005: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%eax,%eax,1)
40000c: 00 00 00
40000f: 90 nop
00400010 <main>:
400010: 31 c0 xor %eax,%eax
400012: c3 ret
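If the first routine were written in C rather than assembly, a similar effect could be sketched with the section attribute supported by GCC and Clang on ELF targets. The names kernel_entry and kernel_main are hypothetical; with the linker script above, ENTRY would have to name kernel_entry instead of _start.

```c
/* Sketch: a C routine placed into .text.prologue so that a linker
   script listing *(.text.prologue) first puts it ahead of other code. */
int kernel_main(void);

/* The section attribute moves only this function out of the default
   .text section and into .text.prologue. */
__attribute__((section(".text.prologue")))
int kernel_entry(void)
{
    return kernel_main();
}

/* Stays in the regular text sections, matched by *(.text*). */
int kernel_main(void)
{
    return 0;
}
```

Unlike the one-instruction assembly stub, the compiler decides what code kernel_entry contains, so this only controls where the function is placed, not the exact bytes at the start of the image.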
I am following this half-completed tutorial to develop a simple OS. One step (on page 50) is to compile a simple kernel with $ ld -o kernel.bin -Ttext 0x1000 kernel.o --oformat binary. However, I don't really understand what the option -Ttext is doing.
To make the question specific, why in the following experiment are md5s of kernel_1000.bin & kernel.bin equal, kernel_1001.bin & kernel_1009.bin equal, and kernel_1007.bin & kernel_1017.bin equal, while all other pairs are not equal?
My experiment
I tried to compile several different kernels with different -Ttext values, as in the following Makefile:
...
kernel.o: kernel.c
gcc -ffreestanding -c kernel.c
kernel.bin: kernel.o
ld -o $@ kernel.o --oformat binary
kernel_1000.bin: kernel.o
ld -o $@ -Ttext 0x1000 kernel.o --oformat binary
kernel_1001.bin: kernel.o
ld -o $@ -Ttext 0x1001 kernel.o --oformat binary
...
And then I checked their md5:
$ ls *.bin | xargs md5sum
d9248440a2c816e41527686cdb5118e4 kernel_1000.bin
65db5ab465301da1176b523dec387a40 kernel_1001.bin
819a5638827494a4556b7a96ee6e14b2 kernel_1007.bin
d9248440a2c816e41527686cdb5118e4 kernel_1008.bin
65db5ab465301da1176b523dec387a40 kernel_1009.bin
216b24060abce034911642acfa880403 kernel_1015.bin
e92901b1d12d316c564ba7916abca20c kernel_1016.bin
819a5638827494a4556b7a96ee6e14b2 kernel_1017.bin
d9248440a2c816e41527686cdb5118e4 kernel.bin
kernel.c
void main() {
char* video_memory = (char*) 0xb8000;
*video_memory = 'X';
}
Development environment
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.9/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.9.2-10' --with-bugurl=file:///usr/share/doc/gcc-4.9/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.9 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.9 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.9-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.9.2 (Debian 4.9.2-10)
$ uname -a
Linux localhost 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u1 (2015-12-14) x86_64 GNU/Linux
The -Ttext option places the .text section of your program at the given address. For example, if you compile this assembly code:
section .text
global _start
_start:
mov al, '!'
jmp l
l: mov ah, 0x0e
mov bh, 0x00
mov bl, 0x07
int 0x10
jmp $
times 510-($-$$) db 0
db 0x55
db 0xaa
with:
nasm -f elf64 -o test.o test.S
ld -o test test.o
and then look at the result with objdump, you will see that it was linked at the default address, something around 0x0000000000400000 for x86_64:
~$ objdump -D test
test: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: b0 21 mov $0x21,%al
400082: eb 00 jmp 400084 <l>
0000000000400084 <l>:
400084: b4 0e mov $0xe,%ah
...
...
...
All addresses in the program (at least in the .text section) are relative to this address. If you add the -Ttext 1000 option, you will see:
~$ objdump -D test
test: file format elf64-x86-64
Disassembly of section .text:
0000000000001000 <_start>:
1000: b0 21 mov $0x21,%al
1002: eb 00 jmp 1004 <l>
0000000000001004 <l>:
1004: b4 0e mov $0xe,%ah
Your program is now linked to start at address 0x1000, and all addresses (including those used by jmp and so on) are relative to 0x1000 too.
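One detail worth noting in the two listings: the jmp itself encodes as eb 00 in both, because a relative jump stores the distance from the next instruction to the target rather than the target address. Only the addresses objdump prints beside it change. A sketch of that arithmetic, with a function name made up for the example:

```c
#include <stdint.h>

/* Displacement stored in a relative jmp: the target minus the address
   of the next instruction (jump address plus instruction length). */
int64_t jmp_displacement(uint64_t insn_addr, unsigned insn_len, uint64_t target)
{
    return (int64_t)(target - (insn_addr + insn_len));
}
```

For the 2-byte jmp at 0x400082 targeting 0x400084 this gives 0, and it gives 0 again for the jmp at 0x1002 targeting 0x1004. Code that contains only relative references therefore comes out byte-identical at either link address, while any absolute reference would differ.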
This is important because, in short, when an operating system kernel loads your program, it loads your executable, which is in ELF or some other binary format, and reads where the .text section starts. In our case you can link your kernel.bin however you want, because there is no loader such as an operating system kernel and you are the master of the whole memory space.
So if you link your kernel.bin to start at 0x1000, you know where the code starts (provided, of course, that it is actually loaded at that place in memory), and if you know the base address of your code, you can compute any address inside it with something like my_label_inside_program - _start.
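The address arithmetic can be sketched as follows. The function names are hypothetical, and the example assumes a flat binary whose first byte corresponds to the link base.

```c
#include <stdint.h>

/* Offset of a linked address from the link base (the -Ttext value). */
uint64_t offset_from_base(uint64_t link_base, uint64_t linked_addr)
{
    return linked_addr - link_base;
}

/* Address the same item has at run time if the image is copied
   to load_base instead of link_base. */
uint64_t runtime_address(uint64_t load_base, uint64_t link_base,
                         uint64_t linked_addr)
{
    return load_base + offset_from_base(link_base, linked_addr);
}
```

With the listing above, the label l at 0x1004 lies 4 bytes into an image linked at 0x1000. If the bootloader copied that image somewhere other than 0x1000, the label would move with it, but any absolute addresses baked into the code would still assume 0x1000, which is why the link address and the load address normally have to agree.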
I am in the process of writing a small operating system in C. I have written a bootloader and I'm now trying to get a simple C file (the "kernel") to compile with gcc:
int main(void) { return 0; }
I compile the file with the following command:
gcc kernel.c -o kernel.o -nostdlib -nostartfiles
I use the linker to create the final image using this command:
ld kernel.o -o kernel.bin -T linker.ld --oformat=binary
The contents of the linker.ld file are as follows:
SECTIONS
{
. = 0x7e00;
.text ALIGN (0x00) :
{
*(.text)
}
}
(The bootloader loads the image at address 0x7e00.)
This seems to work quite well - ld produces a 128-byte file containing the following instructions in the first 11 bytes:
00000000 55 push ebp
00000001 48 dec eax
00000002 89 E5 mov ebp, esp
00000004 B8 00 00 00 00 mov eax, 0x00000000
00000009 5D pop ebp
0000000A C3 ret
However, I can't figure out what the other 117 bytes are for. Disassembling them seems to produce a bunch of garbage that doesn't make any sense. The existence of the additional bytes has me wondering if I'm doing something wrong.
Should I be concerned?
These are additional sections, which were not stripped and not discarded. You want your linker.ld file to look like this:
SECTIONS
{
. = 0x7e00;
.text ALIGN (0x00) :
{
*(.text)
}
/DISCARD/ :
{
*(.comment)
*(.eh_frame_hdr)
*(.eh_frame)
}
}
I worked out which sections to discard from the output of objdump -t kernel.o.
Simple: you're using gcc, and it always puts its initialization code before passing control to your main.
I don't know exactly what that startup code contains, but it is there. As you may see, there is also a 'GNU' comment in your binary; you can print specific sections using objdump -s -j 'section name'.
I want to get the address of _GLOBAL_OFFSET_TABLE_ in my program. One way is to use the nm command in Linux, maybe redirect the output to a file and parse that file to get address of _GLOBAL_OFFSET_TABLE_. However, that method seems to be quite inefficient. What are some more efficient methods of doing it?
This appears to work:
// test.c
#include <stdio.h>
extern void *_GLOBAL_OFFSET_TABLE_;
int main()
{
printf("_GLOBAL_OFFSET_TABLE = %p\n", &_GLOBAL_OFFSET_TABLE_);
return 0;
}
To get a consistent address for _GLOBAL_OFFSET_TABLE_ that matches nm's result, you will need to compile your code with -fPIE so that code generation is done as if linking into a position-independent executable. (Otherwise you get a small integer like 0x2ed6 with -fno-pie -no-pie.) The GCC default on most modern Linux distros is -fPIE -pie, which would make the nm addresses just offsets relative to an image base, with the runtime address randomized by ASLR. (This is normally good for security, but you may not want it.)
$ gcc -fPIE -no-pie test.c -o test
It gives:
$ ./test
_GLOBAL_OFFSET_TABLE = 0x6006d0
However, nm reports a different value:
$ nm test | fgrep GLOBAL
0000000000600868 d _GLOBAL_OFFSET_TABLE_
Or, with a GCC too old to know about PIEs at all, let alone have -fPIE -pie as the default, -fpic can work.
If you use assembly language, you can get the address of _GLOBAL_OFFSET_TABLE_ without get_pc_thunk.
It is a tricky way. :)
Here is the sample code:
$ cat test.s
.global main
main:
lea HEREIS, %eax # Now %eax holds address of _GLOBAL_OFFSET_TABLE_
.section .got
HEREIS:
$ gcc -o test test.s
This works because the .got section is adjacent to the .got.plt section.
Therefore the symbol HEREIS and _GLOBAL_OFFSET_TABLE_ are located at the same address.
PS. You can check it works with objdump.
Disassembly of section .got:
080495e8 <HEREIS-0x4>:
80495e8: 00 00 add %al,(%eax)
...
Disassembly of section .got.plt:
080495ec <_GLOBAL_OFFSET_TABLE_>:
80495ec: 00 95 04 08 00 00 add %dl,0x804(%ebp)
80495f2: 00 00 add %al,(%eax)
80495f4: 00 00 add %al,(%eax)