LD is producing 2000 lines of assembly for a 3 line C file. How can I get it to only produce the assembly needed?

I'm currently working through a document titled "Building a Simple OS -- from scratch". It teaches x86 instructions only in 32-bit. At one point the author lists this C function:
int my_function() {
return 0xbaba;
}
and says that it compiles into this assembly:
00000000 55 push ebp
00000001 89E5 mov ebp, esp
00000003 B8BABA0000 mov eax, 0xbaba
00000008 5D pop ebp
00000009 C3 ret
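For reference, the five instructions above are exactly ten bytes of machine code. Here is a small C sketch that stores those bytes and decodes the one instruction carrying data. The array and function names are made up for illustration. Opcode B8 is mov eax, imm32, and the 32-bit immediate follows it in little-endian order, which is why 0xbaba shows up as BA BA 00 00.

```c
#include <stddef.h>
#include <stdint.h>

/* The ten bytes from the listing: push ebp / mov ebp,esp /
   mov eax,0xbaba / pop ebp / ret */
static const uint8_t my_function_code[] = {
    0x55,                         /* push ebp        */
    0x89, 0xE5,                   /* mov ebp, esp    */
    0xB8, 0xBA, 0xBA, 0x00, 0x00, /* mov eax, 0xbaba */
    0x5D,                         /* pop ebp         */
    0xC3                          /* ret             */
};

/* If code[off] is opcode B8 (mov eax, imm32), store the little-endian
   32-bit immediate that follows it in *imm and return 1; else return 0. */
static int decode_mov_eax_imm32(const uint8_t *code, size_t len, size_t off,
                                uint32_t *imm)
{
    if (off + 5 > len || code[off] != 0xB8)
        return 0;
    *imm = (uint32_t)code[off + 1]
         | (uint32_t)code[off + 2] << 8
         | (uint32_t)code[off + 3] << 16
         | (uint32_t)code[off + 4] << 24;
    return 1;
}
```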
I have the code for my_function() in a file called basic.c and I'm using the following bash commands (on Mac OS X Yosemite with Xcode installed):
gcc -ffreestanding -m32 -c basic.c -o basic.o
ld -arch i386 -no_pie -e _my_function -static -o basic.bin -image_base 0x0 basic.o
These are successful, but when I run
ndisasm -b 32 basic.bin > basic.dis
I get a file with over 2000 lines of assembly, most of which are
00000FDA 0000 add [eax],al
How can I get it to just compile to the simple five lines listed by the author?

You should be looking at the .o file, not the linked file (or using a different tool to disassemble just the desired function in the linked file). Per the manual:
NDISASM does not have any understanding of object file formats, like objdump, and it will not understand DOS .EXE files like debug will. It just disassembles.
ld in the OS X / Xcode toolchain produces a Mach-O binary. This includes various metadata in addition to the machine code for the function. ndisasm isn't aware of the file structure and is attempting to disassemble the metadata as code (which it isn't).
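One way to see what ndisasm is tripping over is to check the first bytes of the linked file. Below is a minimal sketch, assuming a little-endian (x86) Mach-O file. The magic values are MH_MAGIC and MH_MAGIC_64 from <mach-o/loader.h>, redefined here under hypothetical SKETCH_ names so the code builds on any platform. A header-free flat binary would instead start with the function's first opcode.

```c
#include <stddef.h>
#include <stdint.h>

/* Values of MH_MAGIC (32-bit) and MH_MAGIC_64 from <mach-o/loader.h>. */
#define SKETCH_MH_MAGIC    0xfeedfaceu
#define SKETCH_MH_MAGIC_64 0xfeedfacfu

/* Return 1 if buf starts with a little-endian Mach-O magic number,
   0 otherwise. Raw code for my_function would start with 0x55 (push ebp). */
static int looks_like_macho_le(const uint8_t *buf, size_t len)
{
    if (len < 4)
        return 0;
    uint32_t magic = (uint32_t)buf[0] | (uint32_t)buf[1] << 8
                   | (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
    return magic == SKETCH_MH_MAGIC || magic == SKETCH_MH_MAGIC_64;
}
```

If basic.bin passes this check, it starts with Mach-O header data rather than code, so disassembling it from offset 0 decodes metadata as instructions.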

Related

Master Boot Record using GNU Assembly: extra bytes in flat binary output

I am trying to compile the following simple MBR:
.code16
.globl _start
.text
_start:
end:
jmp end
# Don't bother with 0xAA55 yet
I run the following commands:
> as --32 -o boot.o boot.s
> ld -m elf_i386 boot.o --oformat=binary -o mbr -Ttext 0x7c00
However, I get a binary file of more than 129MB, which is strange to me. What is going on in that build process? Thank you very much.
Running objdump over boot.o gives me:
> objdump -s boot.o
boot.o: file format elf32-i386
Contents of section .text:
0000 ebfe ..
Contents of section .note.gnu.property:
0000 04000000 18000000 05000000 474e5500 ............GNU.
0010 020001c0 04000000 00000000 010001c0 ................
0020 04000000 01000000
Manually removing the section .note.gnu.property before calling ld seems to solve the problem. However, I don't know why this section appears by default... Running the following build commands seems to solve the problem too:
> as --32 -o boot.o boot.s -mx86-used-note=no
> ld -m elf_i386 boot.o --oformat=binary -o mbr -Ttext 0x7c00
ld links all your sections into the flat binary output unless you tell it not to (with a linker script for example).
The extra bytes come from the .note.gnu.property section that as adds, which can record things like the x86 ISA level used (e.g. AVX2+FMA+BMI2, the Haswell feature level, i.e. x86-64-v3). You don't want that in your flat binary, especially not at the default high address far from where you told it to put your .text section with -Ttext: a flat binary has no headers, so the gap between the two sections is padded with zeros, producing a huge file.
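The file size is simple arithmetic: a flat binary has to span from the lowest section address to the end of the highest one, zero-filling whatever lies in between. This is a sketch with hypothetical names. The 0x7c00 address and the 0x28-byte note size come from the question. The note's address, 0x08048000 (the traditional i386 ELF base), is an assumption; its real placement depends on ld's default linker script.

```c
#include <stddef.h>
#include <stdint.h>

/* One loadable section as the linker places it: start address and size. */
struct section_span {
    uint32_t vma;
    uint32_t size;
};

/* A flat binary covers everything from the lowest section start to the
   highest section end; any gap in between becomes zero bytes. */
static uint32_t flat_binary_size(const struct section_span *s, size_t n)
{
    if (n == 0)
        return 0;
    uint32_t lo = s[0].vma, hi = s[0].vma + s[0].size;
    for (size_t i = 1; i < n; i++) {
        if (s[i].vma < lo)
            lo = s[i].vma;
        if (s[i].vma + s[i].size > hi)
            hi = s[i].vma + s[i].size;
    }
    return hi - lo;
}
```

With only .text, the image is 2 bytes. Adding one 40-byte section at the assumed address pushes it just past 128 MiB, the same order of magnitude as the file in the question.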
Using as -mx86-used-note=no will omit that section from the .o in the first place, leaving only the sections you define in your asm source. From the GAS manual's i386 options:
-mx86-used-note=no
-mx86-used-note=yes
These options control whether the assembler should generate GNU_PROPERTY_X86_ISA_1_USED and GNU_PROPERTY_X86_FEATURE_2_USED GNU
property notes. The default can be controlled by the
--enable-x86-used-note configure option.
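The following C sketch connects the objdump -s output in the question with the manual text above by walking the 40-byte .note.gnu.property contents. It assumes the 32-bit ELF note layout, where fields and property data are padded to 4 bytes (64-bit objects pad property data to 8). The function names are hypothetical. As far as I can tell from binutils' include/elf/common.h, the two property types it finds, 0xc0010002 and 0xc0010001, are GNU_PROPERTY_X86_ISA_1_USED and GNU_PROPERTY_X86_FEATURE_2_USED. Those are exactly the notes -mx86-used-note controls.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word (the object is elf32-i386). */
static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Walk one ELF32 NT_GNU_PROPERTY_TYPE_0 note: namesz, descsz, type,
   name padded to 4 bytes, then descsz bytes of properties, each being
   pr_type, pr_datasz, data padded to 4 bytes. Stores up to max property
   types in types[]; returns the count, or -1 if the note is malformed. */
static int gnu_property_types(const uint8_t *note, size_t len,
                              uint32_t *types, int max)
{
    if (len < 12)
        return -1;
    uint32_t namesz = rd32le(note);
    uint32_t descsz = rd32le(note + 4);
    uint32_t type = rd32le(note + 8);
    size_t off = 12 + (((size_t)namesz + 3) & ~(size_t)3);
    size_t end = off + descsz;
    if (type != 5 /* NT_GNU_PROPERTY_TYPE_0 */ || end > len)
        return -1;
    int n = 0;
    while (off + 8 <= end && n < max) {
        uint32_t datasz = rd32le(note + off + 4);
        types[n++] = rd32le(note + off);
        off += 8 + (((size_t)datasz + 3) & ~(size_t)3);
    }
    return n;
}
```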
Using the -mx86-used-note=no flag with as will remove the note section.
See https://sourceware.org/binutils/docs/as/i386_002dOptions.html

undefined reference to `_GLOBAL_OFFSET_TABLE_' in gcc 32-bit code for a trivial function, freestanding OS

I have a small C file (function.c):
int function()
{
return 0x1234abce;
}
I am using a 64 bit machine. However, I want to write a small 32 bit OS. I want to compile the code into a 'pure' assembly/binary file.
I compile my code with:
gcc function.c -c -m32 -o file.o -ffreestanding # This gives you the object file
I link it with:
ld -o function.bin -m elf_i386 -Ttext 0x0 --oformat binary function.o
I am getting the following error:
function.o: In function `function':
function.c:(.text+0x9): undefined reference to `_GLOBAL_OFFSET_TABLE_'
You need -fno-pie; the default (in most modern distros) is -fpie: generate code for a position-independent executable. This is a code-gen option separate from the -pie linker option (which gcc also passes by default), and is independent of -ffreestanding. -fpie -ffreestanding implies you want a freestanding PIE that uses a GOT, so that's what GCC targets.
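A minimal sketch of the fix: the same function, with the question's build commands plus -fno-pie recorded in a comment. The flags themselves come from the question and this answer. Whether your gcc defaults to PIE depends on how it was configured.

```c
/* function.c: build with PIE code generation turned off, e.g.
     gcc -m32 -ffreestanding -fno-pie -c function.c -o function.o
     ld -m elf_i386 -Ttext 0x0 --oformat binary -o function.bin function.o
   Without -fno-pie, a gcc configured for default PIE adds GOT setup code
   (a call to __x86.get_pc_thunk.ax and an add of _GLOBAL_OFFSET_TABLE_),
   and the ld step then fails with an undefined reference to that symbol. */
int function(void)
{
    return 0x1234abce;
}
```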
-fpie only costs a bit of speed in 64-bit code (where RIP-relative addressing is possible) but is quite bad for 32-bit code: compilers keep a pointer to the GOT in one of the integer registers (tying up another one of the 8) and access static data relative to that address with [reg + disp32] addressing modes like [eax + foo@GOTOFF].
With optimization disabled, gcc -fpie -m32 generates the address of the GOT in a register even though the function doesn't access any static data. You can see this if you look at your compiler output (with gcc -S instead of -c on the machine you're compiling on).
On Godbolt we can use -m32 -fpie to give the same effect as a GCC configured with --enable-default-pie:
# gcc9.2 -O0 -m32 -fpie
function():
push ebp
mov ebp, esp # frame pointer
call __x86.get_pc_thunk.ax
add eax, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_ # EAX points to the GOT
mov eax, 305441742 # overwrite with the return value
pop ebp
ret
__x86.get_pc_thunk.ax: # this is the helper function gcc calls
mov eax, DWORD PTR [esp]
ret
The "thunk" returns its return address, i.e. the address of the instruction after the call. The .ax name means it returns in EAX. Modern GCC can choose any register; traditionally the 32-bit PIC base register was always EBX, but modern GCC chooses a call-clobbered register when that avoids an extra save/restore of EBX.
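The thunk's trick can be mimicked in C with the GCC/Clang builtin __builtin_return_address(0). This is only an analogy, not what GCC emits. get_pc and two_call_sites_differ are hypothetical names, and the volatile function pointer just stops the optimizer from merging the two calls.

```c
#include <stdint.h>

/* Like __x86.get_pc_thunk.ax: return this call's return address, i.e.
   the address of the instruction after the call in the caller. */
__attribute__((noinline)) static uintptr_t get_pc(void)
{
    return (uintptr_t)__builtin_return_address(0);
}

/* Two different call sites yield two different "PC" values. */
static int two_call_sites_differ(void)
{
    uintptr_t (*volatile call)(void) = get_pc; /* forces two real calls */
    uintptr_t first = call();
    uintptr_t second = call();
    return first != 0 && second != 0 && first != second;
}
```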
Fun fact: call +0; pop eax would be more efficient, and only 1 byte larger at each call site. You might think that would unbalance the return-address predictor stack, but in fact call +0 is special-cased on most CPUs to not do that. http://blog.stuffedcow.net/2018/04/ras-microbenchmarks/#call0. (call +0 means the rel32 = 0, so it calls the next instruction. That's not how NASM would interpret that syntax, though.)
clang doesn't generate a GOT pointer unless it needs one, even at -O0. But it does so with call +0;pop %eax: https://godbolt.org/z/GFY9Ht
By default, your compiler creates a position-independent executable.
You can force your compiler to build a non-pie executable by passing the option -fno-pie.

arm-none-eabi-gcc with Cmake has not entry point with flag -nostdlib

I'm trying to make a hello world for the ARM architecture using CMake with this toolchain
My main.c
int main()
{
char *str = "Hello World";
return 0;
}
And my CMakeLists.txt
cmake_minimum_required(VERSION 3.4)
SET(PROJ_NAME arm-hello-world-nostdlib)
PROJECT(${PROJ_NAME})
# Include directories with headers
#---------------------------------------------------#
INCLUDE_DIRECTORIES( ${CMAKE_CURRENT_SOURCE_DIR}/include )
# Source
#---------------------------------------------------#
FILE(GLOB ${PROJ_NAME}_SRC
"src/*.c"
)
FILE(GLOB ${PROJ_NAME}_HEADERS
"include/*.h"
)
# Create Exe
#---------------------------------------------------#
ADD_EXECUTABLE(${PROJ_NAME} ${${PROJ_NAME}_SRC} ${${PROJ_NAME}_HEADERS})
# Specify libraries or flags to use when linking a given target.
#---------------------------------------------------#
TARGET_LINK_LIBRARIES(${PROJ_NAME} -nostdlib --specs=rdimon.specs -lm -lrdimon)
This configuration produces the warning:
[100%] Linking C executable arm-hello-world-nostdlib
/usr/lib/gcc/arm-none-eabi/5.2.0/../../../../arm-none-eabi/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
And running the binary with qemu crashes:
qemu-arm arm-hello-world-nostdlib
qemu: uncaught target signal 4 (Illegal instruction) - core dumped
Illegal instruction (core dumped)
Without the -nostdlib flag it works perfectly, and the command
arm-none-eabi-objdump -s arm-hello-world-nostdlib
shows a lot of info in the binary; compiled with the flag, it only shows:
samples/helloworld-nostdlib/arm-hello-world-nostdlib: file format elf32-littlearm
Contents of section .text:
8000 80b483b0 00af044b 7b600023 18460c37 .......K{`.#.F.7
8010 bd465df8 047b7047 1c800000 .F]..{pG....
Contents of section .rodata:
801c 48656c6c 6f20576f 726c6400 Hello World.
Contents of section .comment:
0000 4743433a 20284665 646f7261 20352e32 GCC: (Fedora 5.2
0010 2e302d33 2e666332 33292035 2e322e30 .0-3.fc23) 5.2.0
0020 00 .
Contents of section .ARM.attributes:
0000 41380000 00616561 62690001 2e000000 A8...aeabi......
0010 05436f72 7465782d 4d340006 0d074d09 .Cortex-M4....M.
0020 020a0612 04140115 01170318 0119011a ................
0030 011b011c 011e0622 01 .......".
I don't want the standard libraries in my binary, but I guess I'm missing the assembly code that provides the entry point. How can I add it manually?
Update:
According to the GCC documentation for -nostdlib:
Do not use the standard system startup files or libraries when
linking. No startup files and only the libraries you specify will be
passed to the linker, and options specifying linkage of the system
libraries, such as -static-libgcc or -shared-libgcc, are ignored.
Alternatively, if you don't want to use the standard libraries, you can use the -nodefaultlibs flag:
Do not use the standard system libraries when linking. Only the
libraries you specify are passed to the linker, and options specifying
linkage of the system libraries, such as -static-libgcc or
-shared-libgcc, are ignored. The standard startup files are used normally, unless -nostartfiles is used.
The compiler may generate calls to memcmp, memset, memcpy and memmove.
These entries are usually resolved by entries in libc. These entry
points should be supplied through some other mechanism when this
option is specified.
By the way, I want a way to create and add startup files (one possible way is in this tutorial), but I added the bounty to get an answer to my question and a general solution for everybody. I consider this useful for people who want to customize and learn about cross-compilation, ARM, and startup files.
Update 2
Using start.S assembly code:
.text
.align 4
.global _start
.global _exit
_start:
mov fp, #0 /* frame pointer */
ldr a1, [sp] /* 1st arg = argc */
add a2, sp, #4 /* 2nd arg = argv */
bl main
_exit:
mov r7, #1 /* __NR_exit */
swi 0
.type _start,function
.size _start,_exit-_start
.type _exit,function
.size _exit,.-_exit
provided by arsv to define the entry point, and compiling with the command:
arm-none-eabi-gcc -nostdlib -o main main.c start.S
seems to work properly. Updated CMakeLists.txt:
#Directly works:
#arm-none-eabi-gcc -nostdlib -o main main.c start.S
cmake_minimum_required(VERSION 3.4)
SET(PROJ_NAME arm-hello-world-nostdlib)
# Assembler files (.S) in the source list are ignored completely by CMake unless we
# “enable” the assembler by telling CMake in the project definition that we’re using assembly
# files. When we enable assembler, CMake detects gcc as the assembler rather than as – this
# is good for us because we then only need one set of compilation flags.
PROJECT(${PROJ_NAME} C ASM)
# Include directories with headers
#---------------------------------------------------#
INCLUDE_DIRECTORIES( ${CMAKE_CURRENT_SOURCE_DIR}/include )
# Source
#---------------------------------------------------#
FILE(GLOB ${PROJ_NAME}_SRC
"src/start.S"
"src/*.c"
)
FILE(GLOB ${PROJ_NAME}_HEADERS
"include/*.h"
)
# Create Exe
#---------------------------------------------------#
ADD_EXECUTABLE(${PROJ_NAME} ${${PROJ_NAME}_SRC} ${${PROJ_NAME}_HEADERS} )
# Specify libraries or flags to use when linking a given target.
#---------------------------------------------------#
TARGET_LINK_LIBRARIES(${PROJ_NAME} -nostdlib --specs=rdimon.specs -lm -lrdimon)
If you get linking problems like:
arm-none-eabi/bin/ld: error: CMakeFiles/arm-hello-world-nostdlib.dir/src/main.c.obj: Conflicting CPU architectures 1/13
It's a problem with the toolchain; for Cortex-A9 it works using:
set(CMAKE_C_FLAGS
"${CMAKE_C_FLAGS} -mcpu=cortex-a9 -march=armv7-a -mthumb -mfloat-abi=softfp -mfpu=fpv4-sp-d16"
)
Here's the _start.s I use in a small project of mine.
It should be enough to link and run your main() with qemu-arm:
.text
.align 4
.global _start
.global _exit
_start:
mov fp, #0 /* frame pointer */
ldr a1, [sp] /* 1st arg = argc */
add a2, sp, #4 /* 2nd arg = argv */
bl main
_exit:
mov r7, #1 /* __NR_exit */
swi 0
.type _start,function
.size _start,_exit-_start
.type _exit,function
.size _exit,.-_exit
Note that this is startup code for a common Linux userspace binary on ARM, which is what you probably want for qemu-arm (qemu linux-user mode, or syscall proxy). For other cases, like the bare-metal binaries in the linked post, non-Linux userspace, or other architectures, the startup code will be different.
In Linux, a newly loaded binary gets invoked with argc at the top of the stack, followed by argv[], followed by envp[], followed by auxv[]. The startup code has to turn that into a proper main(argc, argv) call according to the architecture's calling convention. For ARM, that means the 1st argument goes in register a1 (r0) and the 2nd in a2 (r1).
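That stack layout can be modelled in C. This sketch mimics the _start logic; it is not code that runs at process entry. The struct and function names are hypothetical, and a real _start has to read the stack pointer in assembly before any C code runs. The slots are modelled as char * so the test stack can hold strings; on the real stack, slot 0 is a word-sized integer.

```c
#include <stddef.h>
#include <stdint.h>

struct start_args {
    int argc;
    char **argv;
    char **envp;
};

/* sp[0] is argc, sp[1..argc] are argv[] followed by a NULL, and envp[]
   starts right after that NULL. Mirrors ldr a1,[sp] / add a2,sp,#4. */
static struct start_args decode_initial_stack(char **sp)
{
    struct start_args r;
    r.argc = (int)(uintptr_t)sp[0];
    r.argv = sp + 1;
    r.envp = r.argv + r.argc + 1; /* skip argv[0..argc-1] and its NULL */
    return r;
}
```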
"Gets invoked" above means a jump to e_entry address from the ELF header, which is set by ld to point to _start symbol if one is found. With no _start defined anywhere, ld set e_entry to 0x8000 and whatever happened to be at 0x8000 when the jump was made apparently did not look like a valid ARM instruction. Which is not exactly unexpected.
Reading code from smaller/cleaner libc implementations like musl or dietlibc helps a lot in understanding stuff like this. The code above originates from dietlibc by the way.
https://github.com/ensc/dietlibc/blob/master/arm/start.S
http://git.musl-libc.org/cgit/musl/tree/arch/arm/crt_arch.h
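The e_entry value that ld writes, which is the address the kernel jumps to, is easy to read back. Here is a minimal sketch for a little-endian ELF32 file such as the ARM binary in the question. The function name is hypothetical, and the field offset comes from the Elf32_Ehdr layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Read e_entry from a little-endian ELF32 header. In Elf32_Ehdr, e_entry
   is at byte offset 24, after the 16-byte e_ident, e_type, e_machine and
   e_version. Returns 1 on success, 0 if this is not ELF32 little-endian. */
static int elf32_le_entry(const uint8_t *hdr, size_t len, uint32_t *entry)
{
    if (len < 52) /* sizeof(Elf32_Ehdr) */
        return 0;
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return 0;
    if (hdr[4] != 1 /* ELFCLASS32 */ || hdr[5] != 1 /* ELFDATA2LSB */)
        return 0;
    *entry = (uint32_t)hdr[24] | (uint32_t)hdr[25] << 8
           | (uint32_t)hdr[26] << 16 | (uint32_t)hdr[27] << 24;
    return 1;
}
```

readelf -h reports the same field as the "Entry point address".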
For reference, a minimal CMakeLists.txt to build the project
(assuming the files are named main.c and _start.s):
cmake_minimum_required(VERSION 3.4)
set(CMAKE_C_COMPILER arm-none-eabi-gcc)
set(CMAKE_ASM_COMPILER arm-none-eabi-gcc)
project(arm-hello-world-nostdlib)
enable_language(ASM)
set(CMAKE_ASM_FLAGS -c)
set(CMAKE_VERBOSE_MAKEFILE on)
add_executable(main _start.s main.c)
target_link_libraries(main -nostdlib)
Run the resulting executable like this: qemu-arm ./main

How to call c functions that call c standard library in nasm?

First I want to clarify that I know this question might have been answered hundreds of times. However, after hours of Google searching I simply couldn't find anything that's exactly what I want. Also, even though I've been writing C programs for quite a while, I'm kind of new to nasm and ld. So I would really appreciate a simple answer that doesn't require reading a whole nasm/ld tutorial or the complete manual.
What I want to do is:
Say I have a function written in C that calls some function in the C standard library:
/* foo.c */
#include <stdio.h>
void foo(int i)
{
printf("%d\n", i);
}
I want to call this function in nasm so I tried this:
; main.asm
global _start
extern foo
section .text
_start:
push 1234567
call foo
add esp, 4
mov eax, 1
xor ebx, ebx
int 80h
Then I tried to compile them and run:
[user ~/Documents/asm/callc]#make all
nasm main.asm -felf
gcc -c foo.c -o foo.o -m32
ld -o main main.o foo.o -melf_i386 -lc
[user ~/Documents/asm/callc]#ls
foo.c foo.o main main.asm main.o Makefile
[user ~/Documents/asm/callc]#./main
bash: ./main: No such file or directory
[user ~/Documents/asm/callc]#bash main
main: main: cannot execute binary file
I didn't get any errors but apparently I couldn't run the executable output file.
If the c function doesn't call any library functions then the code above can be compiled and it will run without any problems. I also figured out a way to call library functions directly in nasm and use gcc to produce the final executable file. But none of them is exactly what I want.
EDIT:
1. I'm running 64-bit Ubuntu but I'm trying to write 32-bit programs so I used flags like -m32 and -melf_i386.
2. Output of file *:
[user ~/Documents/asm/sof]#file *
foo.c: C source, ASCII text
foo.c~: empty
foo.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
main: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), not stripped
main.asm: C source, ASCII text
main.asm~: empty
main.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
Makefile: makefile script, ASCII text
Makefile~: makefile script, ASCII text
3. I really have no idea how to tell ld to include the C standard library. I found something like -lglibc or -lc in some other posts. -lglibc doesn't work, and -lc seems to get rid of all errors. I probably thought it worked at first, but maybe that's the problem, since it probably doesn't link the correct library.
UPDATE
Adding -I/lib32/ld-linux.so.2 to the ld command solved my problem.
Below are commands to compile/assemble/link and run the program:
nasm main.asm -felf
gcc -c foo.c -o foo.o -m32
ld -o main main.o foo.o -melf_i386 -lc -I/lib32/ld-linux.so.2
./main
The C library provides code using the _start interface that starts the C runtime, calls main(), and shuts the runtime down. Hence, if you intend to use the C library in your program, you must not use the _start interface but provide a main() function.
This is the correct way to do it:
; main.asm
global main
extern foo
section .text
main:
push 1234567
call foo
add esp, 4
xor eax, eax
ret
Build with:
nasm -f elf32 -o main.o main.asm
gcc -m32 -o foo.o -c foo.c
gcc -m32 -o main main.o foo.o
Two remarks:
main() returns, instead of doing an exit system call, to allow the C runtime shutdown code to run.
gcc is used for linking. Internally gcc invokes ld with the appropriate parameters to link with the C library. These are platform specific and subject to change. Hence, don't use ld for this.

Objdump ARM aarch64 code?

I have an ELF AArch64 binary and I want to disassemble its .text section using objdump. My machine is amd64.
I tried the approach from "Using objdump for ARM architecture: Disassembling to ARM", but objdump does not recognize the binary, so it cannot disassemble it.
Go to http://releases.linaro.org/latest/components/toolchain/binaries/ and get your choice of gcc-linaro-aarch64-linux-gnu-4.9-* like for example gcc-linaro-aarch64-linux-gnu-4.9-2014.07_linux.tar.bz2.
After unpacking, invoke aarch64-linux-gnu-objdump, i.e.:
echo "int main(void) {return 42;}" > test.c
gcc-linaro-aarch64-linux-gnu-4.9-2014.07_linux/bin/aarch64-linux-gnu-gcc -c test.c
gcc-linaro-aarch64-linux-gnu-4.9-2014.07_linux/bin/aarch64-linux-gnu-objdump -d test.o
to get the disassembly:
test.o: file format elf64-littleaarch64
Disassembly of section .text:
0000000000000000 <main>:
0: 52800540 mov w0, #0x2a // #42
4: d65f03c0 ret
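AArch64 instructions are fixed 32-bit words, which is why objdump prints each one as a single hex value. As a sanity check on the output above, this small sketch decodes just those two encodings by hand, using hypothetical function names. mov w0, #0x2a is the MOVZ instruction, and ret defaults to x30. The bit layouts follow the A64 encoding tables; any other instruction is rejected.

```c
#include <stdint.h>

/* MOVZ, 32-bit variant: bits 31..23 = 0 10 100101, hw = bits 22..21,
   imm16 = bits 20..5, Rd = bits 4..0. Returns 1 and fills *rd, *value. */
static int decode_movz_w(uint32_t insn, unsigned *rd, uint32_t *value)
{
    if ((insn & 0xFF800000u) != 0x52800000u)
        return 0;
    unsigned hw = (insn >> 21) & 3u;      /* shift amount divided by 16 */
    if (hw > 1)                           /* W registers: LSL #0 or #16 only */
        return 0;
    *rd = insn & 31u;
    *value = ((insn >> 5) & 0xFFFFu) << (16 * hw);
    return 1;
}

/* RET Xn: fixed bits everywhere except Rn in bits 9..5. */
static int decode_ret(uint32_t insn, unsigned *rn)
{
    if ((insn & 0xFFFFFC1Fu) != 0xD65F0000u)
        return 0;
    *rn = (insn >> 5) & 31u;
    return 1;
}
```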
Use the same toolchain that you used to compile the binary.
In the case of the ARM architecture, it would generally be something like arm-linux-gnueabi-gcc, so for objdump you should use
arm-linux-gnueabi-objdump
At present, I guess you must be using the x86 toolchain's objdump to disassemble a binary compiled with an ARM toolchain, hence the error.
