Compiling statically libc C code and ASM code - c

I have ASM code:
extern my_func
extern printf
extern exit
global _start
section .data
...
section .text
_start:
...
call printf
...
call my_func
...
call exit
and C code:
int my_func(int a, int b)
{
return a+b;
}
I'm using fedora on 64-bit machine. I want the executable be 32-bit.
For dynamic linking I do:
nasm -f elf32 asm.asm ; this gives me asm.o
gcc -m32 -Wall -c c_code.c ; this gives me c_code.o
ld c_code.o asm.o -melf_i386 -L /usr/lib/ -lc -I /lib/ld-linux.so.2 ; this gives me a.out which runs fine and weights 5601 bytes.
What I want to do is link libc statically. I do the following:
gcc -o a2.out -m32 -static -m32 asm.o c_code.o
And I get error:
asm.o: In function `_start':
asm.asm:(.text+0x0): multiple definition of `_start'
/usr/lib/gcc/x86_64-redhat-linux/4.8.3/../../../../lib64/crt1.o:(.text+0x0):
first defined here
collect2: error: ld returned 1 exit status
Then I change _start to main in ASM code and the whole thing links fine! ldd shows "not a dynamic executable". But the file created weights 721067 bytes! I think that it compiles statically a lot of unnecessary code.
So, my 1st question is:
1) How can I link statically only libc for the required printf and exit functions?
When I try
gcc -m32 -o a3.out -lc asm.o c_code.o ; ASM file has main instead of _start
I get a file that weights 7406 bytes. ldd shows the same dynamic libraries as for the a.out which weights 5601 bytes.
2) Why is that difference? Looks like some additional code that "connects" _start with main in my code...
3) What is the difference between linking with gcc and ld?
Thanks a lot for your attention!

1) How can I link statically only libc for the required printf
and exit functions?
Try compiling with -nostartfiles -static -nostdlib -lc which will avoid adding crt1.o and crtend.o. But keep in mind that this will disable all Glibc initialization code so some Glibc functions will fail to work.
2) Why is that difference? Looks like some additional code
that "connects" _start with main in my code...
GCC adds start files (crt*.o) which perform initialization. See the many online articles for details (e.g. this one).
3) What is the difference between linking with gcc and ld?
Already answered above but in general you can run gcc -v and inspect ld's (or collect2's) arguments.

Related

Why do I get extra system calls when compiling code directly to an executable vs. compiling to an object file and then manually linking?

I want to compile this C code with the GNU C Compiler on Ubuntu without linking any standard libraries, having only the following code execute.
static void exit(long long code)
{asm inline
("movq $60,%%rax\n"
"movq %[code],%%rdi\n"
"syscall"
:
:[code]"rm"(code)
:"rax"
,"rdi");}
static void write(long long fd,char *msg,long long len)
{asm inline
("movq $0x1,%%rax\n"
"movq %[fd],%%rdi\n"
"movq %[msg],%%rsi\n"
"movq %[len],%%rdx\n"
"syscall"
:
:[fd]"rm"(fd)
,[msg]"rm"(msg)
,[len]"rm"(len)
:"rax"
,"rdi"
,"rsi"
,"rdx");}
#define PRINT(msg) write(1,msg,sizeof(msg))
void _start()
{PRINT("Hello World.\n");
exit(0);}
I compiled with cc example.c -ffreestanding -nostartfiles -O3 -o example.
When I called the output file I saw a lot of extra system calls with strace that should not have been there:
brk
arch_prctl
access
mmap
arch_prctl
mprotect
I then compiled like this: cc example.c -c -O3 -o example.o; ld example.o -o example and it did not do the extra syscalls. It even made the filesize somewhat smaller.
The objdump -d of it was exactly the same. In the objdump -D I found some extra symbols (_DYNAMIC,__GNU_EH_FRAME_HDR,.interp) in the first case compared to the second, but still no sign of any extra syscalls in the code.
Do you know why I get the extra system calls with cc example.c -ffreestanding -nostartfiles -O3 -o example and not with cc example.c -c -O3 -o example.o; ld example.o -o example?
I found out what is happening.
If I compile the code with cc example.c -ffreestanding -nostartfiles -O3 -o example the compiler makes a dynamically linked executable. Dynamically linked executables have an .interp section. That is what I was seeing in my objdump -D.
Dynamically linked executables are executing via the program interpreter and the dynamic linker. The additional system calls I saw, came from the dynamic linker. I still do not know why the executable wants to dynamically link anything in a program that does not link any libraries and wants to be freestanding.
If you do not want the extra system calls from the dynamic linker - you should give gcc the extra -static option. The compiler does not automatically do this if there is no dynamic linking happening.

Step by step C compilation result in segfault

I'm trying to understand C compilation
Given this simple C code in main.c:
int main() {
int a;
a = 42;
return 0;
}
I performed the following operations:
cpp main.c main.i
/usr/lib/gcc/x86_64-linux-gnu/9/cc1 main.i -o main.s
as -o main.o main.s
ld -o main.exe main.o
When executing main.exe, I get a Segmentation Fault.
How can I get a good memory addressing in this example?
When I try the sequence of commands from your question on an x86_64 Ubuntu 19.10 system, I get a warning from ld:
ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
This is an indication that something is wrong.
The error means that the linker did not find a symbol _start and used a default address instead. When running your program it will try to execute code at this address which apparently is invalid.
An executable program compiled from C code doesn't contain only your code. The compiler instructs the linker to add C run-time library and startup code. The startup code is responsible for initialization and for calling your main function.
Run e.g.
gcc -v -o main.exe main.o
to see what other files get added to your program. On my system this shows a few files with names starting with crt which means "C runtime".
If you don't use gcc to link your program but use ld directly, you have to manually add all necessary object files in a similar way as the compiler would do automatically.

How to call c functions that call c standard library in nasm?

First I want to clarify that I know this question might have been answered hundreds of times. However after hours of Google search I simply couldn't find anything that's exactly what I want. Also even though I've been writing c programs for quite a while, I'm kind of new to nasm and ld. So I would really appreciate it if I can get a simple answer without having to read a whole nasm/ld tutorial or the complete manual.
What I want to do is:
say I have a function written in c that calls some function in the c standard library:
/* foo.c */
#include <stdio.h>
void foo(int i)
{
printf("%d\n", i);
}
I want to call this function in nasm so I tried this:
; main.asm
global _start
extern foo
section .text
_start:
push 1234567
call foo
add esp, 4
mov eax, 1
xor ebx, ebx
int 80h
Then I tried to compile them and run:
[user ~/Documents/asm/callc]#make all
nasm main.asm -felf
gcc -c foo.c -o foo.o -m32
ld -o main main.o foo.o -melf_i386 -lc
[user ~/Documents/asm/callc]#ls
foo.c foo.o main main.asm main.o Makefile
[user ~/Documents/asm/callc]#./main
bash: ./main: No such file or directory
[user ~/Documents/asm/callc]#bash main
main: main: cannot execute binary file
I didn't get any errors but apparently I couldn't run the executable output file.
If the c function doesn't call any library functions then the code above can be compiled and it will run without any problems. I also figured out a way to call library functions directly in nasm and use gcc to produce the final executable file. But none of them is exactly what I want.
EDIT:
1. I'm running 64-bit Ubuntu but I'm trying to write 32-bit programs so I used flags like -m32 and -melf_i386.
2. Output of file *:
[user ~/Documents/asm/sof]#file *
foo.c: C source, ASCII text
foo.c~: empty
foo.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
main: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), not stripped
main.asm: C source, ASCII text
main.asm~: empty
main.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
Makefile: makefile script, ASCII text
Makefile~: makefile script, ASCII text
3. I really have no idea of how to tell ld to include the c standard library. I found something like -lglibc or -lc in some other posts. -lgibc doesn't work and -lc seems to be able to get rid of all errors and I probably thought it worked at first but maybe that's the problem since it probably doesn't link the correct library.
UPDATE
Adding -I/lib32/ld-linux.so.2 to the ld command solved my problem.
Below are commands to compile/assemble/link and run the program:
nasm main.asm -felf
gcc -c foo.c -o foo.o -m32
ld -o main main.o foo.o -melf_i386 -lc -I/lib32/ld-linux.so.2
./main
The C library provides code using the _start interface that starts the C runtime, calls main(), and shuts the runtime down. Hence if you intend to use the C library in your program you must not use the _start interface but provide a main() function.
This is the correct way to do it:
; main.asm
global main
extern foo
section .text
main:
push 1234567
call foo
add esp, 4
xor eax, eax
ret
Build with:
nasm -f elf32 -o main.o main.asm
gcc -m32 -o foo.o -c foo.c
gcc -m32 -o main main.o foo.o
Two remarks:
main() returns, instead of doing an exit system call, to allow the C runtime shutdown code to run.
gcc is used for linking. Internally gcc invokes ld with the appropriate parameters to link with the C library. These are platform specific and subject to change. Hence, don't use ld for this.

ld : _start not found defaulting to

Hi I made a simple hello world C program and I am compiling it like this :
gcc -c hello.c
ld hello.o -lc -o out
I get a warning from ld : ld : _start not found defaulting to ....
I do an objdump -D hello.o and I cannot find the _start routine in the output.
What am I missing here ?
You're missing the crt* stuff which you will see if you link with gcc -v: crt1.o, crtend.o, crtn.o. Look at how gcc invokes collect2 (it's visible with gcc -v) and use the same options for ld.
main function is not the executable entry point: some initialization for standard library is done before main (because it's either impossible or illogical to do otherwise). Real entry point, which is _start by default, is in crt1.o which is always linked into your executable.
It's because you don't have a main function, also you can try
gcc -v hello.c -o hello
See if it compiles successfully.
On my system (Angstrom Linux, gcc 4.3.3), it happened due me installing libgcc-s-dev instead of libgcc-dev. No binary on the system contained _start string, I checked that. Installing libgcc-dev helped.

Linking a C program directly with ld fails with undefined reference to `__libc_csu_fini`

I'm trying to compile a C program under Linux. However, out of curiosity, I'm trying to execute some steps by hand: I use:
the gcc frontend to produce assembler code
then run the GNU assembler to get an object file
and then link it with the C runtime to get a working executable.
Now I'm stuck with the linking part.
The program is a very basic "Hello world":
#include <stdio.h>
int main() {
printf("Hello\n");
return 0;
}
I use the following command to produce the assembly code:
gcc hello.c -S -masm=intel
I'm telling gcc to quit after compiling and dump the assembly code with Intel syntax.
Then I use th GNU assembler to produce the object file:
as -o hello.o hello.s
Then I try using ld to produce the final executable:
ld hello.o /usr/lib/libc.so /usr/lib/crt1.o -o hello
But I keep getting the following error message:
/usr/lib/crt1.o: In function `_start':
(.text+0xc): undefined reference to `__libc_csu_fini'
/usr/lib/crt1.o: In function `_start':
(.text+0x11): undefined reference to `__libc_csu_init'
The symbols __libc_csu_fini/init seem to be a part of glibc, but I can't find them anywhere! I tried linking against libc statically (against /usr/lib/libc.a) with the same result.
What could the problem be?
/usr/lib/libc.so is a linker script which tells the linker to pull in the shared library /lib/libc.so.6, and a non-shared portion, /usr/lib/libc_nonshared.a.
__libc_csu_init and __libc_csu_fini come from /usr/lib/libc_nonshared.a. They're not being found because references to symbols in non-shared libraries need to appear before the archive that defines them on the linker line. In your case, /usr/lib/crt1.o (which references them) appears after /usr/lib/libc.so (which pulls them in), so it doesn't work.
Fixing the order on the link line will get you a bit further, but then you'll probably get a new problem, where __libc_csu_init and __libc_csu_fini (which are now found) can't find _init and _fini. In order to call C library functions, you should also link /usr/lib/crti.o (after crt1.o but before the C library) and /usr/lib/crtn.o (after the C library), which contain initialisation and finalisation code.
Adding those should give you a successfully linked executable. It still won't work, because it uses the dynamically linked C library without specifying what the dynamic linker is. You'll need to tell the linker that as well, with something like -dynamic-linker /lib/ld-linux.so.2 (for 32-bit x86 at least; the name of the standard dynamic linker varies across platforms).
If you do all that (essentially as per Rob's answer), you'll get something that works in simple cases. But you may come across further problems with more complex code, as GCC provides some of its own library routines which may be needed if your code uses certain features. These will be buried somewhere deep inside the GCC installation directories...
You can see what gcc is doing by running it with either the -v option (which will show you the commands it invokes as it runs), or the -### option (which just prints the commands it would run, with all of the arguments quotes, but doesn't actually run anything). The output will be confusing unless you know that it usually invokes ld indirectly via one of its own components, collect2 (which is used to glue in C++ constructor calls at the right point).
I found another post which contained a clue: -dynamic-linker /lib/ld-linux.so.2.
Try this:
$ gcc hello.c -S -masm=intel
$ as -o hello.o hello.s
$ ld -o hello -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o /usr/lib/crti.o hello.o -lc /usr/lib/crtn.o
$ ./hello
hello, world
$
Assuming that a normal invocation of gcc -o hello hello.c produces a working build, run this command:
gcc --verbose -o hello hello.c
and gcc will tell you how it's linking things. That should give you a good idea of everything that you might need to account for in your link step.
In Ubuntu 14.04 (GCC 4.8), the minimal linking command is:
ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
/usr/lib/x86_64-linux-gnu/crt1.o \
/usr/lib/x86_64-linux-gnu/crti.o \
-L/usr/lib/gcc/x86_64-linux-gnu/4.8/ \
-lc -lgcc -lgcc_s \
hello.o \
/usr/lib/x86_64-linux-gnu/crtn.o
Although they may not be necessary, you should also link to -lgcc and -lgcc_s, since GCC may emit calls to functions present in those libraries for operations which your hardware does not implement natively, e.g. long long int operations on 32-bit. See also: Do I really need libgcc?
I had to add:
-L/usr/lib/gcc/x86_64-linux-gnu/4.8/ \
because the default linker script does not include that directory, and that is where libgcc.a was located.
As mentioned by Michael Burr, you can find the paths with gcc -v. More precisely, you need:
gcc -v hello_world.c |& grep 'collect2' | tr ' ' '\n'
This is how I fixed it on ubuntu 11.10:
apt-get remove libc-dev
Say yes to remove all the packages but copy the list to reinstall after.
apt-get install libc-dev
If you're running a 64-bit OS, your glibc(-devel) may be broken. By looking at this and this you can find these 3 possible solutions:
add lib64 to LD_LIBRARY_PATH
use lc_noshared
reinstall glibc-devel
Since you are doing the link process by hand, you are forgetting to link the C run time initializer, or whatever it is called.
To not get into the specifics of where and what you should link for you platform, after getting your intel asm file, use gcc to generate (compile and link) your executable.
simply doing gcc hello.c -o hello should work.
Take it:
$ echo 'main(){puts("ok");}' > hello.c
$ gcc -c hello.c -o hello.o
$ ld hello.o -o hello.exe /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/crtn.o \
-dynamic-linker /lib/ld-linux.so.2 -lc
$ ./hello.exe
ok
Path to /usr/lib/crt*.o will when glibc configured with --prefix=/usr

Resources