I'm trying to do a simple exercise in compilation.
I have one C file, one assembly file, and a makefile.
When I run the 'make' command, I get the following error:
gcc -g -m32 -Wall -o mainAssignment0.o mainAssignment0.c
/tmp/ccXfVxtg.o: In function `main':
/home/caspl202/Desktop/tasks/Assignment0/mainAssignment0.c:12: undefined reference to `do_Str'
collect2: error: ld returned 1 exit status
makefile:10: recipe for target 'mainAssignment0.o' failed
make: *** [mainAssignment0.o] Error 1
This means that, for some reason, the C program doesn't recognize the external ASM function.
What's even weirder is that when I run the same makefile on the same files on a different machine, it works like a charm. I would really like someone to shed some light on this.
C code:
#include <stdio.h>
#define MAX_LEN 100
extern int do_Str(char*);
int main(int argc, char** argv) {
    char str_buf[MAX_LEN];
    int counter = 0;
    fgets(str_buf, MAX_LEN, stdin);
    counter = do_Str (str_buf);
    printf("%s%d\n",str_buf,counter);
    return 0;
}
ASM code:
section .data
an: dd 0
section .text
global do_Str
do_Str:
push ebp
mov ebp, esp
pushad
mov ecx, dword [ebp+8]
loop:
cmp byte [ecx], 32
jnz noS
inc dword [an]
noS:
cmp byte [ecx], 65
jl noC
cmp byte [ecx], 90
jg noC
add byte [ecx], 32
noC:
inc ecx
cmp byte [ecx], 0
jnz loop
popad
mov eax,[an]
mov esp, ebp
pop ebp
ret
Makefile:
all: exec
libs: asm-lib
asm-lib: asmAssignment0.s
	nasm -g -f elf -o asmAssignment0.o asmAssignment0.s
exec: mainAssignment0.c libs
	gcc -g -m32 -c -o mainAssignment0.o mainAssignment0.c
	gcc -g -m32 -o Assignment0.out mainAssignment0.o asmAssignment0.o
.PHONY: clean
clean:
	rm -rf ./*.o Assignment0.out
You don't need the extern keyword here; function declarations have external linkage by default.
int do_Str(char*);
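Also, on some targets (for example macOS Mach-O and 32-bit Windows), the C compiler prefixes every C symbol with an underscore, and on those targets you must name the function accordingly in your asm file.
global _do_Str
_do_Str:
On those targets the underscore is added automatically by the C compiler, so you don't use it in the C module. On Linux ELF targets, which is what nasm -f elf produces, no underscore is added, so do_Str is already the right name.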
The error you quote occurs because your compile line is wrong. You can tell because you're trying to create an object file but getting errors from the linker, so something is clearly not right:
gcc -g -m32 -Wall -o mainAssignment0.o mainAssignment0.c
...
collect2: error: ld returned 1 exit status
The problem is that the -c flag is missing from this compile line. Without -c, gcc compiles and links in one step, and the link fails because nothing on the command line defines do_Str.
However, in your makefile the -c is present, so this error was clearly not generated by the makefile you show us.
exec: mainAssignment0.c libs
gcc -g -m32 -c -o mainAssignment0.o mainAssignment0.c
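For checking the C side on its own, a plain-C stand-in for the assembly routine can be handy. The sketch below assumes that do_Str is meant to convert 'A'..'Z' to lowercase in place and return the number of spaces. The name do_Str_c is hypothetical, and unlike the assembly above it keeps the count in a local variable and copes with an empty string.

```c
/* Hypothetical C stand-in for the assembly routine do_Str:
   lowercases 'A'..'Z' in place and returns the number of spaces. */
int do_Str_c(char *s)
{
    int spaces = 0;

    for (; *s != '\0'; s++) {
        if (*s == ' ')
            spaces++;
        if (*s >= 'A' && *s <= 'Z')
            *s += 'a' - 'A';   /* same effect as "add byte [ecx], 32" */
    }
    return spaces;
}
```

If a program built with this stand-in links and runs, the remaining problem is in how the assembly object is built or passed to the linker rather than in the C source.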
I have a main.s file.
extern printf
extern exit
section .data
fmt: db "hi!", 0xa, 0 ; printf expects a NUL-terminated string
section .text
global _start
_start:
mov rax, 0
mov rdi, fmt
call printf
call exit
Assemble, link, and run:
$ yasm -f elf64 main.s -o main.o
$ ld.lld main.o -o main --dynamic-linker /lib/ld-linux-x86-64.so.2
$ ./main
But I got:
ld.lld: error: undefined symbol: printf
ld.lld: error: undefined symbol: exit
ld.lld does not seem to have a -lc option like the ld linker.
ld.lld does accept -lc; just add -L/lib to tell the linker where to find libc:
ld.lld main.o --dynamic-linker /lib/ld-linux-x86-64.so.2 -o main -L/lib -lc
I wrote the following code to call syscall exit without linking with glibc:
// a.c
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>
_Noreturn void _start()
{
    register int syscall_num asm ("rax") = __NR_exit;
    register int exit_code asm ("rdi") = 0;
    // The actual syscall to exit
    asm volatile ("syscall"
        : /* no output Operands */
        : "r" (syscall_num), "r" (exit_code));
}
The Makefile:
.PHONY: clean
a.out: a.o
	$(CC) -nostartfiles -nostdlib -Wl,--strip-all a.o
a.o: a.c
	$(CC) -Oz -c a.c
clean:
	rm a.o a.out
I build it with CC=clang-7 and it works perfectly well, except for what I see when I inspect the disassembly produced by objdump -d a.out:
a.out: file format elf64-x86-64
Disassembly of section .text:
0000000000201000 <.text>:
201000: 6a 3c pushq $0x3c
201002: 58 pop %rax
201003: 31 ff xor %edi,%edi
201005: 0f 05 syscall
201007: c3 retq
There is a useless retq following the syscall. Is there any way to remove it without resorting to writing the whole function in assembly?
Add this after the system call that doesn't return, so that the compiler knows control never reaches the end of the function:
__builtin_unreachable();
Suppose there are three C files: a.c contains functions xx() and yy(), b.c contains nn() and mm(), and c.c contains qq() and rr().
I made a static library stat.a out of a.o, b.o and c.o. If I link stat.a into a test program that calls xx(), then the symbol yy also gets exported: nm test shows both xx and yy.
I would like to know why the symbols qq and rr do not get exported.
Is there any method to prevent any other symbols than xx being loaded?
Here is an implementation of your scenario:
a.c
#include <stdio.h>
void xx(void)
{
    puts(__func__);
}
void yy(void)
{
    puts(__func__);
}
b.c
#include <stdio.h>
void nn(void)
{
    puts(__func__);
}
void mm(void)
{
    puts(__func__);
}
c.c
#include <stdio.h>
void qq(void)
{
    puts(__func__);
}
void rr(void)
{
    puts(__func__);
}
test.c
extern void xx(void);
int main(void)
{
    xx();
    return 0;
}
Compile all the *.c files to *.o files:
$ gcc -Wall -c a.c b.c c.c test.c
Make a static library stat.a, containing a.o, b.o, c.o:
$ ar rcs stat.a a.o b.o c.o
Link program test, inputting test.o and stat.a:
$ gcc -o test test.o stat.a
Run:
$ ./test
xx
Let's see the symbol tables of the object files in stat.a:
$ nm stat.a
a.o:
0000000000000000 r __func__.2250
0000000000000003 r __func__.2254
U _GLOBAL_OFFSET_TABLE_
U puts
0000000000000000 T xx
0000000000000013 T yy
b.o:
0000000000000000 r __func__.2250
0000000000000003 r __func__.2254
U _GLOBAL_OFFSET_TABLE_
0000000000000013 T mm
0000000000000000 T nn
U puts
c.o:
0000000000000000 r __func__.2250
0000000000000003 r __func__.2254
U _GLOBAL_OFFSET_TABLE_
U puts
0000000000000000 T qq
0000000000000013 T rr
The definitions (T) of xx, yy are in member stat.a(a.o). Definitions of nn, mm
are in stat.a(b.o). Definitions of qq, rr are in stat.a(c.o).
Let's see which of those symbols are also defined in the symbol table of the program test:
$ nm test | egrep 'T (xx|yy|qq|rr|nn|mm)'
000000000000064a T xx
000000000000065d T yy
xx, which is called in the program, is defined. yy, which is not called, is also
defined. nn, mm, qq and rr, none of which are called, are all absent.
That's what you've observed.
I would like to know why the symbols qq and rr do not get exported?
What is a static library, such as stat.a, and what is its special role in a linkage?
It is an ar archive that conventionally - but not necessarily - contains nothing
but object files. You can offer such an archive to the linker from which to select the
object files it needs, if any, to carry on the linkage. The linker needs those object
files in the archive that provide definitions for symbols that have been
referenced, but not yet defined, in input files it has already linked. The
linker extracts the needed object files from the archive and inputs them to the
linkage, exactly as if they were individually named input files and the static library
was not mentioned at all.
So what the linker does with an input static library is different from what it does
with an input object file. Any input object file is linked into the output file unconditionally
(whether it is needed or not).
In this light, let's redo the linkage of test with some diagnostics (-trace) to show what
files are actually linked:
$ gcc -o test test.o stat.a -Wl,--trace
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
test.o
(stat.a)a.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
Apart from all the boilerplate files for a C program linkage that gcc adds by
default, the only files of ours in the linkage are the two object files:
test.o
(stat.a)a.o
The linkage:
$ gcc -o test test.o stat.a
is exactly the same as the linkage:
$ gcc -o test test.o a.o
Let's think that through.
test.o was the first linker input. This object file was linked unconditionally into the program.
test.o contains a reference (specifically, a function call) to xx but no definition of the function xx.
So the linker now needs to find a definition of xx to complete the linkage.
The next linker input is the static library stat.a.
The linker searches stat.a for an object file that contains a definition of xx.
It finds a.o. It extracts a.o from the archive and links it into the program.
There are no other unresolved symbol references in the linkage for which the
linker can find definitions in stat.a(b.o) or stat.a(c.o). So neither of those
object files is extracted and linked.
By extracting and linking (just) stat.a(a.o) the linker has got a definition
of xx that it needed to resolve the function call in test.o. But a.o also contains
the definition of yy. So that definition is also linked into the program.
nn, mm, qq and rr are not defined in the program because none of them
are defined in the object files that were linked into the program.
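The selection rule described above can be sketched as a toy model in C. This is only an illustration under simplifying assumptions, not how ld is implemented: the struct and function names are hypothetical, symbol lists have a small fixed capacity, and real linkers use the archive's symbol index and honour command-line order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 8   /* capacity of each symbol list in this toy model */

struct symset { const char *names[MAX_SYMS]; size_t count; };

/* One archive member: the symbols it defines and the ones it references. */
struct member {
    const char *name;
    struct symset defs;
    struct symset refs;
};

static bool has(const struct symset *s, const char *sym)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], sym) == 0)
            return true;
    return false;
}

static void add(struct symset *s, const char *sym)
{
    if (!has(s, sym) && s->count < MAX_SYMS)
        s->names[s->count++] = sym;
}

/* A member is needed if it defines a symbol that is referenced
   but not yet defined in the linkage so far. */
static bool is_needed(const struct member *m, const struct symset *refs,
                      const struct symset *defs)
{
    for (size_t i = 0; i < m->defs.count; i++)
        if (has(refs, m->defs.names[i]) && !has(defs, m->defs.names[i]))
            return true;
    return false;
}

/* Sets selected[i] for every member the toy linker would extract, given the
   symbols referenced by the inputs linked so far. Returns how many it took. */
static size_t select_members(const struct member *archive, size_t n,
                             struct symset refs, bool *selected)
{
    struct symset defs = { .count = 0 };
    size_t picked = 0;
    bool changed = true;

    for (size_t i = 0; i < n; i++)
        selected[i] = false;

    while (changed) {              /* rescan until nothing new is pulled in */
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (selected[i] || !is_needed(&archive[i], &refs, &defs))
                continue;
            selected[i] = true;
            picked++;
            changed = true;
            /* The whole member is linked: all its definitions come along. */
            for (size_t j = 0; j < archive[i].defs.count; j++)
                add(&defs, archive[i].defs.names[j]);
            for (size_t j = 0; j < archive[i].refs.count; j++)
                add(&refs, archive[i].refs.names[j]);
        }
    }
    return picked;
}
```

Fed a model of stat.a and a single reference to xx, this selects only a.o, and because the whole member is taken, yy comes along with it.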
That's the answer to your first question. Your second is:
Is there any method to prevent any other symbols than xx being loaded?
There are at least two ways.
One is simply to define each of xx, yy, nn, mm, qq, rr in a source
file by itself. Then compile object files xx.o, yy.o, nn.o, mm.o, qq.o, rr.o
and archive all of them in stat.a. Then, if the linker ever needs to find an
object file in stat.a that defines xx, it will find xx.o, extract and link it,
and the definition of xx alone will be added to the linkage.
There's another way that does not require you to code just one function in each source
file. This way depends on the fact that an ELF object file, as produced by the
compiler, is composed of various sections and these sections are in fact the
units that the linker distinguishes and merges together into the output file. By
default, there is a standard ELF section for each kind of symbol. The
compiler places all of the function definitions in one code section and
all data definitions in an appropriate data section. The reason that your
linkage of program test contains the definitions of both xx and yy is that
the compiler has placed both of these definitions in the single code section of a.o,
so the linker can either merge that code section into the program, or not: it can
link the definitions of both xx and yy, or neither of them, so it is obliged
to link both, even though only xx is needed.
Let's see the disassembly of the code section of a.o. By default the
code section is called .text:
$ objdump -d a.o
a.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <xx>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <xx+0xb>
b: e8 00 00 00 00 callq 10 <xx+0x10>
10: 90 nop
11: 5d pop %rbp
12: c3 retq
0000000000000013 <yy>:
13: 55 push %rbp
14: 48 89 e5 mov %rsp,%rbp
17: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 1e <yy+0xb>
1e: e8 00 00 00 00 callq 23 <yy+0x10>
23: 90 nop
24: 5d pop %rbp
25: c3 retq
There you see the definitions of xx and yy, both in the .text section.
But you can ask the compiler to place the definition of each global symbol
in its own section in the object file. Then the linker can separate the code
section for any function definition from any other, and you can ask the linker
to throw away any sections that aren't used in the output file. Let's try that.
Compile all the source files again, this time asking for a separate section per symbol:
$ gcc -Wall -ffunction-sections -fdata-sections -c a.c b.c c.c test.c
Now look again at the disassembly of a.o:
$ objdump -d a.o
a.o: file format elf64-x86-64
Disassembly of section .text.xx:
0000000000000000 <xx>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <xx+0xb>
b: e8 00 00 00 00 callq 10 <xx+0x10>
10: 90 nop
11: 5d pop %rbp
12: c3 retq
Disassembly of section .text.yy:
0000000000000000 <yy>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <yy+0xb>
b: e8 00 00 00 00 callq 10 <yy+0x10>
10: 90 nop
11: 5d pop %rbp
12: c3 retq
Now we've got two code sections in a.o: .text.xx, containing only the definition of xx,
and .text.yy, containing only the definition of yy. The linker can merge either of
these sections into a program and not merge the other.
Rebuild stat.a:
$ rm stat.a
$ ar rcs stat.a a.o b.o c.o
Relink the program, this time asking the linker to discard unused input sections
(-gc-sections). We'll also ask it to trace the files it loads (-trace)
and to print a mapfile for us (-Map=mapfile):
$ gcc -o test test.o stat.a -Wl,-gc-sections,-trace,-Map=mapfile
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
test.o
(stat.a)a.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
The -trace output is exactly the same as before. But check again which of our
symbols are defined in the program:
$ nm test | egrep 'T (xx|yy|qq|rr|nn|mm)'
000000000000064a T xx
Only xx, which is what you want.
The output of the program is the same as before:
$ ./test
xx
Finally look at the mapfile. Near the top you see:
mapfile
...
Discarded input sections
...
...
.text.yy 0x0000000000000000 0x13 stat.a(a.o)
...
...
The linker was able to throw away the redundant code section .text.yy from
the input file stat.a(a.o). That's why the redundant definition of yy is
no longer in the program.
I would like to know why the symbols qq and rr do not get exported?
You have to tell the linker that you want the whole archive linked; see "How to force gcc to link an unused static library":
gcc -L./ -o test test.c -Wl,--whole-archive stat.a -Wl,--no-whole-archive
Is there any method to prevent any other symbols than xx being loaded?
From "How do I include only used symbols when statically linking with gcc?":
gcc -ffunction-sections -c a.c
gcc -L./ -o test test.c -Wl,--gc-sections stat.a
If I have the following C code:
int main()
{
return 77;
}
I can generate asm code with clang's -S option to get the following (Intel syntax):
$ clang -O0 -mllvm --x86-asm-syntax=intel main.c -S
The resulting code is:
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main: ## #main
.cfi_startproc
## BB#0:
push RBP
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset rbp, -16
mov RBP, RSP
Ltmp4:
.cfi_def_cfa_register rbp
mov EAX, 77
mov DWORD PTR [RBP - 4], 0
pop RBP
ret
.cfi_endproc
.subsections_via_symbols
However, neither as, gas, nor nasm will generate an object file from this that I can link with ld. Do clang and gcc generate good, "ready to go" asm? gcc's default assembler is gas (which isn't even installed on Mac OS X). Isn't it the same for clang?
So how do I manually assemble the asm code and then link it?
Yes, gcc generates correct assembly code.
The assembler command is simply as.
You should be able to assemble it with as and link with ld:
as -o example.o example.s
ld -macosx_version_min 10.8.0 -o example example.o -lSystem
or, probably more easily, by using gcc or clang as the frontend:
cc -o example example.s
Edit: Complete working example:
$ cat example.c
int main()
{
return 77;
}
$ gcc -S example.c -o example.s
$ as -o example.o example.s
$ ld -macosx_version_min 10.8.0 -o example example.o -lSystem
$ ./example
$ echo $?
77
OK, since you're using clang, you might want to be using its assembler, too. Mixing & matching toolchains usually ends in tears:
$ clang -cc1as -filetype obj -mllvm --x86-asm-syntax=intel -o example.o example.s
$ ld -macosx_version_min 10.8.0 -o example example.o -lSystem
Since you're using clang, why not compile it with clang instead of gcc?
clang main.s -mllvm --x86-asm-syntax=intel -o main
Try calling gcc with the .s file.