I am receiving the warning after I have added -nostdlib to the linker flags.
tricore/bin/ld.exe: warning: cannot find entry symbol _start; defaulting to c0000000
The linking is done as follows:
$(OUTDIR)/$(BINARY_NAME).elf: $(OUTDIR) $(OBJ)
$(TRICORE_TOOLS)/bin/tricore-gcc -Tld/iRAM.ld -Wl,--no-warn-flags -Wl,
--gc-sections -Wl,-n -nostdlib -o $# $(OBJ) C:\OpenSSL-Win32\lib\MinGW
\libssl-1_1.a C:\OpenSSL-Win32\lib\MinGW\libcrypto-1_1.a
I read that -nostdlib results in not using the standard system startup files or libraries when linking.
The file ld/iRAM.ld looks as follows, and as far as i understand, it contains the _start symbol and it is passed to the linker:
ENTRY(_start)
/*
* Global
*/
/*Program Flash Memory (PFLASH0)*/
__PMU_PFLASH0_BEGIN = 0x80000000;
__PMU_PFLASH0_SIZE = 2M;
/*Program Flash Memory (PFLASH1)*/
........
........
SECTIONS
{
/*Code-Sections*/
/*
* Startup code for TriCore
*/
.startup_code :
{
PROVIDE(__startup_code_start = .);
........
}
.....
}
I have read, that if I pass the -nostdlib flag to the linker, I need to provide the startup code to it too. Does anyone know what I am doing wrong here?
The ENTRY directive in linker script only specifies the name of the entry point symbol (i.e. function). However you still need to actually provide the function with such name in one of your source files.
Most likely solution is to rename your main function to _start function if you have one. Also note that the _start won't have argc and argv parameters since they are usually provided by the standard library. It also should never return since there is nowhere to return to. Instead you'll have to call platform-specific exit function: exit() syscall on Linux or ExitProcess() on Windows. This, however, might not be necessary if you are working in a freestanding environment (i.e. no OS).
Related
let's say we have the following code:
main.c
extern void some_function();
int main()
{
some_function();
return 0;
}
mylib.c
void some_function()
{
...
}
and I have create a share library mylib.so and link i to executable object file prog
linux> gcc -shared -fpic -o mylib.so mylib.c
linux> gcc -o prog main.c ./mylib.so
and let's say the picture below is the executable object format of prog
By dynamic linking, we know that none of the code or data sections from mylib.so are actually copied into the executable prog2l at this point. Instead, the linker copies some relocation and symbol table information that will allow references to code and data in mylib.so to be resolved at load time.
and I just want double check if my understanding is correct:
when prog is loaded into the memory by the loader as the picture below show
then the dynamic linker will modify the .data section of prog in memory so that it can be linked/relocated to the instruction address of some_function in the .text section of mylib.so.
Is my understanding correct?
Fairly close. The dynamic linker will modify something in the data segment, not specifically the .data section - segments are a coarser-grained thing corresponding to how the file is mapped into memory rather than the original semantic breakdown. The actual section is usually called .got or .got.plt but may vary by platform. The modification is not "relocating it to the instruction address" but resolving a relocation reference to the function name to get the address it was loaded at, and filling that address in.
Suppose I have the following files:
libmy_static_lib.c:
#include <stdio.h>
void func1(void){
printf("func1() called from a static library\n");
}
void unused_func1(void){
printf("printing from the unused function1\n");
}
void unused_func2(void){
printf("printing from unused function2\n");
}
libmy_static_lib.h:
void func(void);
void unused_func1(void);
void unused_func2(void);
my_prog.c:
#include "libmy_static_lib.h"
#include <stdio.h>
void func_in_my_prog()
{
printf("in my prog\n");
func1();
}
And here is how I link the library:
# build the static library libmy_static_lib.a
gcc -fPIC -c -fdata-sections --function-sections -c libmy_static_lib.c -o libmy_static_lib.o
ar rcs libmy_static_lib.a libmy_static_lib.o
# build libmy_static_lib.a into a new shared library
gcc -fPIC -c ./my_prog.c -o ./my_prog.o
gcc -Wl,--gc-sections -shared -m64 -o libmy_shared_lib.so ./my_prog.o -L. -l:libmy_static_lib.a
There are 2 functions in libmy_static_lib.c that are not used, and from this post, I think
gcc fdata-sections --function-sections
should create a symbol for each function, and
gcc -Wl,--gc-sections
should remove the unused symbols when linking
however when I run
nm libmy_shared_lib.so
It is showing that these 2 unused functions are also being linked into the shared library.
Any suggestions on how to have gcc remove the unused functions automatically?
Edit:
I am able to use the above options for gcc to remove the unused functions if I am linking a static library directly to executable. But it doesn't remove the unused functions if I link the static library to a shared library.
You can use a version script to mark the entry points in combination with -ffunction-sections and --gc-sections.
For example, consider this C file (example.c):
int
foo (void)
{
return 17;
}
int
bar (void)
{
return 251;
}
And this version script, called version.script:
{
global: foo;
local: *;
};
Compile and link the sources like this:
gcc -Wl,--gc-sections -shared -ffunction-sections -Wl,--version-script=version.script example.c
If you look at the output of objdump -d --reloc a.out, you will notice that only foo is included in the shared object, but not bar.
When removing functions in this way, the linker will take indirect dependencies into account. For example, if you turn foo into this:
void *
foo (void)
{
extern int bar (void);
return bar;
}
the linker will put both foo and bar into the shared object because both are needed, even though only bar is exported.
(Obviously, this will not work on all platforms, but ELF supports this.)
You're creating a library, and your symbols aren't static, so it's normal that the linker doesn't remove any global symbols.
This -gc-sections option is designed for executables. The linker starts from the entrypoint (main) and discovers the function calls. It marks the sections that are used, and discards the others.
A library doesn't have 1 entrypoint, it has as many entrypoints as global symbols, which explains that it cannot clean your symbols. What if someone uses your .h file in his program and calls the "unused" functions?
To find out which functions aren't "used", I'd suggest that you convert void func_in_my_prog() to int main() (or copy the source into a modified one containing a main()), then create an executable with the sources, and add -Wl,-Map=mapfile.txt option when linking to create a mapfile.
gcc -Wl,--gc-sections -Wl,--Map=mapfile.txt -fdata-sections -ffunction-sections libmy_static_lib.c my_prog.c
This mapfile contains the discarded symbols:
Discarded input sections
.drectve 0x00000000 0x54 c:/gnatpro/17.1/bin/../lib/gcc/i686-pc-mingw32/6.2.1/crt2.o
.drectve 0x00000000 0x1c c:/gnatpro/17.1/bin/../lib/gcc/i686-pc-
...
.text$unused_func1
0x00000000 0x14 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
.text$unused_func2
0x00000000 0x14 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
.rdata$zzz 0x00000000 0x38 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
...
now we see that the unused functions have been removed. They don't appear in the final executable anymore.
There are existing tools that do that (using this technique but not requiring a main), for instance Callcatcher. One can also easily create a tool to disassemble the library and check for symbols defined but not called (I've written such tools in python several times and it's so much easier to parse assembly than from high-level code)
To cleanup, you can delete the unused functions manually from your sources (one must be careful with object-oriented languages and dispatching calls when using existing/custom assembly analysis tools. On the other hand, the compiler isn't going to remove a section that could be used, so that is safe)
You can also remove the relevant sections in the library file, avoiding to change source code, for instance by removing sections:
$ objcopy --remove-section .text$unused_func1 --remove-section text$unused_func2 libmy_static_lib.a stripped.a
$ nm stripped.a
libmy_static_lib.o:
00000000 b .bss
00000000 d .data
00000000 r .rdata
00000000 r .rdata$zzz
00000000 t .text
00000000 t .text$func1
00000000 T _func1
U _puts
When laying out symbols in the address space using a linker script, ld allows to
refer to a specific symbol coming from a static library with the following
syntax:
archive.a:object_file.o(.section.symbol_name)
Using gold rather than ld, it seems that such a directive is ignored. The
linking process succeeds. However, when using this instruction to put a specific
symbol at a specific location with gold and checking the resulting symbol layout
using nm or having a look at the Map file, the symbol is not in the expected
location.
I made a small test case using a dummy hello world program statically compiled
in its entrety with gcc 5.4.0. The C library is musl libc (last commit on the
master branch from the official git repository). For binutils, I also use the
last commit on the master branch from the official git repository.
I use the linker script to place a specific symbol (.text.exit) from a static
library (musl C library: libc.a) at a specific location in the address space
which is: the first position in the .text section.
My linker script is:
ENTRY(_start)
SECTIONS
{
. = 0x10000;
.text :
{
/* Forcing .text.exit in the first position in .text section */
musl/lib/libc.a:exit.o(.text.exit);
*(.text*);
}
. = 0x8000000;
.data : { *(.data*) }
.rodata : { *(.rodata*) }
.bss : { *(.bss*) }
}
My Makefile:
# Set this to 1 to link with gold, 0 to link with ld
GOLD=1
SRC=test.c
OBJ=test.o
LIBS=musl/lib/crt1.o \
musl/lib/libc.a \
musl/lib/crtn.o
CC=gcc
CFLAGS=-nostdinc -I musl/include -I musl/obj/include
BIN=test
LDFLAGS=-static
SCRIPT=linker-script.x
MAP=map
ifeq ($(GOLD), 1)
LD=binutils-gdb/gold/ld-new
else
LD=binutils-gdb/ld/ld-new
endif
all:
$(CC) $(CFLAGS) -c $(SRC) -o $(OBJ)
$(LD) --output $(BIN) $(LDFLAGS) $(OBJ) $(LIBS) -T $(SCRIPT) \
-Map $(MAP)
clean:
rm -rf $(OBJ) $(BIN) $(MAP)
After compiling and linking I'm checking the map file (obtained using the -Map
ld/gold flag) to have a look at the location of .text.exit. Using ld as the
linker, it is indeed in the first position of the text section. Using gold, it
is not (it is present farther in the address space, as if my directive was not
taken into account).
Now, while neither of these work with gold:
musl/lib/libc.a:exit.o(.text.exit);
musl/lib/libc.a(.text.exit)
This works:
*(.text.exit);
Is that a missing feature in gold? or am I doing something wrong, maybe there is
another way to refer to a specific symbol of a specific object file in an
archive using gold?
When laying out symbols in the address space using a linker script, ld allows to
refer to a specific symbol coming from a specific object file inside a static
library with the following syntax:
archive.a:object_file.o(.section.symbol_name)
That isn't quite what that syntax means. When you see
".section.symbol_name" in the linker script (or in a readelf or
objdump list of sections), that is the whole name of the section, and
you'll only see sections with names like that if you use the
-ffunction-sections option when compiling. Given that your script
works with ld, and if you just use the full filename wild card with
gold, it looks like your musl libraries were indeed compiled with
-ffunction-sections, but that's not something you can always assume is
true for system libraries. So the linker isn't really searching for a
section named ".text" that defines a symbol named "exit" -- instead,
it's simply looking for a section named ".text.exit". Subtle
difference, but you should be aware of it.
Now, while neither of these work with gold:
musl/lib/libc.a:exit.o(.text.exit);
musl/lib/libc.a(.text.exit);
This works:
*(.text.exit);
Is that a missing feature in gold? or am I doing something wrong, maybe there is
another way to refer to a specific symbol of a specific object file in an
archive using gold?
If you look at the resulting -Map output file, I suspect you'll see
the name of the object file is written as "musl/lib/libc.a(exit.o)".
That's the spelling you need to use in the script, and because of the
parentheses, you need to quote it. This:
"musl/lib/libc.a(exit.o)"(.text.exit)
should work. If you want something that will work in both linkers, try
something like this:
"musl/lib/libc.a*exit.o*"(.text.exit)
or just
"*exit.o*"(.text.exit)
Today, while working with one custom library, I found a strange behavior.
A static library code contained a debug main() function. It wasn't inside a #define flag. So it is present in library also. And it is used link to another program which contained the real main().
When both of them are linked together, the linker didn't throw a multiple declaration error for main(). I was wondering how this could happen.
To make it simple, I have created a sample program which simulated the same behavior:
$ cat prog.c
#include <stdio.h>
int main()
{
printf("Main in prog.c\n");
}
$ cat static.c
#include <stdio.h>
int main()
{
printf("Main in static.c\n");
}
$ gcc -c static.c
$ ar rcs libstatic.a static.o
$ gcc prog.c -L. -lstatic -o 2main
$ gcc -L. -lstatic -o 1main
$ ./2main
Main in prog.c
$ ./1main
Main in static.c
How does the "2main" binary find which main to execute?
But compiling both of them together gives a multiple declaration error:
$ gcc prog.c static.o
static.o: In function `main':
static.c:(.text+0x0): multiple definition of `main'
/tmp/ccrFqgkh.o:prog.c:(.text+0x0): first defined here
collect2: ld returned 1 exit status
Can anyone please explain this behavior?
Quoting ld(1):
The linker will search an archive only once, at the location where it is specified on the command line. If the archive defines a symbol which was undefined in some object which appeared before the archive on the command line, the linker will include the appropriate file(s) from the archive.
When linking 2main, main symbol gets resolved before ld reaches -lstatic, because ld picks it up from prog.o.
When linking 1main, you do have undefined main by the time it gets to -lstatic, so it searches the archive for main.
This logic only applies to archives (static libraries), not regular objects.
When you link prog.o and static.o, all symbols from both objects are included unconditionally, so you get a duplicate definition error.
When you link a static library (.a), the linker only searches the archive if there were any undefined symbols tracked so far. Otherwise, it doesn't look at the archive at all. So your 2main case, it never looks at the archive as it doesn't have any undefined symbols for making the translation unit.
If you include a simple function in static.c:
#include <stdio.h>
void fun()
{
printf("This is fun\n");
}
int main()
{
printf("Main in static.c\n");
}
and call it from prog.c, then linker will be forced to look at the archive to find the symbol fun and you'll get the same multiple main definition error as linker would find the duplicate symbol main now.
When you directly compile the object files(as in gcc a.o b.o), the linker doesn't have any role here and all the symbols are included to make a single binary and obviously duplicate symbols are there.
The bottom line is that linker looks at the archive only if there are missing symbols. Otherwise, it's as good as not linking with any libraries.
After the linker loads any object files, it searches libraries for undefined symbols. If there are none, then no libraries need to be read. Since main has been defined, even if it finds a main in every library, there is no reason to load a second one.
Linkers have dramatically different behaviors, however. For example, if your library included an object file with both main () and foo () in it, and foo was undefined, you would very likely get an error for a multiply defined symbol main ().
Modern (tautological) linkers will omit global symbols from objects that are unreachable - e.g. AIX. Old style linkers like those found on Solaris, and Linux systems still behave like the unix linkers from the 1970s, loading all of the symbols from an object module, reachable or not. This can be a source of horrible bloat as well as excessive link times.
Also characteristic of *nix linkers is that they effectively search a library only once for each time it is listed. This puts a demand on the programmer to order the libraries on the command line to a linker or in a make file, in addition to writing a program. Not requiring an ordered listing of libraries is not modern. Older operating systems often had linkers that would search all libraries repeatedly until a pass failed to resolve a symbol.
I'm experimenting with the concept of pure-static-linked PIE executables on Linux, but running into the problem that the GNU binutils linker insists on adding a PT_INTERP header to the output binary when -pie is used, even when also given -static. Is there any way to inhibit this behavior? That is, is there a way to tell GNU ld specifically not to write certain headers to the output file? Perhaps with a linker script?
(Please don't answer with claims that it won't work; I'm well aware that the program still needs relocation processing - load-address-relative relocations only due to my use of -Bsymbolic - and I have special startup code in place of the standard Scrt1.o to handle this. But I can't get it to be invoked without the dynamic linker already kicking in and doing the work unless hexedit the PT_INTERP header out of the binary.)
Maybe I'm being naïve, but... woudn't suffice to search for the default linker script, edit it, and remove the line that links in the .interp section?
For example, in my machine the scripts are in /usr/lib/ldscripts and the line in question is interp : { *(.interp) } in the SECTIONS section.
You can dumpp the default script used running the following command:
$ ld --verbose ${YOUR_LD_FLAGS} | \
gawk 'BEGIN { s = 0 } { if ($0 ~ /^=/) s = !s; else if (s == 1) print; }'
You can modify the gawk script slightly to remove the interp line (or just use grep -v and use that script to link your program.
I think I might have found a solution: simply using -shared instead of -pie to make pie binaries. You need a few extra linker options to patch up the behavior, but it seems to avoid the need for a custom linker script. Or in other words, the -shared linker script is already essentially correct for linking static pie binaries.
If I get it working with this, I'll update the answer with the exact command line I'm using.
Update: It works! Here's the command line:
gcc -shared -static-libgcc -Wl,-static -Wl,-Bsymbolic \
-nostartfiles -fPIE Zcrt1.s Zcrt2.c /usr/lib/crti.o hello.c /usr/lib/crtn.o
where Zcrt1.s is a modified version of Scrt1.s that calls a function in Zcrt2.c before doing its normal work, and the code in Zcrt2.c processes the aux vector just past the argv and environment arrays to find the DYNAMIC section, then loops over the relocation tables and applies all the relative-type relocations (the only ones that should exist).
Now all of this can (with a little work) be wrapped up into a script or gcc specfile...
Expanding on my earlier note as this doesn't fit in that puny box (and this is just as an idea or discussion, please do not feel obligated to accept or reward bounty), perhaps the easiest and cleanest way of doing this is to juts add a post-build step to strip the PT_INTERP header from the resulting binary?
Even easier than manually editing the headers and potentially having to shift everything around is to just replace PT_INTERP with PT_NULL. I don't know whether you can find a way of simply patching the file via existing tools (some sort of scriptable hex find and replace) or if you'll have to write a small program to do that. I do know that libbfd (the GNU Binary File Descriptor library) might be your friend in the latter case, as it'll make that entire business a lot easier.
I guess I just don't understand why it's important to have this performed via an ld option. If available, I can see why it would be preferable; but as some (admittedly light) Googling indicates there isn't such a feature, it might be less of a hassle to just do it separately and after-the-fact. (Perhaps adding the flag to ld is easier than scripting the replacement of PT_INTERP with PT_NULL, but convincing the devs to pull it upstream is a different matter.)
Apparently (and please correct me if this is something you've already seen) you can override the behavior of ld with regards to any of the ELF headers in your linker script with the PHDRS command, and using :none to specify that a particular header type should not be included in any segment. I'm not certain of the syntax, but I presume it would look something like this:
PHDRS
{
headers PT_PHDR PHDRS ;
interp PT_INTERP ;
text PT_LOAD FILEHDR PHDRS ;
data PT_LOAD ;
dynamic PT_DYNAMIC ;
}
SECTIONS
{
. = SIZEOF_HEADERS;
.interp : { } :none
...
}
From the ld docs you can override the linker script with --library-path:
--library-path=searchdir
Add path searchdir to the list of paths that ld will search for
archive libraries and ld control scripts. You may use this option any
number of times. The directories are searched in the order in which
they are specified on the command line. Directories specified on the
command line are searched before the default directories. All -L
options apply to all -l options, regardless of the order in which the
options appear. The default set of paths searched (without being
specified with `-L') depends on which emulation mode ld is using, and
in some cases also on how it was configured. See section Environment
Variables. The paths can also be specified in a link script with the
SEARCH_DIR command. Directories specified this way are searched at the
point in which the linker script appears in the command line.
Also, from the section on Implicit Linker Scripts:
If you specify a linker input file which the linker can not recognize
as an object file or an archive file, it will try to read the file as
a linker script. If the file can not be parsed as a linker script, the
linker will report an error.
Which would seem to imply values in user-defined linker scripts, in contrast with implicitly defined linker scripts, will replace values in the default scripts.
I'am not an expert in GNU ld, but I have found the following information in the documentation:
The special secname `/DISCARD/' may be used to discard input sections.
Any sections which are assigned to an output section named `/DISCARD/'
are not included in the final link output.
I hope this will help you.
UPDATE:
(This is the first version of the solution, which don't work because INTERP section is dropped along with the header PT_INTERP.)
main.c:
int main(int argc, char **argv)
{
return 0;
}
main.x:
SECTIONS {
/DISCARD/ : { *(.interp) }
}
build command:
$ gcc -nostdlib -pie -static -Wl,-T,main.x main.c
$ readelf -S a.out | grep .interp
build command without option -Wl,-T,main.x:
$ gcc -nostdlib -pie -static main.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000000218
$ readelf -S a.out | grep .interp
[ 1] .interp PROGBITS 00000134 000134 000013 00 A 0 0 1
UPDATE 2:
The idea of this solution is that the original section 'INTERP' (. interp in the linker script file) is renamed to .interp1. In other words, the entire contents of the section is placed to the .interp1 section. Therefore, we can safe remove INTERP section (now empty) without fear of losing default linker script settings and hence the header INTERP_PT will be removed too.
SECTIONS {
.interp1 : { *(.interp); } : NONE
/DISCARD/ : { *(.interp) }
}
In order to show that the contents of the section INTERP present in the file (as .interp1), but INTERP_PT header removed, I use a combination of readelf + grep.
$ gcc -nostdlib -pie -Wl,-T,main.x main.c
$ readelf -l a.out | grep interp
00 .note.gnu.build-id .text .interp1 .dynstr .hash .gnu.hash .dynamic .got.plt
$ readelf -S a.out | grep interp
[ 3] .interp1 PROGBITS 0000002e 00102e 000013 00 A 0 0 1
The option -Wl,--no-dynamic-linker solves the issues with binutils 2.26 or later.