Correct usage of strip tool - linker

The unpacked Linaro GCC 6.2-2016.11 toolchain occupies almost 3.4 GB of disk space and I want to make it smaller. My target is armv7-a+vfpv3+hard_float, so I have already removed things I do not need (ld.gold, libraries for Thumb, v8-a, v7ve, etc.), but it still occupies almost 1 GB.
So I want to use the strip tool to remove redundant information from its binaries.
My main question is: how do I use strip safely, correctly and efficiently in this case?
In general, the toolchain contains different kinds of binary files to which strip can be applied: *.exe, *.a, *.o.
As far as I can judge, I can apply strip -s (remove all symbols) only to *.exe files (e.g. arm-eabi-gcc.exe). Am I right?
Is it possible to apply strip to libraries (e.g. libgcc.a)?
As I understand it (see the example below), symbols in libraries may be needed for further processing.
If yes, should I use --strip-debug instead (remove debugging symbols only)?
The example below illustrates these questions and raises more.
Assume that we have three files:
// main.c:
#include "libgcc_test.h"
int main(void)
{
do_something();
return 0;
}
// libgcc_test.c:
void do_something(void)
{
return;
}
// libgcc_test.h:
void do_something(void);
In general we just compile each file separately to get object files which can be linked together:
$ ./arm-eabi-gcc.exe main.c -c
$ ./arm-eabi-gcc.exe libgcc_test.c -c
By analyzing object files we can see that do_something symbol is defined in libgcc_test.o and undefined in main.o, as expected:
$ ./arm-eabi-nm.exe main.o
U do_something
00000000 T main
$ ./arm-eabi-nm.exe libgcc_test.o
00000000 T do_something
If we apply strip -s to both files or only to main.o and try to link them, it works:
$ ./arm-eabi-nm.exe main.o
arm-eabi-nm.exe: main.o: no symbols
$ ./arm-eabi-nm.exe libgcc_test.o
arm-eabi-nm.exe: libgcc_test.o: no symbols
$ ./arm-eabi-ld.exe libgcc_test.o main.o -o main
arm-eabi-ld.exe: warning: cannot find entry symbol _start; defaulting to 00008000
But if we apply strip -s only to libgcc_test.o, the linker produces an error message:
$ ./arm-eabi-strip.exe -s libgcc_test.o
$ ./arm-eabi-ld.exe libgcc_test.o main.o -o main
arm-eabi-ld.exe: warning: cannot find entry symbol _start; defaulting to 00008000
main.o: In function `main':
main.c:(.text+0x8): undefined reference to `do_something'
As I understand it, the presence of an unresolved symbol in an object file forces the linker to resolve it. What happens if I remove this symbol from an object file before linking?
Is it correct and safe to remove symbols from object files before linking them together? If yes, which symbols can be removed?
In a real project, if we apply strip -s to the toolchain libraries (libgcc.a, libc.a, libm.a, librdimon.a, etc.), it similarly produces a lot of "undefined reference to..." messages during the linking stage.
But if we use the --strip-debug option, the linker produces messages such as skipping incompatible libgcc.a when searching for -lgcc. If we revert the libraries, linking succeeds.
What does the skipping incompatible... message mean in this case?
Thank you for help.

Just to summarize how I did it, in case it is useful for someone.
I removed the unnecessary libraries/executables and applied strip -s to the *.exe files only.
After that, the whole toolchain came to ~230 MB.

Related

dietlibc, lowfat, opentracker - compiling against alternative libc

I'm attempting to build opentracker. My system has the following:
| package | library | headers |
| lowfat | /usr/lib/libowfat.a | /usr/include/libowfat |
| dietlibc | /opt/diet/lib-x86_64/*.a | /usr/diet/include |
| glibc | /usr/lib/*.{a,so} | /usr/include |
Looking at the Makefile for opentracker, I see (essentially) the following:
PREFIX?=..
LIBOWFAT_HEADERS=$(PREFIX)/libowfat
LIBOWFAT_LIBRARY=$(PREFIX)/libowfat
CFLAGS+=-I$(LIBOWFAT_HEADERS) -Wall -pipe -Wextra
LDFLAGS+=-L$(LIBOWFAT_LIBRARY) -lowfat -pthread -lpthread -lz
opentrackers: $(OBJECTS) $(HEADERS)
cc -o $@ $(OBJECTS) $(LDFLAGS)
I've not compiled against an alternative libc before, so I'm including this information in case I've done this part wrong. When I invoke make, I need to point it at where my system has dietlibc and lowfat live. I'm doing it like this:
$ LDFLAGS=-L/opt/diet/lib-x86_64 make PREFIX=/opt/diet LIBOWFAT_HEADERS=/usr/include/libowfat LIBOWFAT_LIBRARY=/usr/lib
...
...
cc -o opentracker opentracker.o trackerlogic.o scan_urlencoded_query.o ot_mutex.o ot_stats.o ot_vector.o ot_clean.o ot_udp.o ot_iovec.o ot_fullscrape.o ot_accesslist.o ot_http.o ot_livesync.o ot_rijndael.o -L/opt/diet/lib-x86_64 -L/usr/lib -lowfat -pthread -lpthread -lz
/usr/bin/ld: /usr/lib/libowfat.a(io_fd.o):(.bss+0xb0): multiple definition of `first_deferred'; /usr/lib/libowfat.a(io_close.o):(.data+0x0): first defined here
...
... lots of warnings ...
/usr/bin/ld: opentracker.o: undefined reference to symbol '__ctype_b_loc@@GLIBC_2.3'
/usr/bin/ld: /usr/lib/libc.so.6: error adding symbols: DSO missing from command line
Looks like there's two issues going on in there.
Multiple definitions of first_deferred
I see references to first_deferred in both io_close and io_fd, but they are in different sections.
$ objdump -t /usr/lib/libowfat.a | egrep '^[^:]+.o:|first_deferred' | grep -B1 first_deferred
io_close.o: file format elf64-x86-64
0000000000000000 g O .data 0000000000000008 first_deferred
--
io_fd.o: file format elf64-x86-64
00000000000000b0 g O .bss 0000000000000008 first_deferred
--
io_waituntil2.o: file format elf64-x86-64
0000000000000000 *UND* 0000000000000000 first_deferred
In io/io_fd.c, there's an #include io_internal.h and in that header there's an extern long first_deferred;. In io/io_close.c it's defined as long first_deferred=-1. So it doesn't look like it's double defined in the libowfat code itself. Did I compile lowfat wrong?
DSO missing from command line / symbol '__ctype_b_loc@@GLIBC_2.3'
Since the Makefile is trying to compile against dietlibc, I'm a bit surprised that there's a reference to glibc (but, to be honest, also not surprised at all).
Here's the recipe for opentracker.o:
cc -c -o opentracker.o -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -I/usr/include/libowfat -Wall -pipe -Wextra -O3 -DWANT_FULLSCRAPE opentracker.c
This doesn't appear to have the -L/opt/diet/lib-x86_64 argument from LDFLAGS that is used for the main executable. Should it? I don't think so as that's a linker argument so it would not make sense to add it to the compile command. I don't see any references to glibc in the object file:
$ objdump -t ./src/opentracker/opentracker.o | grep -c 'glib'
0
DSO missing from command line / symbol '__ctype_b_loc@@GLIBC_2.3'
I found two permutations to solve this issue. Option one is to make sure the very first -L argument is the location of dietlibc's lib directory, so that all symbols are resolved from there first.
The other permutation was to invoke make via the /opt/diet/bin/diet wrapper program. From the dietlibc FAQ
Q: How do I install it? make install?
A: Yep. It will then install itself to /opt/diet, with the wrapper in
/opt/diet/bin/diet. Or you don't install it at all.
The diet libc comes with a wrapper called "diet", which can be found
in bin-$(ARCH)/diet, i.e. bin-i386/diet for most of us. Copy this
wrapper somewhere in your path (for example ~/bin) and then just
compile stuff by prepending diet to the command line, e.g. "diet gcc
-pipe -g -o t t.c".
Q: How do I compile programs using autoconf with the diet libc?
A: Set CC in the environment properly. For Bourne Shells:
$ CC="diet gcc -nostdinc" ./configure --disable-nls
That should be enough, but you might also want to set
--disable-shared and --enable-static for packages using libtool.
It's not explained anywhere on the website, as far as I can tell, what the wrapper program does. The code is annoying to read due to all the architecture specific #ifdefs, but the file comment indicates it just modifies the gcc command line in an architecture specific way. A quick scan suggests relevant args modifications include: -I/opt/diet/include when compiling, -nostdlib when linking, and possibly -Os.
Multiple definitions of first_deferred
I'm not happy with my workaround here. The symbol is defined in io_internal.h:
#ifndef my_extern
#define my_extern extern
#endif
my_extern long first_deferred;
Why is there a funny redefinition of the extern keyword? Read on. The initialization of this variable is in io_close.c:
#include "io_internal.h"
long first_deferred=-1;
And here's the interesting bit. In io_fd.c:
#define my_extern
#include "io_internal.h"
#undef my_extern
Why? Who knows. The author believes they are clever I guess and saved themselves some keystrokes? The effect of this is that my_extern is defined as an empty string, so when my_extern long first_deferred; is transcluded from the header, it appears as long first_deferred;. This is what leads there to be two locations for the symbol in the archive, as there are two files that reserve space for that symbol.
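Whether such a pair of definitions links depends on how the compiler treats the uninitialized one. The following sketch reproduces the pattern with made-up file names (io_fd_like.c, io_close_like.c) and assumes a GCC or Clang that accepts -fcommon and -fno-common. With -fcommon the bare long first_deferred; is emitted as a common symbol that the linker merges with the initialized definition. With -fno-common, which is the default since GCC 10, it becomes an ordinary definition in .bss and the link fails with a multiple definition error. That this is what happened to the libowfat build is an assumption, although the .bss entry in the objdump output above is consistent with it.

```shell
# Sketch only: file names are hypothetical stand-ins for the libowfat sources.
work=$(mktemp -d) && cd "$work"

cat > io_fd_like.c <<'EOF'
long first_deferred;            /* what "my_extern long first_deferred;" expands to */
EOF
cat > io_close_like.c <<'EOF'
long first_deferred = -1;       /* the initialized definition */
EOF
cat > main.c <<'EOF'
#include <stdio.h>
extern long first_deferred;
int main(void) { printf("%ld\n", first_deferred); return 0; }
EOF

# Old default: the tentative definition becomes a common symbol and is merged.
if gcc -fcommon io_fd_like.c io_close_like.c main.c -o with_common 2>/dev/null; then
    common_link=ok
    common_value=$(./with_common)
else
    common_link=failed
fi

# GCC 10+ default: two real definitions, so the linker rejects the pair.
if gcc -fno-common io_fd_like.c io_close_like.c main.c -o without_common 2>/dev/null; then
    nocommon_link=ok
else
    nocommon_link=failed
fi

echo "-fcommon: $common_link (value $common_value); -fno-common: $nocommon_link"
```

If that is the cause, building libowfat with -fcommon added to its compiler flags might be a less invasive workaround than editing the source; this is untested against libowfat itself.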
I'm not happy with my "solution", which was to remove the static initialization from io_close.c. Technically, that means the variable no longer starts at -1: a file-scope variable without an initializer is zero-initialized, so it starts at 0. A quick look at how it gets used suggests this is maybe not safe, but is probably safe enough. The variable is used as an index into an array. Thankfully iarray_get does a bounds check, so it's very likely that if(e) will be false and the variable will get set to -1 as it should be.
if (first_deferred!=-1) {
while (first_deferred!=-1) {
io_entry* e=iarray_get(&io_fds,first_deferred);
if (e) {
if (e->closed) {
e->closed=0;
close(first_deferred);
}
first_deferred=e->next_defer;
} else
first_deferred=-1; // can't happen
}
}
I can't provide a good explanation for those errors, but your post helped me to get it to compile so I figured I'd mention what I did.
The "first_deferred" error seems to come from using a newer version of libowfat, I got past that by using 0.31 instead.
I didn't come across the second error, but I was getting "__you_tried_to_link_a_dietlibc_object_against_glibc" errors which I got past by uninstalling dietlibc and compiling libowfat with glibc instead.
I compiled them the same way as the AUR packages:
https://aur.archlinux.org/packages/opentracker/
https://aur.archlinux.org/packages/libowfat/
Although, instead of installing libowfat, I just put it in the src directory and skipped fetching libowfat from CVS.

Is there a way to have a linker pull part of an object file from a library for linking?

I have a project with thousands of C files, many libraries, and dozens of programs to link, and to speed up the compilation, I am combining C files into translation units that include multiple C files. This is sometimes referred to as single compilation unit, single translation unit, or unity build.
I have multiple of these translation units compiled into different libraries, and these libs were previously created by compiling each C file individually.
For example:
old library.lib:
file1.o
file2.o
file3.o
file4.o
file5.o
file6.o
new library.lib:
translation_unit_1.o
translation_unit_2.o
translation_unit_1.c:
#include "file1.c"
#include "file2.c"
#include "file3.c"
translation_unit_2.c:
#include "file4.c"
#include "file5.c"
#include "file6.c"
So these compile into: translation_unit_1.o and translation_unit_2.o. And the library is the new library.lib shown above.
Now say I have a program that I want to link to library.lib and that refers to a function in file2.c, but that compiles its own version of file1.c, which duplicates symbols from the file1.c in the library, so it only needs file2.c from library.lib to link. Or perhaps I need to link code from file1.c but can't link file2.c because it has a dependency that I don't want to rely on (example below).
program:
main.o
file1.o
library.lib
Is there a way with any linker that you know of to get the linker to only pull the code from file2.c out of translation_unit_1.o object code and use that to link main.o to make the program?
An alternative would be to split the translation_unit_1.o out into file1.o, file2.o, file3.o if that is possible, then feed that to the linker.
Thanks for any help.
edit 1
This is for single code base that is compiled for both a bare metal ARM platform that uses ELF compiled with ARM ADS 1.2 toolchain and for a Windows platform that uses the Visual Studio toolchain. However thoughts on how to approach the problem on other platforms and toolchains are welcome.
Here is a concrete example on MacOS using clang.
example code below is here: https://github.com/awmorgan/single_translation_unit_lib_link
library:
file1.c this file is needed to link
file2.c this file is not used to link and has an unresolved dependency which could be in another library or object
main.c:
int main( void ) {
extern int file1_a( void );
int x = file1_a();
}
file1.c:
int file1_a(void) {
return 1;
}
file2.c:
int file2_a( void ) {
extern int file3_a( void );
return file3_a(); // file3_a() is located somewhere else
}
single_translation_unit.c:
#include "file1.c"
#include "file2.c"
this works to produce program1.out:
++ clang -c file1.c -o file1.o
++ clang -c file2.c -o file2.o
++ libtool -static file1.o file2.o -o library1.lib
++ clang -c main.c -o main1.o
++ clang main1.o library1.lib -o program1.out
this fails to produce program2.out:
++ clang -c single_translation_unit.c -o single_translation_unit.o
++ libtool -static single_translation_unit.o -o library2.lib
++ clang -c main.c -o main2.o
++ clang main2.o library2.lib -o program2.out
Undefined symbols for architecture x86_64:
"_file3_a", referenced from:
_file2_a in library2.lib(single_translation_unit.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
changing the link order does not work either:
++ clang library2.lib main2.o -o program2.out
Undefined symbols for architecture x86_64:
"_file3_a", referenced from:
_file2_a in library2.lib(single_translation_unit.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Is there a way with clang, gcc, microsoft or any linker
None of clang, gcc or microsoft is a linker (the first two are compilers, and the third is a corporation).
The answer also depends on the platform (which you didn't specify).
IF you are building on a Linux, or another ELF platform, you could compile your code with -ffunction-sections -fdata-sections, and the linker will automagically do what you want.
Is there a way to have a linker pull part of an object file from a library for linking?
In general, linkers operate on sections, and can't split sections apart (you get all or nothing).
Without -ffunction-sections, all functions in a single translation unit end up in a single .text section (this is an approximation -- template instantiations and out-of-line function definitions for inline functions usually end up in a section of their own). Therefore, the linker can't select some, but not all, parts of the .text.
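The difference is visible in the section headers of the object file. A minimal sketch, assuming GCC and binutils objdump on an ELF platform (Mach-O on macOS names its sections differently), with a cut-down single-file stand-in called unit.c in place of single_translation_unit.c:

```shell
# Sketch only: unit.c is a hypothetical stand-in for the unity source file.
work=$(mktemp -d) && cd "$work"

cat > unit.c <<'EOF'
int file1_a(void) { return 1; }
int file3_a(void);
int file2_a(void) { return file3_a(); }
EOF

gcc -c unit.c -o unit_default.o
gcc -ffunction-sections -c unit.c -o unit_split.o

# Default: both functions share the one .text section.
objdump -h unit_default.o | grep -F '.text'

# With -ffunction-sections: one .text.<function> section per function.
objdump -h unit_split.o | grep -F '.text'
```

Only in the second object does the linker have separate .text.file1_a and .text.file2_a sections that it can keep or drop independently.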
With the GCC/binutils ELF toolchain, or suitably compatible tools, you can do this by:
Compiling single_translation_unit.c with the options -ffunction-sections, -fdata-sections
Linking program2.out with the linker option -gc-sections.
E.g. (on Linux):
$ gcc -ffunction-sections -fdata-sections -c single_translation_unit.c -o single_translation_unit.o
$ ar rcs library2.a single_translation_unit.o # On Mac OS, use libtool to make the static library if you prefer.
$ gcc -c main.c -o main2.o
$ gcc main2.o library2.a -Wl,-gc-sections -o program2.out
You may replace gcc with clang throughout.
The linkage succeeds because:
In compilation, -ffunction-sections directed the compiler to emit each function definition
in a distinct code section of the object file, containing nothing else, rather than merging them all into
a single .text section, as per default.
In the linkage, -Wl,-gc-sections directed the linker to discard unused sections,
i.e. sections in which no symbols were referenced by the program.
The definition of the unreferenced function file2_a acquired a distinct code section,
containing nothing else, which was therefore unused. The linker was able to discard this unused section, and along with it
the unresolved reference to file3_a within the definition of file2_a.
So no references to file2_a or file3_a were finally linked, as we can see:
$ nm program2.out | egrep '(file2_a|file3_a)'; echo Done
Done
And if we re-do the linkage requesting a mapfile:
$ gcc main2.o library2.a -Wl,-gc-sections,-Map=mapfile -o program2.out
then the mapfile will show us:
...
...
Discarded input sections
...
...
.text.file2_a 0x0000000000000000 0xb library2.a(single_translation_unit.o)
...
...
that the function section .text.file2_a originating in library2.a(single_translation_unit.o)
was indeed thrown away.
BTW...
Because of the way a static library is used in linkage,
there is no point in archiving the single object file single_translation_unit.o alone into a static library
library2 and then linking your program against library2, if you know that your program references any
symbol defined in single_translation_unit.o. You might as well skip creating library2 and just link single_translation_unit.o instead.
Given that symbols defined in single_translation_unit.o are needed, the linkage:
$ gcc main2.o library2.a [-Wl,-gc-sections] -o program2.out
is exactly the same linkage as:
$ gcc main2.o single_translation_unit.o [-Wl,-gc-sections] -o program2.out
with or without -Wl,-gc-sections.
And...
I trust you're aware that while a unity build may well be fastest for your builds-from-clean,
it may equally well be slow for most incremental builds, as against an automated build system, typically Make based,
that is well-crafted to minimise the amount of rebuilding required per source change. Chances are if you can
benefit from a unity build, it's only from a unity build as well as an efficient incremental build.

Proper way to include C code from directories other than the current directory

I have two directories, sorting and searching (children of the same directory), that have .c source files and .h header files:
mbp:c $ ls sorting
array_tools.c bubble_sort.c insertion_sort.c main selection_sort.c
array_tools.h bubble_sort.h insertion_sort.h main.c selection_sort.h
mbp:c $ ls searching
array_tools.c array_tools.h binary_search.c binary_search.h linear_search.c linear_search.h main main.c
Within searching, I am building an executable that needs to use insertion_sort function, declared in insertion_sort.h and defined in insertion_sort.c inside sorting. The following compilation successfully produces an executable:
mbp:searching $ clang -Wall -pedantic -g -iquote"../sorting" -o main main.c array_tools.c binary_search.c linear_search.c ../sorting/insertion_sort.c
However, I would like to be able to include functions from arbitrary directories by including a header using #include and then providing the compiler with the search path. Do I need to precompile the .c files to .o files beforehand? The man page for clang lists the following option:
-I<directory>
Add the specified directory to the search path for include files.
But the following compilation fails:
mbp:searching $ clang -Wall -pedantic -g -I../sorting -o main main.c array_tools.c binary_search.c linear_search.c
Undefined symbols for architecture x86_64:
"_insertion_sort", referenced from:
_main in main-1a1af0.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
main.c has the following includes:
#include <stdio.h>
#include <stdlib.h>
#include "linear_search.h"
#include "binary_search.h"
#include "array_tools.h"
#include "insertion_sort.h"
I do not understand the link between header files, source files, and object files. To include a function defined in a .c file, is it sufficient to include the homonymous header file, given that the .c file is in the same directory as the header? I have read multiple answers here on SO, the man page for clang and a number of tutorials, but was unable to find a definitive, clear answer.
In response to @spectras:
One by one, you give the compiler a source file to work on. For instance:
cc -Wall -Ipath/to/some/headers foo.c -o foo.o
Running
mbp:sorting $ clang -Wall insertion_sort.c -o insertion_sort.o
produces the following error:
Undefined symbols for architecture x86_64:
"_main", referenced from:
implicit entry/start for main executable
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Okay, it's mixed up a bit. Let's see how one typically compiles a simple multi-file project.
One by one, you give the compiler a source file to work on. For instance:
cc -c -Wall -Ipath/to/some/headers foo.c -o foo.o
The -c flag tells the compiler you want an object file, so it should not run the linker.
The compiler runs the preprocessor on the source file. Among other things, every time it sees a #include directive, it searches the include paths for the named file and basically copy-pastes it, replacing the #include with the content. This is done recursively.
This is the step where all .h you include get merged into the source file. We call the whole thing a translation unit.
You can see the result of this step by using -E flag and inspect the result, for instance:
cc -Wall -Ipath/to/some/headers foo.c -E -o foo.test
Let's make this short as other steps are not relevant to your question. The compiler then creates an object file from the resulting source code. The object file contains binary version of all code and data that was in the translation unit, plus metadata that will be used to put everything together and some other stuff (like debugging info).
You can inspect the contents of an object file using objdump -xd foo.o.
Note that as this is done for each source file, this means that headers get parsed and compiled again and again and again. That's the reason they should only declare stuff and not contain actual code: you would end up with that code in every object file.
Once done, you link all the object files into an executable, for instance:
cc foo.o bar.o baz.o -o myprogram
This step gathers all the object files, resolves the references between them and writes everything into an executable binary. You may also pull in external libraries using -l, as when you pass -lrt or -lm.
For instance:
foo.c includes bar.h
bar.h contains a declaration of function do_bar: void do_bar(int);
foo.c can use it, and compiler will generate foo.o correctly
foo.o will have placeholders and the information that it requires do_bar
bar.c defines the implementation of do_bar.
so bar.o will have the information “hey if anyone needs do_bar, I got it here”.
linking step will replace placeholders with actual calls to do_bar.
Finally, when you pass multiple .c files to the compiler like you do in your question, the compiler does basically the same thing, only the intermediate object files are temporary (that is where the main-1a1af0.o in your error message comes from) and are not kept. The overall process behaves the same, though.
So, what about your error?
Undefined symbols for architecture x86_64:
"_insertion_sort", referenced from:
_main in main-1a1af0.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
See? It says linking step failed. That means previous step went well. The #include worked. It's just in the linking step, it's looking for a symbol (data or code) called _insertion_sort, and does not find it. That's because that symbol was declared somewhere (otherwise source using it would not have compiled), but its definition is not available. Either no source file implemented it, or the object file that contains it was not given to the linker.
=> You need to make _insertion_sort's definition available. Either by adding ../sorting/insertion_sort.c to the source lists you pass or by compiling it into an object file and passing that. Or by building it into a library so it can be shared by your two binaries (otherwise they'll each have a copy embedded).
Once you get there, it is usually a good idea to start using a build system such as CMake. It will take care of all the details for you.

Referring to a specific symbol in a static library with the GNU gold linker

When laying out symbols in the address space using a linker script, ld allows you to
refer to a specific symbol coming from a static library with the following
syntax:
archive.a:object_file.o(.section.symbol_name)
Using gold rather than ld, it seems that such a directive is ignored. The
linking process succeeds. However, when using this instruction to put a specific
symbol at a specific location with gold and checking the resulting symbol layout
using nm or having a look at the Map file, the symbol is not in the expected
location.
I made a small test case using a dummy hello world program statically compiled
in its entirety with gcc 5.4.0. The C library is musl libc (last commit on the
master branch from the official git repository). For binutils, I also use the
last commit on the master branch from the official git repository.
I use the linker script to place a specific symbol (.text.exit) from a static
library (musl C library: libc.a) at a specific location in the address space
which is: the first position in the .text section.
My linker script is:
ENTRY(_start)
SECTIONS
{
. = 0x10000;
.text :
{
/* Forcing .text.exit in the first position in .text section */
musl/lib/libc.a:exit.o(.text.exit);
*(.text*);
}
. = 0x8000000;
.data : { *(.data*) }
.rodata : { *(.rodata*) }
.bss : { *(.bss*) }
}
My Makefile:
# Set this to 1 to link with gold, 0 to link with ld
GOLD=1
SRC=test.c
OBJ=test.o
LIBS=musl/lib/crt1.o \
musl/lib/libc.a \
musl/lib/crtn.o
CC=gcc
CFLAGS=-nostdinc -I musl/include -I musl/obj/include
BIN=test
LDFLAGS=-static
SCRIPT=linker-script.x
MAP=map
ifeq ($(GOLD), 1)
LD=binutils-gdb/gold/ld-new
else
LD=binutils-gdb/ld/ld-new
endif
all:
$(CC) $(CFLAGS) -c $(SRC) -o $(OBJ)
$(LD) --output $(BIN) $(LDFLAGS) $(OBJ) $(LIBS) -T $(SCRIPT) \
-Map $(MAP)
clean:
rm -rf $(OBJ) $(BIN) $(MAP)
After compiling and linking I'm checking the map file (obtained using the -Map
ld/gold flag) to have a look at the location of .text.exit. Using ld as the
linker, it is indeed in the first position of the text section. Using gold, it
is not (it is present farther in the address space, as if my directive was not
taken into account).
Now, while neither of these work with gold:
musl/lib/libc.a:exit.o(.text.exit);
musl/lib/libc.a(.text.exit)
This works:
*(.text.exit);
Is that a missing feature in gold? or am I doing something wrong, maybe there is
another way to refer to a specific symbol of a specific object file in an
archive using gold?
When laying out symbols in the address space using a linker script, ld allows you to
refer to a specific symbol coming from a specific object file inside a static
library with the following syntax:
archive.a:object_file.o(.section.symbol_name)
That isn't quite what that syntax means. When you see
".section.symbol_name" in the linker script (or in a readelf or
objdump list of sections), that is the whole name of the section, and
you'll only see sections with names like that if you use the
-ffunction-sections option when compiling. Given that your script
works with ld, and if you just use the full filename wild card with
gold, it looks like your musl libraries were indeed compiled with
-ffunction-sections, but that's not something you can always assume is
true for system libraries. So the linker isn't really searching for a
section named ".text" that defines a symbol named "exit" -- instead,
it's simply looking for a section named ".text.exit". Subtle
difference, but you should be aware of it.
Now, while neither of these work with gold:
musl/lib/libc.a:exit.o(.text.exit);
musl/lib/libc.a(.text.exit);
This works:
*(.text.exit);
Is that a missing feature in gold? or am I doing something wrong, maybe there is
another way to refer to a specific symbol of a specific object file in an
archive using gold?
If you look at the resulting -Map output file, I suspect you'll see
the name of the object file is written as "musl/lib/libc.a(exit.o)".
That's the spelling you need to use in the script, and because of the
parentheses, you need to quote it. This:
"musl/lib/libc.a(exit.o)"(.text.exit)
should work. If you want something that will work in both linkers, try
something like this:
"musl/lib/libc.a*exit.o*"(.text.exit)
or just
"*exit.o*"(.text.exit)

gcc detect duplicate symbols/functions in static libraries

Is there any way we can get gcc to detect a duplicate symbol in static libraries vs the main code (Or another static library ?)
Here's the situation:
main.c erroneously contained a function definition, e.g. with the signature uint foohash(const char*)
foo.c also contains a function definition with the signature uint foohash(const char*)
foo.c and other source files are compiled to a static util library, which the main program links in, i.e. something like:
gcc -o main main.o util.o -L ./libs -lfooutils
So, now main.o and libs/libfooutils.a both contain a foohash function. Presumably the linker found that symbol in main.o and doesn't bother looking for it elsewhere.
Is there any way we can get gcc to detect such a situation ?
Indeed as Simon Richter stated, --whole-archive option can be useful. Try to change your command-line to:
gcc -o main main.o util.o -L ./libs -Wl,--whole-archive -lfooutils -Wl,--no-whole-archive
and you'll see a multiple definition error.
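A self-contained reproduction of that effect, as a sketch: the file contents are made up around the names in the question (foohash, libfooutils.a), unsigned stands in for uint, util_helper is an invented second library function, and GNU ld behind gcc is assumed.

```shell
# Sketch only: file contents are hypothetical.
work=$(mktemp -d) && cd "$work"
mkdir libs

cat > foo.c <<'EOF'
unsigned foohash(const char *s) { return s[0]; }
EOF
cat > bar.c <<'EOF'
int util_helper(void) { return 0; }
EOF
cat > main.c <<'EOF'
int util_helper(void);
unsigned foohash(const char *s) { return s[0] + 1u; }   /* the stray duplicate */
int main(void) { return util_helper() + (foohash("a") == 0); }
EOF

gcc -c foo.c bar.c main.c
ar rcs libs/libfooutils.a foo.o bar.o

# Normal link: only bar.o is pulled from the archive, so the clash goes unnoticed.
if gcc -o main main.o -L./libs -lfooutils 2>/dev/null; then plain=ok; else plain=failed; fi

# --whole-archive forces foo.o in as well, and the duplicate becomes an error.
if gcc -o main_all main.o -L./libs -Wl,--whole-archive -lfooutils -Wl,--no-whole-archive \
       2>whole.log; then whole=ok; else whole=failed; fi

echo "plain link: $plain; whole-archive link: $whole"
grep -i 'multiple definition' whole.log
```

Note that --whole-archive links every member of the archive into the program, so it is better suited to a one-off check than to the regular build.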
gcc calls the ld program for linking. The relevant ld options are:
--no-define-common
--traditional-format
--warn-common
See the man page for ld. These should be what you need to experiment with to get the warnings sought.
Short answer: no.
GCC does not actually do anything with libraries. It is the task of ld, the linker (called automatically by GCC) to pull in symbols from libraries, and that's really a fairly dumb tool.
The linker has lots of complex jiggery pokery for combining different types of data from different sources, and supporting different file formats, and all the evil little details of binary executables, but in the end, all it really does is look for undefined symbols and find the definitions.
What you can do is a link trace (pass -t to gcc) to see what comes from where. Or else run nm on all the object files and libraries in your system, and write a script to detect duplicates.

Resources