Can I make some symbols only visible to other library members?

Consider 3 C source files:
/* widgets.c */
void widgetTwiddle ( struct widget * w ) {
utilityTwiddle(&w->bits, 1);
}
and
/* wombats.c */
void wombatTwiddle ( struct wombat * w ) {
utilityTwiddle(&w->bits, 1);
}
and
/* utility.c */
void utilityTwiddle ( int * bitsPtr, int bits ) {
*bitsPtr ^= bits;
}
which get compiled and put in a library (say, either libww.a or libww.so).
Is there a way to make utilityTwiddle() visible to and usable by the other two library members, but not visible to those who link against the library? That is, given this:
/* appl.c */
extern void utilityTwiddle ( int * bitsPtr, int bits );
int main ( void ) {
int bits;
utilityTwiddle(&bits, 1);
return 0;
}
and
cc -o appl appl.c -lww
it would fail to link because utilityTwiddle() is not visible to appl.c. Consequently, appl.c would be free to define its own utilityTwiddle function or variable.
[EDIT] And hopefully obviously, we would like this to work:
/* workingappl.c */
extern void wombatTwiddle ( struct wombat * wPtr );
int main ( void ) {
struct wombat w = { .bits = 0 };
wombatTwiddle(&w);
return 0;
}
The question "Limiting visibility of symbols when linking shared libraries" seems related, but it doesn't seem to address whether the suppressed symbols remain available to other library members.
[EDIT2] I have sort-of figured out a way to do it without modifying the C source. Add a map file:
/* utility.map */
{ local: *; };
and then do:
$ gcc -shared -o utility.so utility.c -fPIC -Wl,--version-script=utility.map
This gives us a dynamic symbol table without utilityTwiddle:
$ nm -D utility.so
w _Jv_RegisterClasses
w __cxa_finalize
w __gmon_start__
but it's not clear to me how to get from this to building a shared library from all three source files. If I put all three source files on the command line, the symbols from all three are hidden. If there were a way to build the shared library incrementally, I could have two simple map files (one to export nothing, one to export everything). Is this doable, or is the only option something like this:
/* libww.map */
{ global: list; of; all; symbols; to; export; local: *; };
and
$ gcc -shared -o libww.so *.c -fPIC -Wl,--version-script=libww.map
[EDIT3]
Boy, it sure seems like this also ought to be possible without using shared libraries. If I do:
ld -r -o wboth.o widgets.o wombats.o utility.o
I can see that the linker has resolved the location of utilityTwiddle() where widgetTwiddle() and wombatTwiddle() call it:
$ objdump -d wboth.o
0000000000000000 <widgetTwiddle>:
0: be 01 00 00 00 mov $0x1,%esi
5: e9 00 00 00 00 jmpq a <widgetTwiddle+0xa>
0000000000000010 <wombatTwiddle>:
10: be 01 00 00 00 mov $0x1,%esi
15: e9 00 00 00 00 jmpq 1a <wombatTwiddle+0xa>
0000000000000020 <utilityTwiddle>:
20: 31 37 xor %esi,(%rdi)
22: c3 retq
but utilityTwiddle remains as a symbol:
$ nm wboth.o
U _GLOBAL_OFFSET_TABLE_
0000000000000020 T utilityTwiddle
0000000000000000 T widgetTwiddle
0000000000000010 T wombatTwiddle
So if you could find a way to remove that symbol, you could still link successfully against wboth.o. I have tested this by binary-editing wboth.o to rename the symbol, and it still links and runs fine:
$ nm wboth.o
U _GLOBAL_OFFSET_TABLE_
0000000000000000 T widgetTwiddle
0000000000000010 T wombatTwiddle
0000000000000020 T xtilityTwiddle

You can't achieve what you want by creating a static library libww.a. If you
read about static libraries you will see why. A static library is just a way of offering
a bunch of N object files to the linker, from which it extracts the k (possibly 0) that it
needs and links them. So you can't achieve anything by linking with the static library that
you can't achieve by linking those k object files directly. For linkage purposes, static
libraries don't really exist.
But shared libraries really do exist for linkage purposes, and the global symbols exposed by a shared library
acquire an additional property, dynamic visibility, that exists precisely for your
purpose. The dynamically visible symbols are a subset of the global symbols: they are
the global symbols that are visible for dynamic linkage, i.e. for linking the shared library
with something else (a program or another shared library).
Dynamic visibility is not an attribute that source language standards say anything
about, because they don't say anything about dynamic linkage. So controlling the
dynamic visibility of symbols has to be done in an individual way by a toolchain that
does support dynamic linkage. GCC does it with the compiler-specific declaration
qualifier [1]:
__attribute__((visibility("default|hidden|protected|internal")))
and/or the compiler switch [2]:
-fvisibility=default|hidden|protected|internal
Here's a demo of how to build libww.so so that utilityTwiddle is hidden from
clients of the library while wombatTwiddle and widgetTwiddle are visible.
Your source code needs to be fleshed out a bit in one way or another to compile.
Here's a first cut:
ww.h (1)
#ifndef WW_H
#define WW_H
struct widget {
int bits;
};
struct wombat {
int bits;
};
extern void widgetTwiddle ( struct widget * w );
extern void wombatTwiddle ( struct wombat * w );
#endif
utility.h (1)
#ifndef UTILITY_H
#define UTILITY_H
extern void utilityTwiddle ( int * bitsPtr, int bits );
#endif
utility.c
#include "utility.h"
void utilityTwiddle ( int * bitsPtr, int bits ) {
*bitsPtr ^= bits;
}
wombats.c
#include "utility.h"
#include "ww.h"
void wombatTwiddle ( struct wombat * w ) {
utilityTwiddle(&w->bits, 1);
}
widgets.c
#include "utility.h"
#include "ww.h"
void widgetTwiddle ( struct widget * w ) {
utilityTwiddle(&w->bits, 1);
}
Compile all the *.c files to *.o files in the default manner:
$ gcc -Wall -Wextra -c widgets.c wombats.c utility.c
and link them into libww.so in the default manner:
$ gcc -shared -o libww.so widgets.o wombats.o utility.o
Here are the *Twiddle symbols in the global symbol table of libww.so:
$ nm libww.so | egrep '*Twiddle'
000000000000063a T utilityTwiddle
00000000000005fa T widgetTwiddle
000000000000061a T wombatTwiddle
This is just the sum of the global (extern) *Twiddle symbols that went into the linkage
of libww.so from the object files. They're all defined (T), as they'd have to be
if the library itself was to be linked without external *Twiddle dependencies.
Any ELF file (object file, shared library, program) has a global symbol table, but
a shared library also has a dynamic symbol table. Here are the *Twiddle symbols in the dynamic symbol table of libww.so:
$ nm -D libww.so | egrep '*Twiddle'
000000000000063a T utilityTwiddle
00000000000005fa T widgetTwiddle
000000000000061a T wombatTwiddle
They're exactly the same. That's what we want to change, so that utilityTwiddle
disappears.
Here's a second cut. We have to change the source code slightly.
utility.h (2)
#ifndef UTILITY_H
#define UTILITY_H
extern void utilityTwiddle ( int * bitsPtr, int bits ) __attribute__((visibility("hidden")));
#endif
Then recompile and relink, just as before:
$ gcc -Wall -Wextra -c widgets.c wombats.c utility.c
$ gcc -shared -o libww.so widgets.o wombats.o utility.o
Here are the *Twiddle symbols now in the global symbol table:
$ nm libww.so | egrep '*Twiddle'
000000000000063a T utilityTwiddle
00000000000005fa T widgetTwiddle
000000000000061a T wombatTwiddle
No change there. And here are the *Twiddle symbols now in the dynamic symbol table:
$ nm -D libww.so | egrep '*Twiddle'
00000000000005aa T widgetTwiddle
00000000000005ca T wombatTwiddle
utilityTwiddle is gone.
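Many libraries wrap the attribute in a macro so that the same header still compiles with compilers that don't understand it. Below is a minimal sketch of that pattern applied to the second cut, assuming GCC or Clang on an ELF target. The macro names WW_API and WW_HIDDEN are hypothetical; GCC does not define them. Everything is in one file only to keep the example self-contained.

```c
/* Hypothetical export macros: GCC and Clang accept the visibility
   attribute on ELF targets; other compilers just see empty macros. */
#if defined(__GNUC__)
#  define WW_API    __attribute__((visibility("default")))
#  define WW_HIDDEN __attribute__((visibility("hidden")))
#else
#  define WW_API
#  define WW_HIDDEN
#endif

struct widget {
    int bits;
};

/* Internal helper: callable from any object file linked into libww.so,
   but left out of the library's dynamic symbol table. */
WW_HIDDEN void utilityTwiddle(int *bitsPtr, int bits)
{
    *bitsPtr ^= bits;
}

/* Public entry point: stays in the dynamic symbol table of libww.so. */
WW_API void widgetTwiddle(struct widget *w)
{
    utilityTwiddle(&w->bits, 1);
}
```

In the real layout, WW_HIDDEN would go on the declaration in utility.h and WW_API on the declarations in ww.h, exactly where the attributes appear in the cuts above.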
Here's a third cut that achieves the same result differently. It's more long-winded
but illustrates how the -fvisibility compiler option works. This time,
utility.h is again as per (1), but ww.h is:
ww.h (2)
#ifndef WW_H
#define WW_H
struct widget {
int bits;
};
struct wombat {
int bits;
};
extern void widgetTwiddle ( struct widget * w ) __attribute__((visibility("default")));
extern void wombatTwiddle ( struct wombat * w ) __attribute__((visibility("default")));
#endif
Now we recompile like so:
$ gcc -Wall -Wextra -fvisibility=hidden -c widgets.c wombats.c utility.c
We're telling the compiler to annotate every global symbol it generates with
__attribute__((visibility("hidden"))) unless there is a countervailing
__attribute__((visibility("..."))) explicitly in the source code.
Then relink the shared library just as previously. Again we see in the global symbol table:
$ nm libww.so | egrep '*Twiddle'
00000000000005ea t utilityTwiddle
00000000000005aa T widgetTwiddle
00000000000005ca T wombatTwiddle
and in the dynamic symbol table:
$ nm -D libww.so | egrep '*Twiddle'
00000000000005aa T widgetTwiddle
00000000000005ca T wombatTwiddle
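GCC also provides #pragma GCC visibility push(...) and #pragma GCC visibility pop, which set the visibility of every declaration between them. As an alternative to annotating each function in ww.h (2), a header could bracket its public declarations instead, while you still compile with -fvisibility=hidden so that anything outside the brackets, such as utilityTwiddle, should stay hidden. This is a minimal sketch, assuming GCC or Clang. The four "files" are collapsed into one only to keep the example self-contained.

```c
/* --- ww.h (hypothetical variant): public API gets default visibility --- */
#pragma GCC visibility push(default)
struct wombat {
    int bits;
};
void wombatTwiddle(struct wombat *w);
#pragma GCC visibility pop

/* --- utility.h as per (1): no annotation, so -fvisibility=hidden hides it --- */
void utilityTwiddle(int *bitsPtr, int bits);

/* --- utility.c --- */
void utilityTwiddle(int *bitsPtr, int bits)
{
    *bitsPtr ^= bits;
}

/* --- wombats.c: the definition keeps the visibility of its declaration --- */
void wombatTwiddle(struct wombat *w)
{
    utilityTwiddle(&w->bits, 1);
}
```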
Finally, let's show that removing utilityTwiddle from the dynamic symbol table
of libww.so in either of these ways really does hide it from clients linking with
libww.so. Here's a program that wants to call all the *Twiddles:
prog.c
#include <ww.h>
extern void utilityTwiddle ( int * bitsPtr, int bits );
int main()
{
struct widget wi = {1};
struct wombat wo = {2};
widgetTwiddle(&wi);
wombatTwiddle(&wo);
utilityTwiddle(&wi.bits,wi.bits);
return 0;
}
We have no problem building it like:
$ gcc -Wall -Wextra -I. -c prog.c
$ gcc -o prog prog.o utility.o widgets.o wombats.o
But nobody can build it like:
$ gcc -Wall -Wextra -I. -c prog.c
$ gcc -o prog prog.o -L. -lww
prog.o: In function `main':
prog.c:(.text+0x4a): undefined reference to `utilityTwiddle'
collect2: error: ld returned 1 exit status
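This also gives the client the freedom the question asked for. Below is a minimal sketch, assuming libww.so was built as in the second or third cut. A hypothetical client defines its own, unrelated utilityTwiddle; the set-instead-of-toggle behaviour is purely illustrative. Because libww.so does not export its utilityTwiddle, the client's definition cannot interpose on it, so widgetTwiddle and wombatTwiddle inside the library keep calling the library's own copy. If utilityTwiddle had stayed in the dynamic symbol table, a definition like this would normally interpose on the library's one.

```c
/* appl.c (hypothetical): the client's own utilityTwiddle, which has
   nothing to do with the hidden helper inside libww.so. */
void utilityTwiddle(int *bitsPtr, int bits)
{
    *bitsPtr |= bits;   /* sets the bits instead of toggling them */
}
```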
Be clear that -fvisibility is a compilation option, not a linkage option.
You pass it to your compilation commands and not to your linkage commands,
because its effect is the same as sprinkling __attribute__((visibility("...")))
qualifiers over the declarations in your source code, which the compiler has
to honour by injecting linkage information into the object files that it generates. If
you care to see the evidence of that, you can just repeat that last compilation
and request that the assembly files be saved:
$ gcc -Wall -Wextra -fvisibility=hidden -c widgets.c wombats.c utility.c -save-temps
Then compare say:
widgets.s
.file "widgets.c"
.text
.globl widgetTwiddle
.type widgetTwiddle, @function
widgetTwiddle:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movq %rdi, -8(%rbp)
movq -8(%rbp), %rax
movl $1, %esi
movq %rax, %rdi
call utilityTwiddle@PLT
nop
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size widgetTwiddle, .-widgetTwiddle
.ident "GCC: (Ubuntu 7.3.0-16ubuntu3) 7.3.0"
.section .note.GNU-stack,"",@progbits
with:
utility.s
.file "utility.c"
.text
.globl utilityTwiddle
.hidden utilityTwiddle
^^^^^^^^^^^^^^^^^^^^^^
.type utilityTwiddle, @function
utilityTwiddle:
...
...
[1] See the GCC manual:
6.31.1 Common Function Attributes
6.32.1 Common Variable Attributes
[2] See the GCC Manual, 3.16 Options for Code Generation Conventions.

Related

How does GCC implement __attribute__((constructor)) on MinGW?

I know that on ELF platforms, __attribute__((constructor)) uses the .ctors ELF section. Now I realized that the function attribute works with GCC on MinGW as well and I'm wondering how it is implemented.
For MinGW targets (and other COFF targets, like Cygwin) the compiler just emits each constructor function's address in the .ctors COFF section:
$ cat c1.c
void c1() {
}
$ x86_64-w64-mingw32-gcc -c c1.c
$ objdump -x c1.o | grep ctors
# nothing
$ cat c1.c
__attribute__((constructor)) void c1() {
}
$ x86_64-w64-mingw32-gcc -c c1.c
$ objdump -x c1.o | grep ctors
5 .ctors 00000008 0000000000000000 0000000000000000 00000150 2**3
GNU ld (for MinGW targets) is then configured, via its default linker script, to combine these sections into the regular .text section, with the __CTOR_LIST__ symbol pointing to the start of the list (a -1 header followed by the constructor addresses) and the list terminated with zero. (The .rdata section would arguably be clearer, since these are just addresses of functions, not CPU instructions, but for some reason .text is used. In fact, the LLVM LLD linker targeting MinGW places them in .rdata.)
The GNU ld default linker script, as printed by --verbose:
$ x86_64-w64-mingw32-ld --verbose
...
.text ... {
...
__CTOR_LIST__ = .;
LONG (-1); LONG (-1);
KEEP (*(.ctors));
KEEP (*(.ctor));
KEEP (*(SORT_BY_NAME(.ctors.*)));
LONG (0); LONG (0);
...
...
}
Then it is up to the C runtime library to run these constructors during initialization, using this __CTOR_LIST__ symbol.
From mingw-w64 C runtime:
extern func_ptr __CTOR_LIST__[];
void __do_global_ctors (void)
{
// finds the last (zero terminated) item
...
// then runs from last to first:
for (i = nptrs; i >= 1; i--)
{
__CTOR_LIST__[i] ();
}
...
}
(The Cygwin runtime does something very similar.)
This can also be seen in the debugger:
$ echo $MSYSTEM
MINGW64
$ cat c11.c
#include <stdio.h>
__attribute__((constructor))
void i1() {
puts("i 1");
}
int main() {
puts("main");
return 0;
}
$ gcc c11.c -o c11
$ gdb ./c11.exe
(gdb) b i1
(gdb) r
(gdb) bt
#0 0x00007ff603591548 in i1 ()
#1 0x00007ff6035915f2 in __do_global_ctors () at C:/_/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:44
#2 0x00007ff60359164f in __main () at C:/_/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:58
#3 0x00007ff60359139b in __tmainCRTStartup () at C:/_/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:313
#4 0x00007ff6035914f6 in mainCRTStartup () at C:/_/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:202
(gdb)
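The list walk itself is easy to model in ordinary C. The sketch below is a simplified, hypothetical stand-in: run_ctor_list and demo_ctor_list are made-up names, and the table is an ordinary local array rather than something built by the linker script. It has the same layout as the script above: a -1 header, the constructor addresses in link order, and a 0 terminator. It is walked from the last entry to the first, as in the __do_global_ctors excerpt.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*func_ptr)(void);

/* Records which "constructor" ran, in order. */
int ran[3];
int nran;

static void ctor_a(void) { ran[nran++] = 'a'; }
static void ctor_b(void) { ran[nran++] = 'b'; }
static void ctor_c(void) { ran[nran++] = 'c'; }

/* Same shape as the __do_global_ctors loop: skip the header at index 0,
   count the entries up to the 0 terminator, then call them last to first. */
void run_ctor_list(func_ptr *list)
{
    size_t nptrs = 0;

    while (list[nptrs + 1] != NULL)
        nptrs++;
    for (size_t i = nptrs; i >= 1; i--)
        list[i]();
}

/* Hypothetical stand-in for the table the linker script lays out. */
void demo_ctor_list(void)
{
    func_ptr list[] = {
        (func_ptr)(intptr_t)-1, /* like LONG (-1); LONG (-1): header   */
        ctor_a, ctor_b, ctor_c, /* like KEEP (*(.ctors)), link order   */
        NULL                    /* like LONG (0); LONG (0): terminator */
    };

    run_ctor_list(list);
}
```

Calling demo_ctor_list() runs ctor_c, ctor_b and ctor_a in that order, the reverse of their order in the table.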
Note that in some environments (neither MinGW nor Linux) it is instead the responsibility of GCC (its compiler runtime libgcc, more specifically its static parts crtbegin.o and crtend.o), rather than the C runtime, to run these constructors.
Also, for comparison, on ELF targets (like Linux) GCC used a mechanism similar to the one described above for MinGW (it used ELF .ctors sections, although the rest was a bit different), but since GCC 4.7 (released in 2012) it uses a slightly different mechanism (the ELF .init_array section).
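Whichever mechanism is used, GCC also accepts an optional priority argument, __attribute__((constructor(N))). Constructors with a smaller N run before those with a larger N, and priorities 0 to 100 are reserved for the implementation. Below is a small sketch, assuming GCC on a target with constructor-priority support; the function and variable names are illustrative.

```c
/* Records the order in which the constructors below run. */
int ctor_order[2];
int ctor_count;

/* Lower priority numbers run first; 101 is the lowest value available to
   user code. Both run before main(), regardless of source order. */
__attribute__((constructor(102)))
static void init_second(void)
{
    ctor_order[ctor_count++] = 102;
}

__attribute__((constructor(101)))
static void init_first(void)
{
    ctor_order[ctor_count++] = 101;
}
```

Even though init_second appears first in the source, init_first runs first, so by the time main() starts, ctor_order holds {101, 102}.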

Is -ffunction-sections -fdata-sections and --gc-sections not working?

I want to remove unused functions from my code while compiling, so I wrote some code (main.c):
#include <stdio.h>
const char *get1();
int main()
{
puts( get1() );
}
and getall.c:
const char *get1()
{
return "s97symmqdn-1";
}
const char *get2()
{
return "s97symmqdn-2";
}
const char *get3()
{
return "s97symmqdn-3";
}
Makefile
test1 :
	rm -f a.out *.o *.a
	gcc -ffunction-sections -fdata-sections -c main.c getall.c
	ar cr libgetall.a getall.o
	gcc -Wl,--gc-sections main.o -L. -lgetall
After running make test1 && objdump --sym a.out | grep get, I find only the following 2 lines of output:
0000000000000000 l df *ABS* 0000000000000000 getall.c
0000000000400535 g F .text 000000000000000b get1
I guess get2 and get3 were removed. But when I open a.out in vim, I find that s97symmqdn-1, s97symmqdn-2 and s97symmqdn-3 all exist.
Have the functions get2 and get3 really been removed? How can I remove the strings s97symmqdn-2 and s97symmqdn-3? Thank you for your reply.
My system is CentOS 7 and my gcc version is 4.8.5.
The compilation options -ffunction-sections -fdata-sections and linkage option --gc-sections
are working correctly in your example. Your static library is superfluous, so it can
be simplified to:
$ gcc -ffunction-sections -fdata-sections -c main.c getall.c
$ gcc -Wl,--gc-sections main.o getall.o -Wl,-Map=mapfile
in which I'm also asking for the linker's mapfile.
The unused functions get2 and get3 are absent from the executable:
$ nm a.out | grep get
0000000000000657 T get1
and the mapfile shows that the unused function-sections .text.get2 and .text.get3 in which get2 and get3 are
respectively defined were discarded in the linkage:
mapfile (1)
...
Discarded input sections
...
.text.get2 0x0000000000000000 0xd getall.o
.text.get3 0x0000000000000000 0xd getall.o
...
Nevertheless, as you found, all three of the string literals "s97symmqdn-(1|2|3)"
are in the program:
$ strings a.out | egrep 's97symmqdn-(1|2|3)'
s97symmqdn-1
s97symmqdn-2
s97symmqdn-3
That is because -fdata-sections applies just to the same data objects that
__attribute__ ((__section__("name"))) applies to [1], i.e. to the definitions
of variables that have static storage duration. It is not applied to anonymous string literals like your
"s97symmqdn-(1|2|3)". They are all just placed in the .rodata section as usual,
and there we find them:
$ objdump -s -j .rodata a.out
a.out: file format elf64-x86-64
Contents of section .rodata:
06ed 73393773 796d6d71 646e2d31 00733937 s97symmqdn-1.s97
06fd 73796d6d 71646e2d 32007339 3773796d symmqdn-2.s97sym
070d 6d71646e 2d3300 mqdn-3.
--gc-sections does not allow the linker to discard .rodata from the program
because it is not an unused section: it contains "s97symmqdn-1", referenced
in the program by get1, as well as the unreferenced strings "s97symmqdn-2"
and "s97symmqdn-3".
Fix
To get these three string literals separated into distinct data sections, you
need to assign them to distinct named objects, e.g.
getall.c (2)
const char *get1()
{
static const char s[] = "s97symmqdn-1";
return s;
}
const char *get2()
{
static const char s[] = "s97symmqdn-2";
return s;
}
const char *get3()
{
static const char s[] = "s97symmqdn-3";
return s;
}
If we recompile and relink with that change, we see:
mapfile (2)
...
Discarded input sections
...
.text.get2 0x0000000000000000 0xd getall.o
.text.get3 0x0000000000000000 0xd getall.o
.rodata.s.1797
0x0000000000000000 0xd getall.o
.rodata.s.1800
0x0000000000000000 0xd getall.o
...
Now there are two new discarded data-sections, which contain
the two string literals we don't need, as we can see in the object file:
$ objdump -s -j .rodata.s.1797 getall.o
getall.o: file format elf64-x86-64
Contents of section .rodata.s.1797:
0000 73393773 796d6d71 646e2d32 00 s97symmqdn-2.
and:
$ objdump -s -j .rodata.s.1800 getall.o
getall.o: file format elf64-x86-64
Contents of section .rodata.s.1800:
0000 73393773 796d6d71 646e2d33 00 s97symmqdn-3.
Only the referenced string "s97symmqdn-1" now appears anywhere in the program:
$ strings a.out | egrep 's97symmqdn-(1|2|3)'
s97symmqdn-1
and it is the only string in the program's .rodata:
$ objdump -s -j .rodata a.out
a.out: file format elf64-x86-64
Contents of section .rodata:
06f0 73393773 796d6d71 646e2d31 00 s97symmqdn-1.
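If you'd rather keep the strings at file scope, the same fix can be written with named file-scope arrays. Below is a minimal sketch of such a getall.c variant; the names get1_str to get3_str are hypothetical. With -fdata-sections, each array should land in its own section named after the object (e.g. .rodata.get2_str), which can make the mapfile easier to read than the numbered .rodata.s.NNNN names.

```c
/* getall.c variant: each string is a named file-scope object, so
   -fdata-sections can give each one its own .rodata.<name> section. */
static const char get1_str[] = "s97symmqdn-1";
static const char get2_str[] = "s97symmqdn-2";
static const char get3_str[] = "s97symmqdn-3";

const char *get1(void) { return get1_str; }
const char *get2(void) { return get2_str; }
const char *get3(void) { return get3_str; }
```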
[1] Likewise, -ffunction-sections has the same effect as qualifying the
definition of every function foo with __attribute__ ((__section__(".text.foo"))).
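To make that footnote concrete, here is a hand-written equivalent of what -ffunction-sections does for getall.c (2). It assumes an ELF target, where section names like .text.get2 are valid. It is only an illustration; in practice you would just pass -ffunction-sections.

```c
/* Equivalent, for these three functions, of compiling with
   -ffunction-sections: each gets its own .text.<name> section, which
   --gc-sections can discard when nothing references it. */
__attribute__((section(".text.get1")))
const char *get1(void)
{
    static const char s[] = "s97symmqdn-1";
    return s;
}

__attribute__((section(".text.get2")))
const char *get2(void)
{
    static const char s[] = "s97symmqdn-2";
    return s;
}

__attribute__((section(".text.get3")))
const char *get3(void)
{
    static const char s[] = "s97symmqdn-3";
    return s;
}
```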

What exactly does `-rdynamic` do and when exactly is it needed?

What exactly does -rdynamic (or --export-dynamic at the linker level) do and how does it relate to symbol visibility as defined by the -fvisibility* flags or visibility pragmas and __attribute__s?
For --export-dynamic, ld(1) mentions:
...
If you use "dlopen" to load a dynamic object which needs to refer back
to the symbols defined by the program, rather than some other dynamic
object, then you will probably need
to use this option when linking the program itself. ...
I'm not sure I completely understand this. Could you please provide an example that doesn't work without -rdynamic but does with it?
Edit:
I actually tried compiling a couple of dummy libraries (single file, multi-file, various -O levels, some inter-function calls, some hidden symbols, some visible), with and without -rdynamic, and so far I've been getting byte-identical outputs (when keeping all other flags constant of course), which is quite puzzling.
Here is a simple example project to illustrate the use of -rdynamic.
bar.c
extern void foo(void);
void bar(void)
{
foo();
}
main.c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
void foo(void)
{
puts("Hello world");
}
int main(void)
{
void * dlh = dlopen("./libbar.so", RTLD_NOW);
if (!dlh) {
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
void (*bar)(void) = dlsym(dlh,"bar");
if (!bar) {
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
bar();
return 0;
}
Makefile
.PHONY: all clean test
LDEXTRAFLAGS ?=
all: prog
bar.o: bar.c
	gcc -c -Wall -fpic -o $@ $<
libbar.so: bar.o
	gcc -shared -o $@ $<
main.o: main.c
	gcc -c -Wall -o $@ $<
prog: main.o | libbar.so
	gcc $(LDEXTRAFLAGS) -o $@ $< -L. -lbar -ldl
clean:
	rm -f *.o *.so prog
test: prog
	./$<
Here, bar.c becomes a shared library libbar.so and main.c becomes
a program that dlopens libbar and calls bar() from that library.
bar() calls foo(), which is external in bar.c and defined in main.c.
So, without -rdynamic:
$ make test
gcc -c -Wall -o main.o main.c
gcc -c -Wall -fpic -o bar.o bar.c
gcc -shared -o libbar.so bar.o
gcc -o prog main.o -L. -lbar -ldl
./prog
./libbar.so: undefined symbol: foo
Makefile:23: recipe for target 'test' failed
And with -rdynamic:
$ make clean
rm -f *.o *.so prog
$ make test LDEXTRAFLAGS=-rdynamic
gcc -c -Wall -o main.o main.c
gcc -c -Wall -fpic -o bar.o bar.c
gcc -shared -o libbar.so bar.o
gcc -rdynamic -o prog main.o -L. -lbar -ldl
./prog
Hello world
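If you control both sides, another way to let libbar.so reach foo without exporting the program's symbols is to pass foo to the library explicitly as a callback. Below is a minimal single-file sketch. The names bar_set_foo and foo_fn are hypothetical, and the "library" and "program" halves, which would really live in bar.c and main.c, are shown together so the example is self-contained.

```c
#include <stdio.h>

/* ---- would live in bar.c / libbar.so ---- */

/* Hypothetical callback type and registration function. */
typedef void (*foo_fn)(void);

static foo_fn registered_foo;

void bar_set_foo(foo_fn fn)
{
    registered_foo = fn;
}

void bar(void)
{
    if (registered_foo != NULL)
        registered_foo();
}

/* ---- would live in main.c ---- */

int foo_calls;

void foo(void)
{
    foo_calls++;
    puts("Hello world");
}
```

In main.c, the program would look up bar_set_foo with dlsym just as it looks up bar, call it with foo, and then call bar(). Since foo is then reached through a pointer rather than by name, it no longer needs to be in the program's dynamic symbol table.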
-rdynamic exports the symbols of an executable. This mainly addresses scenarios like the one described in Mike Kinghan's answer, but it also helps, for example, Glibc's backtrace_symbols() to symbolize the backtrace.
Here is a small experiment (test program copied from here)
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>
/* Obtain a backtrace and print it to stdout. */
void
print_trace (void)
{
void *array[10];
int size;
char **strings;
int i;
size = backtrace (array, 10);
strings = backtrace_symbols (array, size);
if (strings == NULL)
return;
printf ("Obtained %d stack frames.\n", size);
for (i = 0; i < size; i++)
printf ("%s\n", strings[i]);
free (strings);
}
/* A dummy function to make the backtrace more interesting. */
void
dummy_function (void)
{
print_trace ();
}
int
main (void)
{
dummy_function ();
return 0;
}
Compile the program with gcc main.c and run it. The output:
Obtained 5 stack frames.
./a.out() [0x4006ca]
./a.out() [0x400761]
./a.out() [0x40076d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f026597f830]
./a.out() [0x4005f9]
Now, compile with -rdynamic, i.e. gcc -rdynamic main.c, and run again:
Obtained 5 stack frames.
./a.out(print_trace+0x28) [0x40094a]
./a.out(dummy_function+0x9) [0x4009e1]
./a.out(main+0x9) [0x4009ed]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f85b23f2830]
./a.out(_start+0x29) [0x400879]
As you can see, we get a proper stack trace now!
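glibc also provides backtrace_symbols_fd(), which writes the same symbolic lines straight to a file descriptor instead of returning an array. Unlike backtrace_symbols(), it does not call malloc. It relies on the same dynamic symbol information, so the executable's own function names still only appear when it is linked with -rdynamic. Below is a minimal sketch, assuming glibc (<execinfo.h> is a GNU extension); print_trace_fd is a hypothetical name.

```c
#include <execinfo.h>
#include <unistd.h>

/* Print the current call stack to stderr, one frame per line, and return
   the number of frames captured. Names of functions in the executable
   only show up if it was linked with -rdynamic. */
int print_trace_fd(void)
{
    void *frames[32];
    int size = backtrace(frames, 32);

    backtrace_symbols_fd(frames, size, STDERR_FILENO);
    return size;
}
```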
Now, if we inspect the ELF dynamic symbol table (readelf --dyn-syms a.out):
Without -rdynamic:
Symbol table '.dynsym' contains 9 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free@GLIBC_2.2.5 (2)
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (2)
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND backtrace_symbols@GLIBC_2.2.5 (2)
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND backtrace@GLIBC_2.2.5 (2)
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __stack_chk_fail@GLIBC_2.4 (3)
6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
7: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
8: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
with -rdynamic, we have more symbols, including the executable's:
Symbol table '.dynsym' contains 25 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free@GLIBC_2.2.5 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (2)
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND backtrace_symbols@GLIBC_2.2.5 (2)
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND backtrace@GLIBC_2.2.5 (2)
6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __stack_chk_fail@GLIBC_2.4 (3)
7: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
8: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
9: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
10: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
11: 0000000000601060 0 NOTYPE GLOBAL DEFAULT 24 _edata
12: 0000000000601050 0 NOTYPE GLOBAL DEFAULT 24 __data_start
13: 0000000000601068 0 NOTYPE GLOBAL DEFAULT 25 _end
14: 00000000004009d8 12 FUNC GLOBAL DEFAULT 14 dummy_function
15: 0000000000601050 0 NOTYPE WEAK DEFAULT 24 data_start
16: 0000000000400a80 4 OBJECT GLOBAL DEFAULT 16 _IO_stdin_used
17: 0000000000400a00 101 FUNC GLOBAL DEFAULT 14 __libc_csu_init
18: 0000000000400850 42 FUNC GLOBAL DEFAULT 14 _start
19: 0000000000601060 0 NOTYPE GLOBAL DEFAULT 25 __bss_start
20: 00000000004009e4 16 FUNC GLOBAL DEFAULT 14 main
21: 00000000004007a0 0 FUNC GLOBAL DEFAULT 11 _init
22: 0000000000400a70 2 FUNC GLOBAL DEFAULT 14 __libc_csu_fini
23: 0000000000400a74 0 FUNC GLOBAL DEFAULT 15 _fini
24: 0000000000400922 182 FUNC GLOBAL DEFAULT 14 print_trace
I hope that helps!
I use -rdynamic to print out backtraces using Glibc's backtrace()/backtrace_symbols().
Without -rdynamic, you cannot get function names.
To learn more about backtrace(), see its documentation.
From The Linux Programming Interface:
42.1.6 Accessing Symbols in the Main Program
Suppose that we use dlopen() to dynamically load a shared library,
use dlsym() to obtain the address of a function x() from that
library, and then call x(). If x() in turn calls a function y(),
then y() would normally be sought in one of the shared libraries
loaded by the program.
Sometimes, it is desirable instead to have x() invoke an
implementation of y() in the main program. (This is similar to a
callback mechanism.) In order to do this, we must make the
(global-scope) symbols in the main program available to the dynamic
linker, by linking the program using the --export-dynamic linker
option:
$ gcc -Wl,--export-dynamic main.c (plus further options and arguments)
Equivalently, we can write the following:
$ gcc -export-dynamic main.c
Using either of these options allows a dynamically loaded library to
access global symbols in the main program.
The gcc -rdynamic option and the gcc -Wl,-E option are further
synonyms for -Wl,--export-dynamic.
I guess this only applies to shared libraries loaded dynamically with dlopen(). Correct me if I am wrong.

ld makes all my functions link to the last one in header file

I've started working on a home-brew OS for learning purposes. It works like this:
Once the kernel is loaded, I create a stack and call my kmain().
In kmain I try calling the function foo() declared in header.h:
//Header.h
#ifndef INCLUDE_HEADER_H
#define INCLUDE_HEADER_H
int foo(char* buf);
int bar();
#endif
Using nm on my kernel I can clearly see that foo() is in the binary but when I disassemble kmain with gdb I see that foo isn't called, instead bar is.
This problem is recurrent on all headers containing multiple functions.
I compile on Windows 10 in a Cygwin environment. I pass the following arguments to nasm/gcc/ld in my makefile:
CC = gcc
CFLAGS = -m32 -nostdlib -nostdinc \
-nostartfiles -fno-leading-underscore -nodefaultlibs\
-Wall -Wextra -Wno-unused-variable -Wno-unused-function\
-c
LD = i686-elf-ld
LDFLAGS = -Tlink.ld -melf_i386
AS = nasm
ASFLAGS = -f elf
Any ideas why?
EDIT :
//screen.h
#ifndef SCREEN_H
#define SCREEN_H
int test();
void print(char c);
#endif
And
//kmain.c
#include "screen.h"
int kmain(){
int b = test();
print('A');
return 0xcafebabe;
}
nm kernel.elf
$ nm kernel.elf
e4524ffe a CHECKSUM
00000000 a FLAGS
0010011c b kernel_stack
00004000 a KERNEL_STACK_SIZE
00100000 T kmain
001000c8 T loader
001000dd t loader.loop
1badb002 a MAGIC_NUMBER
001000b0 T outb
00100072 T print
0010002c T strlen
00100068 T test
0010005c T testFunc
gdb disassembly of kmain:
(gdb) disassemble kmain
Dump of assembler code for function kmain:
0x00100000 <kmain+0>: push %ebp
0x00100001 <kmain+1>: mov %esp,%ebp
0x00100003 <kmain+3>: sub $0x28,%esp
0x00100006 <kmain+6>: call 0x10006b <print+1> ;should call test but calls print instead
0x0010000b <kmain+11>: mov %eax,-0xc(%ebp)
0x0010000e <kmain+14>: movl $0x41,(%esp) ;pushes 'A'
0x00100015 <kmain+21>: call 0x100084 <print+26> ;calls print('A')
0x0010001a <kmain+26>: mov $0xcafebabe,%eax
0x0010001f <kmain+31>: leave
0x00100020 <kmain+32>: ret
0x00100021 <kmain+33>: nop
0x00100022 <kmain+34>: nop
0x00100023 <kmain+35>: nop
End of assembler dump.
0x00100006 <kmain+6>: call 0x10006b <print+1> ;should call test but calls print instead
<print+1> is just gdb's symbolic label for that address. This instruction does call the test function, as can be seen by comparing the address 0x10006b with nm's output:
00100068 T test
00100072 T print
It'll be clearer if you look at the disassembly of the compiled "screen.c".
I found that the problem was in the compiler toolchain I was using: it was what created the weird linking problem.
I followed a set of instructions to build a clean new Binutils + GCC, and it's working now!

Is it possible to override static functions in an object module (gcc, ld, x86, objcopy)?

Is there a way to override functions with static scope
within an object module?
Suppose I start with a module like this one, in which the global symbol "foo" is a function
that calls the local symbol "bar", which in turn calls the local symbol "baz":
[scameron@localhost ~]$ cat foo.c
#include <stdio.h>
static void baz(void)
{
printf("baz\n");
}
static void bar(void)
{
printf("bar\n");
baz();
}
void foo(void)
{
printf("foo\n");
bar();
}
[scameron@localhost ~]$ gcc -g -c foo.c
[scameron@localhost ~]$ objdump -x foo.o | egrep 'foo|bar|baz'
foo.o: file format elf32-i386
foo.o
00000000 l df *ABS* 00000000 foo.c
00000000 l F .text 00000014 baz
00000014 l F .text 00000019 bar
0000002d g F .text 00000019 foo
It has one global, "foo" and two locals "bar" and "baz."
Suppose I want to write some unit tests that exercise bar and baz.
I can do:
[scameron@localhost ~]$ cat barbaz
bar
baz
[scameron@localhost ~]$ objcopy --globalize-symbols=barbaz foo.o foo2.o
[scameron@localhost ~]$ objdump -x foo2.o | egrep 'foo|bar|baz'
foo2.o: file format elf32-i386
foo2.o
00000000 l df *ABS* 00000000 foo.c
00000000 g F .text 00000014 baz
00000014 g F .text 00000019 bar
0000002d g F .text 00000019 foo
[scameron@localhost ~]$
And now bar and baz are global symbols and accessible from
outside the module. So far so good.
But what if I want to interpose my own function on top
of "baz", and have "bar" call my interposed "baz"?
Is there a way to do that?
The --wrap option doesn't seem to do it:
[scameron@localhost ~]$ cat ibaz.c
#include <stdio.h>
extern void foo(void);
extern void bar(void);
extern void baz(void);
void __wrap_baz()
{
printf("wrapped baz\n");
}
int main(int argc, char *argv[])
{
foo();
baz();
}
[scameron@localhost ~]$ gcc -o ibaz ibaz.c foo2.o -Xlinker --wrap -Xlinker baz
[scameron@localhost ~]$ ./ibaz
foo
bar
baz
wrapped baz
[scameron@localhost ~]$
The baz called from main() got wrapped, but
bar still calls the local baz, not the wrapped baz.
Is there a way to make bar call the wrapped baz?
Even if it requires modifying the object code to tinker with the addresses of function calls, if that can be done in an automated way, that might be good enough, but in that case it needs to work on at least i386 and x86_64.
-- steve
Since static is a promise to the C compiler that the function or variable is local to the file, the compiler is free to remove that code if it can get the same result without it.
This might be inlining the function call. It might mean replacing a variable with constants. If the code is inside an if statement that is always false, the function may not even exist in the compiled result.
All of that means that you cannot reliably redirect calls to that function.
If you compile with the newer -flto option it is even worse, because the compiler is free to reorder, remove or inline all of the code in the entire project.
I received an email from Ian Lance Taylor (author of gold, an alternative linker to ld):
Is there a way to override functions with static scope
within an object module? (I'm on x86_64 and i386 linux)
No, there isn't. In particular, the compiler may inline calls to static
functions, and it may also rewrite the functions to use a different
calling convention (GCC does both of these optimizations). So there is
no reliable way to override a static function after the code has been
compiled.
The inlining can be dealt with, I think, by -fno-inline, but changing the calling conventions is probably too much.
That being said, the DynamoRIO guys claim to be able to do it, but I haven't verified it:
https://groups.google.com/forum/?fromgroups#!topic/dynamorio-users/xt8JTXBCZ74
If you are okay with modifying machine code then you should have no problem modifying source code.
Write scripts to mechanically produce unit test sources from the real sources. Perl does this very well.
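Along those lines, one source-level way to get what the question asks for (bar calling a replacement baz) is to give the test build a seam, where bar calls baz through a function pointer that a test can repoint. Below is a minimal sketch. It assumes a test-only macro UNIT_TEST, defined at the top here only to keep the example self-contained; a real build would pass -DUNIT_TEST. The names baz_hook, CALL_BAZ and fake_baz are hypothetical.

```c
#include <stdio.h>

/* Only for this self-contained sketch; normally passed as -DUNIT_TEST. */
#define UNIT_TEST 1

static void baz(void)
{
    printf("baz\n");
}

#ifdef UNIT_TEST
/* Test seam: bar reaches baz through this pointer, which a test can
   repoint at its own replacement. */
void (*baz_hook)(void) = baz;
#define CALL_BAZ() baz_hook()
#else
#define CALL_BAZ() baz()
#endif

static void bar(void)
{
    printf("bar\n");
    CALL_BAZ();
}

void foo(void)
{
    printf("foo\n");
    bar();
}

/* A test's replacement for baz. */
int fake_baz_calls;

void fake_baz(void)
{
    fake_baz_calls++;
    printf("wrapped baz\n");
}
```

In a build without UNIT_TEST, CALL_BAZ() expands to a plain baz() call, so the production code is the same as the original foo.c.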

Resources