C symbol visibility in static archives - c

I have files foo.c bar.c and baz.c, plus wrapper code myfn.c defining a function myfn() that uses code and data from those other files.
I would like to create something like an object file or archive, myfn.o or libmyfn.a, so that myfn() can be made available to other projects without also exporting a load of symbols from {foo,bar,baz}.o as well.
What's the right way to do that in Linux/gcc? Thanks.
Update: I've found one way of doing it. I should've emphasised originally that this was about static archives, not DSOs. Anyway, the recipe:
1. #define PUBLIC __attribute__ ((visibility("default"))) then mark myfn() as PUBLIC in myfn.c. Don't mark anything else PUBLIC.
2. Compile objects with gcc -c foo.c bar.c baz.c myfn.c -fvisibility=hidden, which marks everything as hidden except for myfn().
3. Create a convenience archive using ld's partial-linking switch: ld -r foo.o bar.o baz.o myfn.o -o libmyfn.a
4. Localise everything that wasn't PUBLIC like so: objcopy --localize-hidden libmyfn.a
5. Now nm says myfn is the only global symbol in libmyfn.a and subsequent linking into other programs works just fine: gcc -o main main.c -L. -lmyfn (here, the program calls myfn(); if it tried to call foo() then compilation would fail).
If I use ar instead of ld -r in step 3 then compilation fails in step 5: I guess ar hasn't linked foo etc to myfn, and no longer can once those functions are localised, whereas ld -r resolves the link before it gets localised-away.
I'd welcome any response that confirms this is the "right" way, or describes a slicker way of achieving the same.
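For illustration, the marking part of the recipe can be sketched in a single file. This is only a sketch: the function bodies and return values are made up, and #pragma GCC visibility push(hidden) is used here as an in-source stand-in for the -fvisibility=hidden command-line switch, so it is a swapped-in technique rather than the exact recipe above.

```c
/* Sketch only: bodies and return values are hypothetical. */
#define PUBLIC __attribute__ ((visibility("default")))

/* In-source stand-in for compiling with -fvisibility=hidden:
 * every declaration up to the matching pop defaults to hidden. */
#pragma GCC visibility push(hidden)

int foo(void) { return 1; }   /* would live in foo.c */
int bar(void) { return 2; }   /* would live in bar.c */
int baz(void) { return 3; }   /* would live in baz.c */

/* The explicit attribute takes precedence over the pragma,
 * so myfn() is the only symbol left with default visibility. */
PUBLIC int myfn(void)
{
    return foo() + bar() + baz();
}

#pragma GCC visibility pop
```

Hidden symbols can still be called from other translation units that end up in the same link, which is why the ld -r step has to happen before objcopy --localize-hidden.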

Unfortunately, C linkage for globals is all-or-nothing, in the sense that the globals of all modules would be available in libmyfn.a's final list of external symbols.
gcc tool chain offers an extension that lets you hide symbols from outside users, while making them available to other translation units in your library:
foo.h:
void foo();
foo.c:
void foo() __attribute__ ((visibility ("hidden")));
myfn.h:
void myfn();
myfn.c:
#include <stdio.h>
#include "foo.h"
void myfn() {
printf("calling foo...\n");
foo();
printf("calling foo again...\n");
foo();
}
For portability, you would probably benefit from making a macro for __attribute__ ((visibility ("hidden"))), and placing it in a conditional compilation block conditioned on gcc.
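A minimal sketch of such a macro, assuming GCC 4 or later (or a compatible compiler such as Clang) is what provides the attribute. The macro names MYLIB_HIDDEN and MYLIB_PUBLIC are hypothetical, and foo() returns int here only so the example has something to check.

```c
/* Hypothetical macro names; on non-GNU compilers they expand to nothing. */
#if defined(__GNUC__) && __GNUC__ >= 4
#  define MYLIB_HIDDEN __attribute__ ((visibility ("hidden")))
#  define MYLIB_PUBLIC __attribute__ ((visibility ("default")))
#else
#  define MYLIB_HIDDEN
#  define MYLIB_PUBLIC
#endif

/* Usable by other translation units of the library, hidden from its users. */
MYLIB_HIDDEN int foo(void);

/* The one function the library is meant to export. */
MYLIB_PUBLIC int myfn(void);

int foo(void)
{
    return 21;
}

int myfn(void)
{
    return foo() + foo();
}
```

On compilers without the attribute the code still builds, it just exports everything as before.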
In addition, Linux offers a utility called strip, which lets you remove some of the symbols from compiled object files. Options -K and -N let you name individual symbols that you want to keep or remove, respectively.

Start with this to build a static library
gcc -c -O2 foo.c bar.c baz.c myfn.c
ar rv libmyfunctions.a foo.o bar.o baz.o myfn.o
Compile and link with other programs like:
gcc -O2 program.c -L. -lmyfunctions -o myprogram
Your libmyfunctions.a will contain extra code from the sources that myfn.c does not need, but the linker only pulls in the archive members that are actually referenced, so it should do a reasonable job of leaving the rest out of the final program.

Suppose myfn.c has a function myfun() which you want to use in the other three files foo.c, bar.c and baz.c.
Now create a library from the code in myfn.c, e.g. libmyf.a (a .a file is a static archive; a shared library would be named libmyf.so).
Call myfun() from the other three files, declaring the function as extern in each of them. You can then compile these three files to object code and link against libmyf.a at the linking phase.
Refer to the following link for using shared libraries.
http://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html

Related

Function with same names in different files - C

I have two functions with same name and want to use it in my application.
I referred various answers like here and here but couldn't get clear solution.
I have following functions
// xxxx_input.h
int8_t input_system_init(InputParams params);
int8_t input_system_easy_load(uint32_t interval_ms);
// yyyy_input.h
int8_t input_system_init(InputParams params);
int8_t input_system_easy_load(uint32_t interval_ms);
The reason there are two files is that xxxx_input and yyyy_input work very differently internally.
Modifying the functions isn't easy since the code is provided by an external party and we have to keep the xxxx_input files.
What we can do is modify yyyy_input.h, but functions like input_system_easy_load must be kept consistent as they are called from different places.
Is there a way we can achieve this?
I tried replacing xxxx_input with yyyy_input.h, but since the include directory already contains the same function, it gives this error:
input_system_init multiply defined (by xxxx_input.o and yyyy_input.o).
If you have the source code to the functions defined in xxxx_input.h and yyyy_input.h, you could compile both modules with command line options redefining the function names via the preprocessor:
gcc -c -Dinput_system_init=xxxx_input_system_init -Dinput_system_easy_load=xxxx_input_system_easy_load xxxx_input.c
gcc -c -Dinput_system_init=yyyy_input_system_init -Dinput_system_easy_load=yyyy_input_system_easy_load yyyy_input.c
You would then compile your code with modified prototypes and you could link all 3 modules together.
If the modules are provided in object form only, you could define wrapper functions xxxx_input_system_init and xxxx_input_system_easy_load that you would link with the xxxx_input.o to produce a dynamic library, and the same for the yyyy alternatives. You would use modified prototypes in your module and would link it with the dynamic libraries.
Mike Kinghan showed a simpler approach for object files and libraries on systems where objcopy is available.
To get modified prototypes automatically, you can use this include file:
my_input_system.h:
#define input_system_init xxxx_input_system_init
#define input_system_easy_load xxxx_input_system_easy_load
#include "xxxx_input.h"
#undef input_system_init
#undef input_system_easy_load
#define input_system_init yyyy_input_system_init
#define input_system_easy_load yyyy_input_system_easy_load
#include "yyyy_input.h"
#undef input_system_init
#undef input_system_easy_load
/* prevent direct use of the redefined functions */
#define input_system_init do_not_use_input_system_init#
#define input_system_easy_load do_not_use_input_system_easy_load#
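To see the renaming trick at work without the real sources, here is a single-file sketch. The two function bodies are invented stand-ins for what xxxx_input.c and yyyy_input.c would contain, and load_both() is a hypothetical caller; only the #define / #undef pattern is taken from the header above.

```c
#include <stdint.h>

/* Stand-in for xxxx_input.c: the definition is written with the shared
 * name, but the macro makes the compiler emit xxxx_input_system_easy_load. */
#define input_system_easy_load xxxx_input_system_easy_load
int8_t input_system_easy_load(uint32_t interval_ms)
{
    return interval_ms > 0 ? 1 : -1;   /* invented behaviour */
}
#undef input_system_easy_load

/* Stand-in for yyyy_input.c, remapped to the yyyy_ prefix the same way. */
#define input_system_easy_load yyyy_input_system_easy_load
int8_t input_system_easy_load(uint32_t interval_ms)
{
    return interval_ms > 0 ? 2 : -2;   /* invented behaviour */
}
#undef input_system_easy_load

/* Hypothetical caller: both implementations now coexist in one program. */
int8_t load_both(uint32_t interval_ms)
{
    return (int8_t)(xxxx_input_system_easy_load(interval_ms)
                    + yyyy_input_system_easy_load(interval_ms));
}
```

Both definitions link together because, after preprocessing, the compiler never sees the shared name input_system_easy_load at all.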
I'll walk through a solution you can use if your supplier has given you object files compiled
with GCC or Clang for GNU/Linux computers.
My supplier has given me a header file foo_a.h that declares the function foo
$ cat foo_a.h
#pragma once
extern void foo(void);
and the matching object file foo_a.o:
$ nm foo_a.o
0000000000000000 T foo
U _GLOBAL_OFFSET_TABLE_
U puts
that defines foo.
Likewise they've given me a header foo_b.h that also declares foo
$ cat foo_b.h
#pragma once
extern void foo(void);
and the matching object file foo_b.o
$ nm foo_b.o
0000000000000000 T foo
U _GLOBAL_OFFSET_TABLE_
U puts
that also defines foo.
The functions foo_a.o:foo and foo_b.o:foo do different
things (or different variations of the same thing). I want to do both these things in the same program,
prog.c:
$ cat prog.c
extern void foo_a(void);
extern void foo_b(void);
int main(void)
{
foo_a(); // Calls `foo_a.o:foo`
foo_b(); // Calls `foo_b.o:foo`
return 0;
}
I can make such a program as follows:
$ objcopy --redefine-sym foo=foo_a foo_a.o prog_foo_a.o
$ objcopy --redefine-sym foo=foo_b foo_b.o prog_foo_b.o
Now I have made a copy prog_foo_a.o of foo_a.o in which the symbol foo is
renamed foo_a, and a copy prog_foo_b.o of foo_b.o in which
the symbol foo is renamed foo_b.
Then I compile and link like this:
$ gcc -c -Wall -Wextra prog.c
$ gcc -o prog prog.o prog_foo_a.o prog_foo_b.o
And prog runs like:
$ ./prog
foo_a
foo_b
Perhaps my supplier has given me foo_a.o within a static library liba.a
that also contains other object files that refer to foo_a.o:foo? And similarly
with foo_b.o.
That's OK. Instead of:
$ objcopy --redefine-sym foo=foo_a foo_a.o prog_foo_a.o
$ objcopy --redefine-sym foo=foo_b foo_b.o prog_foo_b.o
I will run:
$ objcopy --redefine-sym foo=foo_a liba.a libprog_a.a
$ objcopy --redefine-sym foo=foo_b libb.a libprog_b.a
and this will give me a new static library libprog_a.a
in which foo is renamed foo_a in all the object files in the
library. Similarly foo is renamed foo_b throughout libprog_b.a.
Then I'll link prog:
$ gcc -o prog prog.o -L. -lprog_a -lprog_b
Consider a potential drawback with this solution. Possibly my supplier has
given me foo_a.o and foo_b.o with debugging information in them, and I want to
use it for debugging my prog with gdb?
I have changed the original symbol names, foo_a.o:foo to foo_a and
foo_b.o:foo to foo_b, but I haven't changed the debugging info associated
with those symbols. Debugging with gdb will still work, but some of the
debugging output will be incorrect and possibly confusing. E.g. if I put
a breakpoint on foo_a, gdb will run to it and stop, but it will say
it has stopped at foo from file foo_a.c. And if I then breakpoint at foo_b, gdb will run to
it and again say it is at foo, but from file foo_b.c. If the person doing
the debugging doesn't know how the program was built, this would certainly be
confusing.
But giving you debugging info with binaries is not far from giving you the
source code, so as you haven't got source code you likely don't have
debugging info and are not concerned about it.

Isn't ld checking for unresolved symbols in shared libraries redundant?

When linking a program against a shared object, ld will ensure that symbols can be resolved. This basically ensures that the interfaces between the program and its shared objects are compatible. After reading Linking with dynamic library with dependencies, I learnt that ld will descend into linked shared objects and attempt to resolve their symbols too.
Aren't my shared object's references already checked when the shared objects are themselves linked?
I can understand the appeal of finding out at link time whether a program has all the pieces it requires to start, but it seems irrelevant in the context of package building, where shared objects may be distributed separately (Debian's lib* packages, for instance). It introduces recursive build dependencies on systems uninterested in executing the built programs.
Can I trust the dependencies resolved when the shared object was built? If so, how safe is it to use -unresolved-symbols=ignore-in-shared-libs when building my program?
You're wondering why a program's linkage should bother to resolve symbols originating in
the shared libraries that it's linked with because:
Aren't my shared object's references already checked when the shared objects are themselves linked?
No, they're not, unless you expressly insist on it when you link the shared library.
Here I'm going to build a shared library libfoo.so:
foo.c
extern void bar();
void foo(void)
{
bar();
}
Routinely compile and link:
$ gcc -fPIC -c foo.c
$ gcc -shared -o libfoo.so foo.o
No problem, and bar is undefined:
$ nm --undefined-only libfoo.so | grep bar
U bar
I need to insist to get the linker to object to that:
$ gcc --shared -o libfoo.so foo.o -Wl,--no-undefined
foo.o: In function `foo':
foo.c:(.text+0xa): undefined reference to `bar'
Of course:
main.c
extern void foo(void);
int main(void)
{
foo();
return 0;
}
it won't let me link libfoo with a program:
$ gcc -c main.c
$ gcc -o prog main.o -L. -lfoo
./libfoo.so: undefined reference to `bar'
unless I also resolve bar in the same linkage:
bar.c
#include <stdio.h>
void bar(void)
{
puts("Hello world!");
}
maybe by getting it from another shared library:
$ gcc -fPIC -c bar.c
$ gcc -shared -o libbar.so bar.o
$ gcc -o prog main.o -L. -lfoo -lbar
And then everything's fine.
$ export LD_LIBRARY_PATH=.; ./prog
Hello world!
It's of the essence of a shared library that it doesn't by default have
to have all of its symbols resolved at linktime. That way a program - which
typically does need all its symbols resolved at linktime - can get all its symbols
resolved by being linked with more than one library.
Aren't my shared object's references already checked
when the shared objects are themselves linked?
Well, shared libs might have been linked with -Wl,--allow-shlib-undefined or with dummy dependencies so it still makes sense to check them.
Can I trust the dependencies resolved when the shared object was built?
Probably not: the current linking environment and the environment used to link the original shlibs may be different.
If so, how safe is it to use -unresolved-symbols=ignore-in-shared-libs
when building my program?
You may be missing potential errors in this case (or rather delaying them to runtime which is still bad). Imagine a situation where some of the symbols needed by shared objects are to come from executable itself or from one of the libs which is linked by executable (but not by the shlib which is missing the symbols).
EDIT
Although above is correct, Mike Kinghan's answer gives stronger argument in favor of symbol resolution in libraries during executable link.
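The situation described above, where a shared object relies on a symbol supplied by the executable, can be sketched in one file. Everything here is hypothetical: plugin_check() stands for code that would live in a shared library, and app_version() for the function the executable would provide.

```c
/* --- would live in a shared library (hypothetical libplugin.so) --- */

/* Only declared here: when the library is linked on its own, this
 * reference stays undefined unless -Wl,--no-undefined is used. */
extern int app_version(void);

int plugin_check(void)
{
    /* Works only if something else in the final link defines app_version(). */
    return app_version() >= 2;
}

/* --- would live in the executable --- */

/* The definition the library was waiting for. */
int app_version(void)
{
    return 3;
}
```

With --unresolved-symbols=ignore-in-shared-libs, a program that forgot to define app_version() would still link, and the failure would only show up when the program is run.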

When I dlopen a .so, its symbol is not overridden by the main program's symbol. Why?

libp2.c
#include <stdio.h>
void pixman()
{
printf("pixman in libp1\n");
}
libc2.c
#include <stdio.h>
void pixman();
void cairo()
{
printf("cairo2\n");
pixman();
}
main.c
#include <stdio.h>
#include <dlfcn.h>
void pixman()
{
printf("pixman in main\n");
}
int main()
{
pixman();
void* handle=NULL;
void (*callfun)();
handle=dlopen("/home/zpeng/test/so_test/libc2.so",RTLD_LAZY);
callfun = (void(*)())dlsym(handle, "cairo");
callfun();
...
}
compile
gcc -c libp2.c -fPIC -olibp2.o
rm libp2.a
ar -rs libp2.a libp2.o
gcc -shared -fPIC libc2.c ./libp2.a -o libc2.so
gcc main.c -ldl -L. -g
the result:
pixman in main
cairo2
pixman in libp1
why the last is not "pixman in main"?
Looking at the symbol processing (LD_DEBUG=symbols), it begins with:
21180: symbol=pixman; lookup in file=./a.out
21180: symbol=pixman; lookup in file=/lib64/libdl.so.2
21180: symbol=pixman; lookup in file=/lib64/tls/libc.so.6
21180: symbol=pixman; lookup in file=/lib64/ld-linux-x86-64.so.2
21180: symbol=pixman; lookup in file=/home/zpeng/test/so_test/libc2.so
If I add -lc2 or -rdynamic to the gcc command for main.c, the output becomes:
pixman in main
cairo2
pixman in main
My questions:
why lookup symbol in a.out but not get the result and continue to search libc2.so when without -rdynamic and -lc2 ?
Why the last is not "pixman in main"?
That's because shared libraries have their own global offset table or GOT. When you use the cairo function in libc2.so, the pixman function that will be called is the same function that was resolved when compiling the .so file in the first place.
That is:
# creates object file only -- contains first pixman implementation
gcc -c libp2.c -fPIC -olibp2.o
# just turns the object file into an archive
ar -rs libp2.a libp2.o
# creates the .so file -- all symbols in libc2.c are resolved here
# and you passed in the .a file for that purpose. The .a file containing the
# first pixman implementation gets put in libc2.so.
gcc -shared -fPIC libc2.c ./libp2.a -o libc2.so
After this, anyone using libc2.so will get the copy stored in libc2.so. The lookup order you post is for a.out I believe and it's right. It looks for pixman in a.out, then libc2.so, and so on.
Why lookup symbol in a.out but not get the result and continue to search libc2.so when without -rdynamic and -lc2?
The -rdynamic option adds ALL of the executable's global symbols to its dynamic symbol table, not just the ones the linker thinks are used (-lc2 has the same effect). When all those symbols are exported you have a conflict: the pixman function. The main.c implementation is used in this case. As others have pointed out, this will probably generate a warning.
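One way to check whether the executable's own pixman is visible to the dynamic linker is to ask dlsym directly. This is a rough sketch assuming glibc on Linux (older glibc versions need -ldl at link time); the helper name visible_to_dynamic_linker is hypothetical.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Defined in the executable, like pixman() in main.c. */
void pixman(void)
{
}

/* Returns 1 if `name` can be found through the main program's global
 * lookup scope (the executable plus the libraries loaded with it), else 0. */
int visible_to_dynamic_linker(const char *name)
{
    /* dlopen(NULL, ...) returns a handle for the main program. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL)
        return 0;

    int found = dlsym(self, name) != NULL;
    dlclose(self);
    return found;
}
```

I would expect visible_to_dynamic_linker("pixman") to report 0 for a plain gcc main.c build and 1 once -rdynamic is added, since only then is pixman placed in the executable's dynamic symbol table. Symbols from libc, such as puts, should be found either way.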
You need to compile the sources that get archived into the .a file with -fvisibility=hidden, to indicate that, although they are global functions, they are not meant to be used outside the resulting library but only to resolve symbols inside it. Once the archive has been linked into the shared library, nm lists those symbols with the qualifier " t " (local) instead of " T ", which marks symbols available to other libraries.
The call was simply bound to a LOCAL symbol.
Since there is no explicit __attribute__((visibility("default"))) in libp2.c, the compiler binds this function call to a LOCAL entry in .symtab instead of one in .dynsym.
Appendix 1: for more about the ELF symbol tables, see readelf -s xxx.lib
Appendix 2: keyword to look up: the ld argument -Bsymbolic-functions

Static library cannot be found

Let me explain the context first. I have a header with a function declaration, a .c program with the body of the function, and the main program.
foo.h
#ifndef _FOO_H_
#define _FOO_H_
void foo();
#endif
foo.c
#include<stdio.h>
#include "include/foo.h"
void foo()
{
printf("Hello\n");
}
mainer.c
#include <stdio.h>
#include "include/foo.h"
int main()
{ foo();
return 0;
}
For the purpose of this program, both the header and the static library need to be in separate folders, so the header is on /include/foo.h and the static library generated will be on /lib/libfoo.a, and both .c programs on the main directory. The idea is to generate the object program, then the static library, then linking the static library to create the executable, and finally executing the program.
I have no problem in both creating the object program and the static library.
$ gcc -c foo.c -o foo.o
$ ar rcs lib/libfoo.a foo.o
But when I try to link the static library...
$ gcc -static mainer.c -L. -lfoo -o mainfoo
It gives me an error, claiming the static library can't be found:
/usr/bin/ld: cannot find -lfoo
collect2: ld returned 1 exit status
It's strange, considering I asked before how to work with static libraries and headers on separate folders and in this case the static libraries were found. Any idea what I'm doing wrong?
Change -L. to -Llib as it looks like you create the .a file there.
Basically the linker is telling you that it cannot find the library foo. It normally searches in the default library directories + any you give it with the -L option. You're telling it to look in the current directory, but not in lib where libfoo.a is located, which is why it can't find it. You need to change -L. to -Llib.
I am not completely sure that I understand your directory structure, but maybe what you need is this:
gcc -static mainer.c -L./lib -lfoo -o mainfoo

shared library constructor not working

In my shared library I have to do certain initialization at the load time. If I define the function with the GCC attribute __attribute__ ((constructor)) it doesn't work, i.e. it doesn't get called when the program linking my shared library is loaded.
If I change the function name to _init(), it works. Apparently the usage of _init() and _fini() functions are not recommended now.
Any idea why __attribute__ ((constructor)) wouldn't work? This is with Linux 2.6.9, gcc version 3.4.6
Edit:
For example, let's say the library code is the following:
#include <stdio.h>
int smlib_count;
void __attribute__ ((constructor)) setup(void) {
smlib_count = 100;
printf("smlib_count starting at %d\n", smlib_count);
}
void smlib_count_incr() {
smlib_count++;
smlib_count++;
}
int smlib_count_get() {
return smlib_count;
}
For building the .so I do the following:
gcc -fPIC -c smlib.c
ld -shared -soname libsmlib.so.1 -o libsmlib.so.1.0 -lc smlib.o
ldconfig -v -n .
ln -sf libsmlib.so.1 libsmlib.so
Since the .so is not in one of the standard locations I update the LD_LIBRARY_PATH and link the .so from a another program. The constructor doesn't get called. If I change it to _init(), it works.
Okay, so I've taken a look at this, and it looks like what's happening is that your intermediate gcc step (using -c) is causing the issue. Here's my interpretation of what I'm seeing.
When you compile as a .o with setup(), gcc just treats it as a normal function (since you're not compiling as a .so, so it doesn't care). Then, ld doesn't see any _init() or anything like a DT_INIT in the ELF's dynamic section, and assumes there's no constructors.
When you compile as a .o with _init(), gcc also treats it as a normal function. In fact, it looks to me like the object files are identical except for the names of the functions themselves! So once again, ld looks at the .o file, but this time sees a _init() function, which it knows it's looking for, and decides it's a constructor, and correspondingly creates a DT_INIT entry in the new .so.
Finally, if you do the compilation and linking in one step, like this:
gcc -Wall -shared -fPIC -o libsmlib.so smlib.c
Then what happens is that gcc sees and understands the __attribute__ ((constructor)) in the context of creating a shared object, and creates a DT_INIT entry accordingly.
Short version: use gcc to compile and link in one step. You can use -Wl (see the man page) for passing in extra options like -soname if required, like -Wl,-soname,libsmlib.so.1.
From this link:
"Shared libraries must not be compiled with the gcc arguments "-nostartfiles" or "-nostdlib". If those arguments are used, the constructor/destructor routines will not be executed (unless special measures are taken)."
gcc/ld doesn't create the DT_INIT entry in the ELF dynamic section when -nostdlib is used. You can check with objdump -p and look for the INIT entry in both cases. In the __attribute__ ((constructor)) case you won't find the INIT entry, but in the _init case you will find it in the shared library.
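For comparison, here is a minimal sketch of the constructor attribute doing its job when the gcc driver performs the link, so that the usual startup files are included. It is a trimmed variant of the library code from the question: the printf is dropped and the counter is made static, both changes being mine.

```c
/* Trimmed variant of smlib.c from the question. */
static int smlib_count;

/* Runs before main() in an executable, or when the shared object is
 * loaded, provided the link included the compiler's startup files. */
__attribute__ ((constructor)) static void setup(void)
{
    smlib_count = 100;
}

void smlib_count_incr(void)
{
    smlib_count++;
}

int smlib_count_get(void)
{
    return smlib_count;
}
```

If smlib_count_get() returns 0 on the first call, the constructor did not run, which is what the ld -shared build from the question produces.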
