How does a C static library work?

What code goes into the final executable when using a library?
As an example, we have two files:
/*main.c*/
int main (int argc, char* argv[]){
fc(1); /*This function is defined in fc.c*/
}
Another file:
/*fc.c*/
int fc(int x){
return fe(x);
}
int fe(int y){
return y + 1;
}
We compile fc.c:
gcc -c fc.c
We then get fc.o.
Now let's build a library named test:
ar rcs libtest.a fc.o
We now have libtest.a.
Now we compile main.c
gcc -c main.c
And we obtain main.o
Let's link our main.o to our libtest.a
gcc -L. main.o -ltest
We get the desired a.out
Checking its symbols:
nm a.out
In between all the symbols, we find:
080483cc T fc
080483df T fe
Seems good.
BUT!
What if our main.c changes to this?
/*main.c*/
int main (int argc, char* argv[]){
fe(1); /*This function is defined in fc.c*/
}
After compiling main.c and linking the new main.o to our library, I will still find a symbol for fc. But I don't need that code.
Questions
-Shouldn't the library "give me" only the code I need in main.c?
-Do the functions need to be in separate modules before being added to the library?
-What if I had 300 functions? Would I need to make 300 modules?

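Yes, place each function in a separate module. The linker pulls whole object files (archive members) out of a static library, never individual functions, so with one function per module it links in only the items needed.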

In short, there are compiler flags to prune unused functions from the final executable code, however they are not enabled by default.
GCC can do this "garbage collection" of unused functions if these flags are added:
-ffunction-sections as a compile-time flag. It instructs the compiler to create a separate section (see object file format) for each function. There's also -fdata-sections flag with similar meaning that works for variables.
-Wl,--gc-sections as a link-time flag. The -Wl part instructs GCC to pass the following options to the linker. --gc-sections means "garbage collect sections in which all code is unused". Since each function has its own section thanks to the compile-time option, this effectively performs function-level pruning.

Related

Rename a function without changing its references

I have an object file compiled using gcc with the -ffunction-sections option. I have access to the source file but I am not allowed to modify it.
file.c
void foo(void)
{
bar();
}
void bar(void)
{
abc();
}
What I am trying to achieve is to make all the references to bar take an absolute address (which I'll assign in the linker script), whereas bar itself will be placed at some other address by the linker.
A possible solution is to rename bar to file_bar without changing the call to bar inside foo(). I tried using objcopy --redefine-syms but it seems to rename even the calls to bar.
Solution provided by busybee solves the problem unless the functions are in the same compilation unit.
foo1.c
#include <stdio.h>
extern void bar1();
void foo1(){
printf("foo1\n");
}
int main(){
printf("main\n");
foo1();
bar1();
}
bar1.c
#include <stdio.h>
void bar1(){
printf("bar1\n");
}
wrapper.c
#include <stdio.h>
void __wrap_foo1(){
printf("wrap_foo1\n");
}
void __wrap_bar1(){
printf("wrap_bar1\n");
}
Now,
$ gcc -c -ffunction-sections foo1.c bar1.c wrapper.c
$ gcc -Wl,--wrap=foo1 -Wl,--wrap=bar1 -o output foo1.o bar1.o wrapper.o
$ ./output
main
foo1
wrap_bar1
All functions to be redirected are in their own compilation unit
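The linker has the option "--wrap" that resolves all undefined references to the symbol "xxx" to "__wrap_xxx", and all undefined references to "__real_xxx" to the original "xxx". It is used to put a wrapper function as an "interceptor" in between call and function. Because only undefined references are affected, a call to a function defined in the same object file is not redirected, which is why foo1 above is not wrapped.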
But with this option you can do whatever you like with those symbols in your linker script. You just need to define "__wrap_xxx" with a symbol so that the references are resolvable.
Depending on your needs you can also write a dummy function named "__wrap_xxx()" that does not even call "__real_xxx()". Or you can place "__real_xxx" in a vector table, or... whatever you can think of.
All functions to be redirected are non-static ("global"), patching immediate values
I looked through the answers of the other question the OP posted in a comment. This gave me the idea to weaken the symbols in question and to override them with a value by the linker.
This example might give you some insight. I tested it on Linux, which has address space layout randomization, so all addresses are offsets from a random base. But for the OP's target system it should work as expected.
foo1.c
Because of arbitrary values for the redirected addresses the functions can't be called. But the program can print their addresses.
#include <stdio.h>
void foo1(void) {
}
extern void bar1(void);
int main(void) {
printf("%p\n", main);
printf("%p\n", foo1);
printf("%p\n", bar1);
return 0;
}
bar1.c
void bar1(void) {
}
wrapper.ld
This is the first alternative to give the linker the addresses to be used, an additional linker script. For the second one see below. The standard linker script will be augmented here, there is no need to copy and patch it. Because of the simple structure this is probably the most simple way to provide many redirected addresses which can be easily automated.
foo1 = 0x1000;
bar1 = 0x2000;
Note: This is not C! It is "linker script" syntax which happens to be quite similar.
How I built and tested
This command sequence can be automated and arranged to your liking. In particular, the calls of objcopy could be done by a loop over a list.
gcc -c -ffunction-sections foo1.c
objcopy --weaken-symbol=foo1 foo1.o foo2.o
gcc -c -ffunction-sections bar1.c
objcopy --weaken-symbol=bar1 bar1.o bar2.o
gcc foo1.o bar1.o -o original
echo original
./original
gcc foo2.o bar2.o -o weakened
echo weakened
./weakened
gcc foo2.o bar2.o wrapper.ld -o redirected
echo redirected
./redirected
Instead of an additional linker script the symbol definitions can be given on the command line, too. This is the mentioned second alternative.
gcc foo2.o bar2.o -Wl,--defsym=foo1=0x1000 -Wl,--defsym=bar1=0x2000 -o redirected
BTW, the linker understands @file to read all arguments from the file named file. So there's "no limit" on the size of the linker command.
All functions to be redirected are non-static ("global"), overwriting with new functions
Instead of providing immediate values you can of course just provide your alternative functions. This works like above but instead of the additional linker script or symbol definitions you write a source file.
wrapper.c
Yes, that's right, the names are equal to the names of the originals! Because we made the symbols of the original functions weak, we'll get no error message from the linker when it overwrites the references with the addresses of the new functions.
void foo1(void) {
}
void bar1(void) {
}
Build the redirected program like this (only new commands shown):
gcc -c -ffunction-sections wrapper.c
gcc foo2.o bar2.o wrapper.o -o redirected
A function to be redirected is static
Well, depending on your target architecture it will probably not be possible. This is because of the relocation entry of the reference. It will be some kind of relative relocation, telling the linker to resolve the reference by an offset into the function's section instead of by the function's symbol.
I didn't investigate this further.

GCC link against .so file without source code

I am trying to compile a simple "hello world" program for an Axis A210 (cris architecture). I managed to download GCC from the vendor, but it came with glibc, and the camera is running uClibc-0.9.27. I pulled the file /lib/libuClibc-0.9.27.so from the device.
I managed to compile this program that segfaults:
#include <unistd.h>
int main(int argc, char** argv)
{
*((unsigned int*)0) = 0xDEAD;
}
and this program that just hangs:
#include <unistd.h>
int main(int argc, char** argv)
{
int a = 0;
}
with cris-gcc -g -static -nostdlib -o compiled main.c.
Now I'd like to use the functions in libuClibc, but I can't seem to get the linking to work: I've tried
cris-gcc -g -static -nostdlib -o compiled main.c -luClibc-0.9.27 -L.
but that just gives:
./libuClibc-0.9.27.so: could not read symbols: Invalid operation
collect2: ld returned 1 exit status
Is there a way to link to this .so file or to otherwise get some standard functions like exit working?
regarding:
cris-gcc -g -static -nostdlib -o compiled main.c -luClibc-0.9.27 -L.
The linker works through libraries in the order they are encountered, so a library must be listed after the objects that need it.
With GNU ld, -L options apply to all -l options regardless of where they appear, but listing the search directory before the library is the conventional and more readable form. Suggest:
cris-gcc -g -static -nostdlib -o compiled main.c -L. -luClibc-0.9.27
However, a *.so library is NOT a static library. It is a dynamic (shared) library, so the option -static should be removed. That in turn requires the dynamic library to be available at run time. If the related *.a (a static library) is available, it should be used in the compile/link statement instead.
Note: the function exit() has its prototype exposed via the stdlib.h header file, not the unistd.h header file.
regarding:
#include <unistd.h>
int main(int argc, char** argv)
{
*((unsigned int*)0) = 0xDEAD;
}
the parameters: argc and argv are not used, so the compiler will output two warning statements about 'unused parameters'. Suggest using the function signature: int main( void )
this code is trying to write to address 0. However, the application does not 'own' address 0 (and usually such an address is unmapped or marked read-only, so the application will exit with a 'seg fault event')
it is poor programming practice to include header files whose contents are not used. Suggest removing the statement: #include <unistd.h>
this statement: int a = 0; will result in the compiler outputting a warning message about a variable that is 'set' but never 'used'
regarding:
cris-gcc -g -static -nostdlib -o compiled main.c -L. -luClibc-0.9.27
When compiling, always enable the warnings, then fix those warnings. Suggest:
cris-gcc -Wall -Wextra -Wconversion -pedantic -std=c99 -g -static -nostdlib -o compiled main.c -L. -luClibc-0.9.27
Apart from all the problems noticed by @user3629249 in his answer (all of which should be addressed), the message:
./libuClibc-0.9.27.so: could not read symbols: Invalid operation
collect2: ld returned 1 exit status
means that the libuClibc-0.9.27.so binary has been stripped of its symbols, or that you do not have permission to read the file and therefore its symbol table. The linker is unable to use that binary; it can only be loaded into memory. Anyway, you need a non-stripped shared object and, as suggested by @user3629249, don't use -static (for the reason stated in his answer) and put the parameters in order (library dir before the library to be linked, also stated by him). You can even link the shared object by naming it directly:
cris-gcc -nostdlib -o compiled main.c libuClibc-0.9.27.so
and another thing: You need not only the standard C library to link an executable... you normally use a crt0.o at the beginning of your program with the C runtime and the start code for your program. You have not included that, and probably the compiler is getting it from another place.
One question: if you got the compiler, why do you intend to supply your own version of the standard library? Isn't one provided with the compiler? If you change the libc, then you must also change the crt0.o file. It defaults to a compiler-provided one, and you haven't received the message no definition for start.
Try to compile with just a main function, as you did, but don't specify shared libraries or directories... just the main code:
cris-gcc -o compiled main.c
and see what happens.... this will be very illustrative of what you lack in your system.

ld does not link the static library because it considers the library unneeded, but I need this library [duplicate]

I have a program and a static library:
// main.cpp
int main() {}
// mylib.cpp
#include <iostream>
struct S {
S() { std::cout << "Hello World\n";}
};
S s;
I want to link the static library (libmylib.a) to the program object (main.o), although the latter does not use any symbol of the former directly.
The following commands do not seem to do the job with g++ 4.7. They will run without any errors or warnings, but apparently libmylib.a will not be linked:
g++ -o program main.o -Wl,--no-as-needed /path/to/libmylib.a
or
g++ -o program main.o -L/path/to/ -Wl,--no-as-needed -lmylib
Do you have any better ideas?
Use --whole-archive linker option.
Libraries that come after it in the command line will not have unreferenced symbols discarded. You can resume normal linking behaviour by adding --no-whole-archive after these libraries.
In your example, the command will be:
g++ -o program main.o -Wl,--whole-archive /path/to/libmylib.a -Wl,--no-whole-archive
In general, it will be:
g++ -o program main.o \
-Wl,--whole-archive -lmylib \
-Wl,--no-whole-archive -llib1 -llib2
The original suggestion was "close":
How to force gcc to link unreferenced, static C++ objects from a library
Try this: -Wl,--whole-archive -lyourlib
I like the other answers better, but here is another "solution".
Use the ar command to extract all the .o files from the archive.
cd mylib ; ar x /path/to/libmylib.a
Then add all those .o files to the linker command
g++ -o program main.o mylib/*.o
If there is a specific function in the static library that is stripped by the linker as unused, but you really need it (one common example is JNI_OnLoad() function), you can force the linker to keep it (and naturally, all code that is called from this function). Add -u JNI_OnLoad to your link command.

When I dlopen a .so, its symbol is not overridden by the symbol in main. Why?

libp2.c
#include <stdio.h>
void pixman()
{
printf("pixman in libp1\n");
}
libc2.c
#include <stdio.h>
void pixman();
void cairo()
{
printf("cairo2\n");
pixman();
}
main.c
#include <stdio.h>
#include <dlfcn.h>
void pixman()
{
printf("pixman in main\n");
}
int main()
{
pixman();
void* handle=NULL;
void (*callfun)();
handle=dlopen("/home/zpeng/test/so_test/libc2.so",RTLD_LAZY);
callfun = (void(*)())dlsym(handle, "cairo");
callfun();
...
}
compile
gcc -c libp2.c -fPIC -olibp2.o
rm libp2.a
ar -rs libp2.a libp2.o
gcc -shared -fPIC libc2.c ./libp2.a -o libc2.so
gcc main.c -ldl -L. -g
the result:
pixman in main
cairo2
pixman: libp2
why the last is not "pixman in main"?
Looking at the symbol processing (LD_DEBUG=symbols), it begins with:
21180: symbol=pixman; lookup in file=./a.out
21180: symbol=pixman; lookup in file=/lib64/libdl.so.2
21180: symbol=pixman; lookup in file=/lib64/tls/libc.so.6
21180: symbol=pixman; lookup in file=/lib64/ld-linux-x86-64.so.2
21180: symbol=pixman; lookup in file=/home/zpeng/test/so_test/libc2.so
If I add -lc2 or -rdynamic to the gcc command for main.c, it prints:
pixman in main
cairo2
pixman in main
My questions:
Why is the symbol looked up in a.out but not found there, so that the search continues to libc2.so, when building without -rdynamic and -lc2?
Why the last is not "pixman in main"?
That's because shared libraries have their own global offset table or GOT. When you use the cairo function in libc2.so, the pixman function that will be called is the same function that was resolved when compiling the .so file in the first place.
That is:
# creates object file only -- contains first pixman implementation
gcc -c libp2.c -fPIC -olibp2.o
# just turns the object file into an archive
ar -rs libp2.a libp2.o
# creates the .so file -- all symbols in libc2.c are resolved here
# and you passed in the .a file for that purpose. The .a file containing the
# first pixman implementation gets put in libc2.so.
gcc -shared -fPIC libc2.c ./libp2.a -o libc2.so
After this, anyone using libc2.so will get the copy stored in libc2.so. The lookup order you post is for a.out I believe and it's right. It looks for pixman in a.out, then libc2.so, and so on.
Why lookup symbol in a.out but not get the result and continue to search libc2.so when without -rdynamic and -lc2?
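The -rdynamic option adds ALL of the executable's global symbols to its dynamic symbol table, not just the ones the linker thinks are used (linking with -lc2 has the same effect for pixman, because the linker then sees that libc2.so references it). Without either option, pixman from main.c is not in a.out's dynamic symbol table, so the lookup in a.out fails and the search continues until it reaches libc2.so. Once the symbol is exported you have a conflict, the pixman function, and the main.c implementation is used because a.out is searched first. As others have pointed out, this will probably generate a warning.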
You need to compile the sources that get archived into the .a file with -fvisibility=hidden, to indicate that, although they are global functions, they are not meant to be used outside the resulting library but are instead meant to resolve symbols inside the library. That will cause those symbols to appear in the resulting shared library with the qualifier " t " in nm -a instead of " T ", which is used for symbols available to other libraries.
It was simply bound to a LOCAL symbol.
Since there is no explicit __attribute__((visibility("default"))) in libp2.c, the compiler binds this function call to the LOCAL entry in .symtab instead of .dynsym.
appendix1: more about ELF header: readelf -s xxx.lib
appendix2: keyword of ld argument -Bsymbolic-functions

using a method from another file without including it [duplicate]

I have two .c files which I compile with a makefile.
foo.c:
void foo()
{
printf("this is foo");
}
main.c:
#include <stdio.h>
int main()
{
printf("this is main\n");
foo();
}
The makefile looks like this:
all: main.o foo.o
	gcc -o prog foo.o main.o
main.o: main.c
	gcc -c main.c
foo.o: foo.c
	gcc -c foo.c
So the question is:
how can foo.c use printf() without me including stdio.h AND how can main.c use the method foo() without me including foo.c.
My guess/research is that the makefile works as a linker. But I don't have proof of that and want to understand exactly how this works.
Correct me if I misunderstood something.
In the compilation phase, the compiler checks function calls against prototypes. Any function that lacks a prototype is assumed to return int and to accept any number of arguments.
If you turn up the warning level, gcc will warn you if a prototype is missing. You should add -Wall, and you could also add -pedantic to get diagnostics on additional things the compiler thinks are suspicious.
If the compilation step succeeds, the compiler creates an object file which contains the compiled code and 2 reference tables. The first table is the export table. It contains the names of all functions and variables that are exported from the object file.
The second table is the import table. It contains a list of all functions and variables that are referenced but not defined in the object file.
In your case we have:
foo.o:
Export:
foo
Import:
printf
main.o
Export:
main
Import:
printf
foo
In the linker phase, the linker will take the list of imports and exports and match them. In addition to the object files and libraries you specify on the command line, the linker will automatically link with libc, which contains the functions of the C standard library such as printf.
In the makefile you can force the compiler to include <stdio.h> or any other header:
From the docs:
-include file
Process file as if #include "file" appeared as the first line of the
primary source file. However, the first directory searched for file is
the preprocessor's working directory instead of the directory
containing the main source file. If not found there, it is searched
for in the remainder of the #include "..." search chain as normal. If
multiple -include options are given, the files are included in the
order they appear on the command line.
Just add -include filename.h in the GCC/compiler command line within the makefile.
The makefile is not a linker. It is input to make. The makefile just tells make what commands to execute under what conditions.
Your all target is running gcc in linking/linker mode gcc -o prog foo.o main.o.
The same way your foo.o and main.o targets are running gcc in compilation mode gcc -c foo.c.
For the record you can combine the two .o targets into just
%.o: %.c
	gcc -c $<
which is, in fact, already a default rule in make so you need not include that rule at all.
Additionally, your all target is not following best make practices because it generates a file that does not match the name of the target. So you should use
all: prog
prog: main.o foo.o
	gcc -o prog foo.o main.o
instead.
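Though once again make has you covered by default, provided the program is named after one of its object files (the built-in link rule builds n from n.o). If the executable is called main instead of prog, the entire makefile can be replaced by
all: main
main: main.o foo.o
and make compiles both objects and links them for you.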

Resources