how to make shared library an executable - c

I was searching for asked question. i saw this link https://hev.cc/2512.html which is doing exactly the same thing which I want. But there is no explanation of whats going on. I am also confused whether shared library with out main() can be made executable if yes how? I can guess i have to give global main() but know no details. Any further easy reference and guidance is much appreciated
I am working on x86-64 64 bit Ubuntu with kernel 3.13

This is fundamentally not sensible.
A shared library generally has no task it performs that can be used as it's equivalent of a main() function. The primary goal is to allow separate management and implementation of common code operations, and on systems that operate that way to allow a single code file to be loaded and shared, thereby reducing memory overhead for application code that uses it.
An executable file is designed to have a single point of entry from which it performs all the operations related to completing a well defined task. Different OSes have different requirements for that entry point. A shared library normally has no similar underlying function.
So in order to (usefully) convert a shared library to an executable you must also define ( and generate code for ) a task which can be started from a single entry point.
The code you linked to is starting with the source code to the library and explicitly codes a main() which it invokes via the entry point function. If you did not have the source code for a library you could, in theory, hack a new file from a shared library ( in the absence of security features to prevent this in any given OS ), but it would be an odd thing to do.
But in practical terms you would not deploy code in this manner. Instead you would code a shared library as a shared library. If you wanted to perform some task you would code a separate executable that linked to that library and code. Trying to tie the two together defeats the purpose of writing the library and distorts the structure, implementation and maintenance of that library and the application. Keep the application and the library apart.

I don't see how this is useful for anything. You could always achieve the same functionality from having a main in a separate binary that links against that library. Making a single file that works as both is solidly in the realm of "silly computer tricks". There's no benefit I can see to having a main embedded in the library, even if it's a test harness or something.
There might possible be some performance reasons, like not having function calls go through the indirection of the PLT.
In that example, the shared library is also a valid ELF executable, because it has a quick-and-dirty entry-point that grabs the args for main from where the ABI says they go (i.e. copies them from the stack into registers). It also arranges for the ELF interpreter to be set correctly. It will only work on x86-64, because no definition is provided for init_args for other platforms.
I'm surprised it actually works; I thought all the crap the usual CRT (startup) code does was actually needed for stdio to work properly. It looks like it doesn't initialize extern char **environ;, since it only gets argc and argv from the stack, not envp.
Anyway, when run as an executable, it has everything needed to be a valid dynamically-linked executable: an entry-point which runs some code and exits, an interpreter, and a dependency on libc. (ELF shared libraries can depend on (i.e. link against) other ELF shared libraries, in the same way that executables can).
When used as a library, it just works as a normal library containing some function definitions. None of the stuff that lets it work as an executable (entry point and interpreter) is even looked at.
I'm not sure why you don't get an error for multiple definitions of main, since it isn't declared as a "weak" symbol. I guess shared-lib definitions are only looked for when there's a reference to an undefined symbol. So main() from call.c is used instead of main() from libtest.so because main already has a definition before the linker looks at libtest.

To create shared Dynamic Library with Example.
Suppose with there are three files are : sum.o mul.o and print.o
Shared library name " libmno.so "
cc -shared -o libmno.so sum.o mul.o print.o
and compile with
cc main.c ./libmno.so

Related

How do the preprocessor's and linker's jobs not interfere? [duplicate]

Okay, until this morning I was thoroughly confused between these terms. I guess I have got the difference, hopefully.
Firstly, the confusion was that since the preprocessor already includes the header files into the code which contains the functions, what library functions does linker link to the object file produced by the assembler/compiler? Part of the confusion primarily arose due to my ignorance about the difference between a header file and a library.
After a bit of googling, and stack-overflowing (is that the term? :p), I gathered that the header file mostly contains the function declarations whereas the actual implementation is in another binary file called the library (I am still not 100% sure about this).
So, suppose in the following program:-
#include<stdio.h>
int main()
{
printf("whatever");
return 0;
}
The preprocessor includes the contents of the header file in the code. The compiler/compiler+assembler does its work, and then finally linker combines this object file with another object file which actually has stored the way printf() works.
Am I correct in my understanding? I may be way off...so could you please help me?
Edit: I have always wondered about the C++ STL. It always confused me as to what it exactly is, a collection of all those headers or what? Now after reading the responses, can I say that STL is an object file/something that resembles an object file?
And also, I thought where I could read the function definitions of functions like pow(), sqrt() etc etc. I would open the header files and not find anything. So, is the function definition in the library in binary unreadable form?
A C source file goes through two main stages, (1) the preprocessor stage where the C source code is processed by the preprocessor utility which looks for preprocessor directives and performs those actions and (2) the compilation stage where the processed C source code is then actually compiled to produce object code files.
The preprocessor is a utility that does text manipulation. It takes as input a file that contains text (usually C source code) that may contain preprocessor directives and outputs a modified version of the file by applying any directives found to the text input to generate a text output.
The file does not have to be C source code because the preprocessor is doing text manipulation. I have seen the C Preprocssor used to extend the make utility by allowing preprossor directives to be included in a make file. The make file with the C Preprocessor directives is run through the C Preprocessor utility and the resulting output then fed into make to do the actual build of the make target.
Libraries and linking
A library is a file that contains object code of various functions. It is a way to package the output from several source files when they are compiled into a single file. Many times a library file is provided along with a header file (include file), typically with a .h file extension. The header file contains the function declarations, global variable declarations, as well as preprocessor directives needed for the library. So to use the library, you include the header file provided using the #include directive and you link with the library file.
A nice feature of a library file is that you are providing the compiled version of your source code and not the source code itself. On the other hand since the library file contains compiled source code, the compiler used to generate the library file must be compatible with the compiler being used to compile your own source code files.
There are two types of libraries commonly used. The first and older type is the static library. The second and more recent is the dynamic library (Dynamic Link Library or DLL in Windows and Shared Library or SO in Linux). The difference between the two is when the functions in the library are bound to the executable that is using the library file.
The linker is a utility that takes the various object files and library files to create the executable file. When an external or global function or variable is used the C source file, a kind of marker is used to tell the linker that the address of the function or variable needs to be inserted at that point.
The C compiler only knows what is in the source it compiles and does not know what is in other files such as object files or libraries. So the linker's job is to take the various object files and libraries and to make the final connections between parts by replacing the markers with actual connections. So a linker is a utility that "links" together the various components, replacing the marker for a global function or variable in the object files and libraries with a link to the actual object code that was generated for that global function or variable.
During the linker stage is when the difference between a static library and a dynamic or shared library becomes evident. When a static library is used, the actual object code of the library is included in the application executable. When a dynamic or shared library is used, the object code included in the application executable is code to find the shared library and connect with it when the application is run.
In some cases the same global function name may be used in several different object files or libraries so the linker will normally just use the first one it comes across and issue a warning about others found.
Summary of compile and link
So the basic process for a compile and link of a C program is:
preprocessor utility generates the C source to be compiled
compiler compiles the C source into object code generating a set of object files
linker links the various object files along with any libraries into executable file
The above is the basic process however when using dynamic libraries it can get more complicated especially if part of the application being generated has dynamic libraries that it is generating.
The loader
There is also the stage of when the application is actually loaded into memory and execution starts. An operating system provides a utility, the loader, which reads the application executable file and loads it into memory and then starts the application running. The starting point or entry point for the executable is specified in the executable file so after the loader reads the executable file into memory it will then start the executable running by jumping to the entry point memory address.
One problem the linker can run into is that sometimes it may come across a marker when it is processing the object code files that requires an actual memory address. However the linker does not know the actual memory address because the address will vary depending on where in memory the application is loaded. So the linker marks that as something for the loader utility to fix when the loader is loading the executable into memory and getting ready to start it running.
With modern CPUs with hardware supported virtual address to physical address mapping or translation, this issue of actual memory address is seldom a problem. Each application is loaded at the same virtual address and the hardware address translation deals with the actual, physical address. However older CPUs or lower cost CPUs such as micro-controllers that are lacking the memory management unit (MMU) hardware support for address translation still need this issue addressed.
Entry points and the C Runtime
A final topic is the C Runtime and the main() and the executable entry point.
The C Runtime is object code provided by the compiler manufacturer that contains the entry point for an application that is written in C. The main() function is the entry point provided by the programmer writing the application however this is not the entry point that the loader sees. The main() function is called by the C Runtime after the application is started and the C Runtime code sets up the environment for the application.
The C Runtime is not the Standard C Library. The purpose of the C Runtime is to manage the runtime environment for the application. The purpose of the Standard C Library is to provide a set of useful utility functions so that a programmer doesn't have to create their own.
When the loader loads the application and jumps to the entry point provided by the C Runtime, the C Runtime then performs the various initialization actions needed to provide the proper runtime environment for the application. Once this is done, the C Runtime then calls the main() function so that the code created by the application developer or programmer starts to run. When the main() returns or when the exit() function is called, the C Runtime performs any actions needed to clean up and close out the application.
This is an extremely common source of confusion. I think the easiest way to understand what's happening is to take a simple example. Forget about libraries for a moment and consider the following:
$ cat main.c
extern int foo( void );
int main( void ) { return foo(); }
$ cat foo.c
int foo( void ) { return 0; }
$ cc -c main.c
$ cc -c foo.c
$ cc main.o foo.o
The declaration extern int foo( void ) is performing exactly the same function as the header file of a library. foo.o is performing the function of the library. If you understand this example, and why neither cc main.c nor cc main.o work, then you understand the difference between header files and libraries.
Yes, almost correct. Except that the linker does not links object files, but also libraries - in thise case, it's the C standard library (libc) is what is linked to your object file. The rest of your assumptions appear to be true about the compilation stages + difference between a header and a library.

How can I "dump" a Function to a file?

For example, I have a function func():
int func (int a, int b) {return a + b;}
Now I want write it to a file, so that I can use the system-call mmap to load it with PROT_EXEC and I can call it from another program.What should I do for it?
If you know what signature you need and a static library or the location of a shared library at compile time, you probably just want to include the header and link against the output library. If you want to invoke a function dynamically, you probably want dlopen / dlsym (UNIX) or LoadLibrary / GetProcAddress (Windows) for loading the libary dynamically and retrieving the address of the function by name.
Note that the cases where you actually need to load a library dynamically (at least explicitly) are pretty rare. This is often used for modular architectures (e.g. "plugins" or "extensions") where individual pieces of the application are distributed separately (which can be achieved more securely using IPC rather than dynamic loading... see my note below). Or for cases where your application is not allowed to include dependencies statically and needs to conditionally supply behavior based on the existence of certain library dependencies in the environment in which it happens to be executing. In most cases, though, you'll simply want to include a header that declares the symbols you need and compile for each target platform (possibly using #if...#else macros if there are symbols that vary across OSes or OS versions).
From a stability, security, and code complexity standpoint, I personally recommend that you avoid dynamic library loading. For core system functionality, it's reasonable to link against a dynamic library, but you'll want to do it in a way where the burden of dynamic loading is entirely on your toolchain (i.e. you shouldn't need to call dlopen or LoadLibrary explicitly). For other functionality, it is almost always better to statically link (assuming you distribute updates when there are security fixes for your dependencies), since this will avoid you getting broken by incompatible version updates and also prevent your users from experiencing dependency hell (you require version A but some other application requires version B); modular architectures are often better (and more securely) achieved through inter-process communication (IPC), since dynamically loaded libraries live in the process of the program that loads them (thereby giving them access to the entire process's virtual memory space), whereas with interprocess-communication, each component would be a separate process, and individual components would only have access to information that was given to it explicitly by the calling process, which would make it more difficult for a malicious component to steal data from the caller or other components or to produce instability.
The sanest thing if you want this to actually be used in the real world is probably to just compile the source as part of your program on each platform, like a regular function.
Next best is probably a separate process that you talk to rather than merge with.
Semi-sane (but still not a great choice, see our discussion in the other answer) would be making the shared library, like Michael Aaron Safyan said.
But if you want to know how it works just because - say, you want to write your own dynamic linker, or are doing some kind of runtime code generation like a JIT compiler, or if you just wanna know - you can make a raw code file.
To use it, what we'd have to do is similar to what the linker does - load the code at a particular address that it is made to work on and run it. There is position independent code that can run at any address, too.
Let's first get our function compiled and linked, then output into a raw image for a certain address. Assume the function is func in the file func.c and we're using gcc on Linux. (A Windows compiler would have similar options - gcc on Windows is exactly the same, I believe, but something like Digital Mars's C compiler does it differently with the linker command being /BINARY for instance)
Anyway, here's what I ran:
gcc -c func.c # makes func.o
ld func.o --oformat=binary -e func -o func.binary
This generates a file called func.binary. You can disassemble it most easily with ndisasm -b 64 func.binary (or -b 32 if you compiled the C in 32 bit mode) to confirm it looks right - I see an add instruction there, so looks good to me.
If you loaded that and mmaped then called it... it should work.
Problems will be quick to come up though:
If there's more than one function in that file, they'll all be squished together.
The addresses they try to use to call each other may be totally wrong.
Global variables and other static data will be messed up.
And there's more. The operating system uses more complex file formats for executables and libraries for a reason!
To go to the next step, you could consider writing an ELF or PE loader which reads that metadata off a standard file. Of course, once you get into much of this, you'll be doing exactly what the OS provides with dlopen and LoadLibrary.... so unless the goal is to just learn about the guts, just call those functions and call it done!

Visibility, Fortran common variables, runtime loading of shared libraries

Environment: Intel Linux, Red Hat 5.
Compiler: gcc 3.4.6
(old stuff, legacy environment with serious infrastructure, sorry)
I have multiple versions of a particular shared library (call it something like "shared_lib.so") derived from Fortran which contains a COMMON block and various computations with references to variables in that COMMON.
I need to be able to (from C code elsewhere in the end-product executable) use dlclose() and dlopen() to switch between versions of this library (within which all versions of the COMMON contents are identical) while running. In some cases the same COMMON also appears in code which is part of a static library (call it "static_lib.a") that is also linked into the executable, and is separately maintained from my project but which has functionality which interacts with that in my shared library.
I appear to be seeing that multiple instances of the COMMON wind up in the executable, and (more importantly) that there is no linkage between the values of variables in the instance from the static library, and the values of the “same” variables in the instance from a shared library pulled in with dlopen().
What I need, in summary, is (within the overall executable) for a dlopen()-loaded shared_lib.so to be able to set/use variable XYZ in COMMON ABC, and for code in static_lib.a to set/use XYZ, and have it in effect be the same instance of XYZ, or at least for the two to be kept in synch. Is this possible?
My compilation commands for sources in shared_lib.so are of the form:
g77 –c –g –m32 -fPIC –o shared_src.o shared_src.f
My command for building shared_lib.so is of the form:
gcc -g -m32 -fPIC -shared -o shared_lib.so *.o
My command for building the executable is of the form:
gcc –g -m32 –rdynamic –o exec exec.o static_lib.a shared_lib.so –lm –ldl –lg2c
My need is to do something from the C code of the form:
handle1 = dlopen ("shared_lib.so", RTLD_NOLOAD);
dlclose (handle1);
handle2 = dlopen ("shared_lib2.so", RTLD_NOW | RTLD_GLOBAL);
...
The initial startup configuration does appear to function correctly with respect to the needed variables, but the result of subsequent dlclose() and dlopen() sequences do not. Perhaps the underlying issue is that dlopen() lacks some intelligence that gcc possesses when it is linking.
Short answer
Did/can you recompile the executable with the -fPIC? I found that it was necessary to compile both the shared library AND the executable with the -fPIC to get the COMMON blocks to be recognized properly.
Long answer
I ran into a slightly similar problem recently with COMMON blocks shared between an executable and a FORTRAN shared library. However, I'm using Intel compilers NOT the GNU compilers. The executable is mixed C/C++ and FORTRAN.
The existing (working) Windows version of the code works by sharing the common blocks between executable and DLL through DLLEXPORT/DLLIMPORT ATTRIBUTE directives. According to the Intel compiler documentation, these attribute directives are not recognized in Linux. Indeed, the Linux Intel compiler just produces warnings for these directives.
The main changes in converting the code from Windows to Linux were replacing the Windows LoadLibrary and GetProcAddress with Linux's dlopen and dlsym routines, respectively, using #ifdef sections. The shared library was compiled using -fpic and linked with -shared.
While the shared library was compiled with -fpic, the executable was NOT. When running the code compiled in this manner, variables passed to the shared library through subroutine calls were passed properly, however, the COMMON block variables were not set correctly (or were uninitialized).
In desperation, I finally tried compiling the executable itself with the -fpic compiler option, and then the COMMON blocks were recognized properly in the shared library.
This isn't really an answer, but might help you.
Here's what the 2008 standard says about COMMON:
5.7.2.4 Common association
1 Within a program, the common block storage sequences of all nonzero-sized common blocks with the same
name have the same first storage unit, and the common block storage
sequences of all zero-sized common blocks with the same name are
storage associated with one another. Within a program, the common
block storage sequences of all nonzero-sized blank common blocks have
the same first storage unit and the storage sequences of all
zero-sized blank common blocks are associated with one another and
with the first storage unit of any nonzero-sized blank common blocks.
This results in the association of objects in different scoping units.
Use or host association may cause these associated objects to be
accessible in the same scoping unit.
In short, COMMON sections with the same name in the same program occupy the same storage.
A program is defined as follows.
2.2.2 Program
1 A program shall consist of exactly one main program, any number (including zero) of other kinds of program units, any
number (including zero) of external procedures, and any number
(including zero) of other entities defined by means other than
Fortran. The main program shall be defined by a Fortran main-program
program-unit or by means other than Fortran, but not both.
The standard doesn't say anything about static vs dynamic linking and it doesn't restrict the previous statements to static linking. Therefore, it seems the dynamically loaded library should share the COMMON block with the main program (which I'm not sure is even technically possible) and thus the GNU implementation is incorrect.
On the other hand, the standard also doesn't say anything about being able to load libraries dynamically. Program units "defined by means other than Fortran" should include C libraries, but that doesn't tell us how these program units are connected to the main program. Fortran, in general, is not a very dynamic language.
Of course, you can work around all this by simply not using COMMON blocks. If a procedure needs to read/write some data, just pass it as a parameter with intent in/out. You can also group data together in a derived type and pass it around together as a unit. Nowadays (Fortran 2003+), you can even use object oriented programming, so there is really no need for global variables anymore.

How to create static linked shared libraries

For my master's thesis i'm trying to adapt a shared library approach for an ARM Cortex-M3 embedded system. As our targeted board has no MMU I think that it would make no sense to use "normal" dynamic shared libraries. Because .text is executed directly from flash and .data is copied to RAM at boot time I can't address .data relative to the code thus GOT too. GOT would have to be accessed through an absolute address which has to be defined at link time. So why not assigning fixed absolute addresses to all symbols at link time...?
From the book "Linkers and Loaders" I got aware of "static linked shared libraries, that is, libraries where program and data addresses in libraries are bound to executables at link time". The linked chapter describes how such libraries could be created in general and gives references to Unix System V, BSD/OS; but also mentions Linux and it's uselib() system call. Unfortunately the book gives no information how to actually create such libraries such as tools and/or compiler/linker switches. Apart from that book I hardly found any other information about such libraries "in the wild". The only thing I found in this regard was prelink for Linux. But as this operates on "normal" dynamic libraries thats not really what I'm searching for.
I fear that the use of these kind of libaries is very specific, so that no common tools exists to create them. Although the mentioned uselib() syscall in this context makes me wondering. But I wanted to make sure that I haven't overlooked anything before starting to hack my own linker... ;) So could anyone give me more information about such libraries?
Furthermore I'm wondering if there is any gcc/ld switch which links and relocates a file but keeps the relocation entries in the file - so that it could be re-relocated? I found the "-r" option, but that completely skips the relocation process. Does anyone have an idea?
edit:
Yes, I'm also aware of linker scripts. With gcc libfoo.c -o libfoo -nostdlib -e initLib -Ttext 0xdeadc0de I managed to get some sort of linked & relocated object file. But so far I haven't found any possibility to link a main program against this and use it as shared library. (The "normal way" of linking a dynamic shared library will be refused by the linker.)
Concepts
Minimum concept of what such a shared library maybe about.
same code
different data
There are variations on this. Do you support linking between libraries. Are the references a DAG structure or fully cyclic? Do you want to put the code in ROM, or support code updates? Do you wish to load libraries after a process is initially run? The last one is generally the difference between static shared libraries and dynamic shared libraries. Although many people will forbid references between libraries as well.
Facilities
Eventually, everything will come down to the addressing modes of the processor. In this case, the ARM thumb. The loader is generally coupled to the OS and the binary format in use. Your tool chain (compiler and linker) must also support the binary format and can generate the needed code.
Support for accessing data via a register is intrinsic in the APCS (the ARM Procedure calling standard). In this case, the data is accessed via the sb (for static base) which is register R9. The static base and stack checking are optional features. I believe you may need to configure/compile GCC to enable or disable these options.
The options -msingle-pic-base and -mpic-register are in the GCC manual. The idea is that an OS will initially allocate separate data for each library user and then load/reload the sb on a context switch. When code runs to a library, the data is accessed via the sb for that instances data.
Gcc's arm.c code has the require_pic_register() which does code generation for data references in a shared library. It may correspond to the ARM ATPCS shared library mechanics.See Sec 5.5
You may circumvent the tool chain by using macros and inline assembler and possibly function annotations, like naked and section. However, the library and possibly the process need code modification in this case; Ie, non-standard macros like EXPORT(myFunction), etc.
One possibility
If the system is fully specified (a ROM image), you can make the offsets you can pre-generate data offsets that are unique for each library in the system. This is done fairly easily with a linker script. Use the NOLOAD and put the library data in some phony section. It is even possible to make the main program a static shared library. For instance, you are making a network device with four Ethernet ports. The main application handles traffic on one port. You can spawn four instances of the application with different data to indicate which port is being handled.
If you have a large mix/match of library types, the foot print for the library data may become large. In this case you need to re-adjust the sb when calls are made through a wrapper function on the external API to the library.
void *__wrap_malloc(size_t size) /* Wrapped version. */
{
/* Locals on stack */
unsigned int new_sb = glob_libc; /* accessed via current sb. */
void * rval;
unsigned int old_sb;
volatile asm(" mov %0, sb\n" : "=r" (old_sb);
volatile asm(" mov sb, %0\n" :: "r" (new_sb);
rval = __real_malloc(size);
volatile asm(" mov sb, %0\n" :: "r" (old_sb);
return rval;
}
See the GNU ld --wrap option. This complexity is needed if you have a larger homogenous set of libraries. If your libraries consists of only 'libc/libsupc++', then you may not need to wrap anything.
The ARM ATPCS has veneers inserted by the compiler that do the equivalent,
LDR a4, [PC, #4] ; data address
MOV SB, a4
LDR a4, [PC, #4] ; function-entry
BX a4
DCD data-address
DCD function-entry
The size of the library data using this technique is 4k (possibly 8k, but that might need compiler modification). The limit is via ldr rN, [sb, #offset], were ARM limits offset to 12bits. Using the wrapping, each library has a 4k limit.
If you have multiple libraries that are not known when the original application builds, then you need to wrap each one and place a GOT type table via the OS loader at a fixed location in the main applications static base. Each application will require space for a pointer for each library. If the library is not used by the application, then the OS does not need to allocate the space and that pointer can be NULL.
The library table can be accessed via known locations in .text, via the original processes sb or via a mask of the stack. For instance, if all processes get a 2K stack, you can reserve the lower 16 words for a library table. sp & ~0x7ff will give an implicit anchor for all tasks. The OS will need to allocate task stacks as well.
Note, this mechanism is different than the ATPCS, which uses sb as a table to get offsets to the actual library data. As the memory is rather limited for the Cortex-M3 described it is unlikely that each individual library will need to use more than 4k of data. If the system supports an allocator this is a work around to this limitation.
References
Xflat technical overview - Technical discussion from the Xflat authors; Xflat is a uCLinux binary format that supports shared libraries. A very good read.
Linkage table and GOT - SO on PLT and GOT.
ARM EABI - The normal ARM binary format.
Assemblers and Loader, by David Solomon. Especially, pg262 A.3 Base Registers
ARM ATPCS, especially Section 5.5, Shared Libraries, pg18.
bFLT is another uCLinux binary format that supports shared libraries.
How much RAM do you have attached? Cortex-M systems have only a few dozen kiB on-chip and for the rest they require external SRAM.
I can't address .data relative to the code
You don't have to. You can place the library symbol jump table in the .data segment (or a segment that behaves similarly) at a fixed position.
thus GOT too. GOT would have to be accessed through an absolute address which has to be defined at link time. So why not assigning fixed absolute addresses to all symbols at link time...?
Nothing prevents you from having a second GOT placed at a fixed location, that's writable. You have to instruct your linker where and how to create it. For this you give the linker a so called "linker script", which is kind of a template-blueprint for the memory layout of the final program.
I'll try to answer your question before commenting about your intentions.
To compile a file in linux/solaris/any platform that uses ELF binaries:
gcc -o libFoo.so.1.0.0 -shared -fPIC foo1.c foo2.c foo3.c ... -Wl,-soname=libFoo.so.1
I'll explain all the options next:
-o libFoo.so.1.0.0
is the name we are going to give to the shared library file, once linked.
-shared
means that you have a shared object file at end, so there can be unsolved references after compilation and linked, that would be solved in late binding.
-fPIC
instructs the compiler to generate position independent code, so the library can be linked in a relocatable fashion.
-Wl,-soname=libFoo.so.1
has two parts: first, -Wl instructs the compiler to pass the next option (separated by comma) to the linker. The option is -soname=libFoo.so.1. This option, tells the linker the soname used for this library. The exact value of the soname is free style string, but there's a convenience custom to use the name of the library and the major version number. This is important, as when you do static linking of a shared library, the soname of the library gets stuck to the executable, so only a library with that soname can be loaded to assist this executable. Traditionally, when only the implementation of a library changes, we change only the name of the library, without changing the soname part, as the interface of the library doesn't change. But when you change the interface, you are building a new, incompatible one, so you must change the soname part, as it doesn't get in conflict with other 'versions' of it.
To link to a shared library is the same than to link to a static one (one that has .a as extension) Just put it on the command file, as in:
gcc -o bar bar.c libFoo.so.1.0.0
Normally, when you get some library in the system, you get one file and one or two symbolic links to it in /usr/lib directory:
/usr/lib/libFoo.so.1.0.0
/usr/lib/libFoo.so.1 --> /usr/lib/libFoo.so.1.0.0
/usr/lib/libFoo.so --> /usr/lib/libFoo.so.1
The first is the actual library called on executing your program. The second is a link with the soname as the name of the file, just to be able to do the late binding. The third is the one you must have to make
gcc -o bar bar.c -lFoo
work. (gcc and other ELF compilers search for libFoo.so, then for libFoo.a, in /usr/lib directory)
After all, there's an explanation of the concept of shared libraries, that perhaps will make you to change your image about statically linked shared code.
Dynamic libraries are a way for several programs to share the functionalities of them (that means the code, perhaps the data also). I think you are a little disoriented, as I feel you have someway misinterpreted what a statically linked shared library means.
static linking refers to the association of a program to the shared libraries it's going to use before even launching it, so there's a hardwired link between the program and all the symbols the library has. Once you launch the program, the linking process begins and you get a program running with all of its statically linked shared libraries. The references to the shared library are resolved, as the shared library is given a fixed place in the virtual memory map of the process. That's the reason the library has to be compiled with the -fPIC option (relocatable code) as it can be placed differently in the virtual space of each program.
On the opposite, dynamic linking of shared libraries refers to the use of a library (libdl.so) that allows you to load (once the program is executing) a shared library (even one that has not been known about before), search for its public symbols, solve references, load more libraries related to this one (and solve recursively as the linker could have done) and allow the program to make calls to symbols on it. The program doesn't even need to know the library was there on compiling or linking time.
Shared libraries is a concept related to the sharing of code. A long time ago, there was UNIX, and it made a great advance to share the text segment (whit the penalty of not being able for a program to modify its own code) of a program by all instances of it, so you have to wait for it to load just the first time. Nowadays, the concept of code sharing has extended to the library concept, and you can have several programs making use of the same library (perhaps libc, libdl or libm) The kernel makes a count reference of all the programs that are using it, and it just gets unloaded when no other program is using it.
using shared libraries has only one drawback: the compiler must create relocatable code to generate a shared library as the space used by one program for it can be used for another library when we try to link it to another program. This imposes normally a restriction in the set of op codes to be generated or imposes the use of one/several registers to cope with the mobility of code (there's no mobility but several linkings can make it to be situated at different places)
Believe me, using static code just derives you to making bigger executables, as you cannot share effectively the code, but with a shared library.

How do linkers decide what parts of libraries to include?

Assume library A has a() and b(). If I link my program B with A and call a(), does b() get included in the binary? Does the compiler see if any function in the program call b() (perhaps a() calls b() or another lib calls b())? If so, how does the compiler get this information? If not, isn't this a big waste of final compile size if I'm linking to a big library but only using a minor feature?
Take a look at link-time optimization. This is necessarily vendor dependent. It will also depend how you build your binaries. MS compilers (2005 onwards at least) provide something called Function Level Linking -- which is another way of stripping symbols you don't need. This post explains how the same can be achieved with GCC (this is old, GCC must've moved on but the content is relevant to your question).
Also take a look at the LLVM implementation (and the examples section).
I suggest you also take a look at Linkers and Loaders by John Levine -- an excellent read.
It depends.
If the library is a shared object or DLL, then everything in the library is loaded, but at run time. The cost in extra memory is (hopefully) offset by sharing the library (really, the code pages) between all the processes in memory that use that library. This is a big win for something like libc.so, less so for myreallyobscurelibrary.so. But you probably aren't asking about shared objects, really.
Static libraries are a simply a collection of individual object files, each the result of a separate compilation (or assembly), and possibly not even written in the same source language. Each object file has a number of exported symbols, and almost always a number of imported symbols.
The linker's job is to create a finished executable that has no remaining undefined imported symbols. (I'm lying, of course, if dynamic linking is allowed, but bear with me.) To do that, it starts with the modules named explicitly on the link command line (and possibly implicitly in its configuration) and assumes that any module named explicitly must be part of the finished executable. It then attempts to find definitions for all of the undefined symbols.
Usually, the named object modules expect to get symbols from some library such as libc.a.
In your example, you have a single module that calls the function a(), which will result in the linker looking for module that exports a().
You say that the library named A (on unix, probably libA.a) offers a() and b(), but you don't specify how. You implied that a() and b() do not call each other, which I will assume.
If libA.a was built from a.o and b.o where each defines the corresponding single function, then the linker will include a.o and ignore b.o.
However, if libA.a included ab.o that defined both a() and b() then it will include ab.o in the link, satisfying the need for a(), and including the unused function b().
As others have mentioned, there are linkers that are capable of splitting individual functions out of modules, and including only those that are actually used. In many cases, that is a safe thing to do. But it is usually safest to assume that your linker does not do that unless you have specific documentation.
Something else to be aware of is that most linkers make as few passes as they can through the files and libraries that are named on the command line, and build up their symbol table as they go. As a practical matter, this means that it is good practice to always specify libraries after all of the object modules on the link command line.
It depends on the linker.
eg. Microsoft Visual C++ has an option "Enable function level linking" so you can enable it manually.
(I assume they have a reason for not just enabling it all the time...maybe linking is slower or something)
Usually (static) libraries are composed of objects created from source files. What linkers usually do is include the object if a function that is provided by that object is referenced. if your source file only contains one function than only that function will be brought in by the linker. There are more sophisticated linkers out there but most C based linkers still work like outlined. There are tools available that split C source that contain multiple functions into artificially smaller source files to make static linking more fine granular.
If you are using shared libraries then you don't impact you compiled size by using more or less of them. However your runtime size will include them.
This lecture at Academic Earth gives a pretty good overview, linking is talked about near the later half of the talk, IIRC.
Without any optimization, yes, it'll be included. The linker, however, might be able to optimize out by statically analyzing the code and trying to remove unreachable code.
It depends on the linker, but in general only functions that are actually called get included in the final executable. The linker works by looking up the function name in the library and then using the code associated with the name.
There are very few books on linkers, which is strange when you think how important they are. The text for a good one can be found here.
It depends on the options passed to the linker, but typically the linker will leave out the object files in a library that are not referenced anywhere.
$ cat foo.c
int main(){}
$ gcc -static foo.c
$ size
text data bss dec hex filename
452659 1928 6880 461467 70a9b a.out
# force linking of libz.a even though it isn't used
$ gcc -static foo.c -Wl,-whole-archive -lz -Wl,-no-whole-archive
$ size
text data bss dec hex filename
517951 2180 6844 526975 80a7f a.out
It depends on the linker and how the library was built. Usually libraries are a combination of object files (import libraries are a major exception to this). Older linkers would pull things into the output file image at a granularity of the object files that were put into the library. So if function a() and function b() were both in the same object file, they would both be in the output file - even if only one of the 2 functions were actually referenced.
This is a reason why you'll often see library-oriented projects with a policy of a single C function per source file. That way each function is packaged in its own object file and linkers have no problem pulling in only what is referenced.
Note however that newer linkers (certainly newer Microsoft linkers) have the ability to pull in only parts of object files that are referenced, so there's less of a need today to enforce a one-function-per-source-file policy - though there are reasonable arguments that that should be done anyway for maintainability.

Resources