Visibility, Fortran common variables, runtime loading of shared libraries - c

Environment: Intel Linux, Red Hat 5.
Compiler: gcc 3.4.6
(old stuff, legacy environment with serious infrastructure, sorry)
I have multiple versions of a particular shared library (call it something like "shared_lib.so") derived from Fortran which contains a COMMON block and various computations with references to variables in that COMMON.
I need to be able to (from C code elsewhere in the end-product executable) use dlclose() and dlopen() to switch between versions of this library (within which all versions of the COMMON contents are identical) while running. In some cases the same COMMON also appears in code which is part of a static library (call it "static_lib.a") that is also linked into the executable, and is separately maintained from my project but which has functionality which interacts with that in my shared library.
I appear to be seeing that multiple instances of the COMMON wind up in the executable, and (more importantly) that there is no linkage between the values of variables in the instance from the static library, and the values of the “same” variables in the instance from a shared library pulled in with dlopen().
What I need, in summary, is (within the overall executable) for a dlopen()-loaded shared_lib.so to be able to set/use variable XYZ in COMMON ABC, and for code in static_lib.a to set/use XYZ, and have it in effect be the same instance of XYZ, or at least for the two to be kept in synch. Is this possible?
My compilation commands for sources in shared_lib.so are of the form:
g77 -c -g -m32 -fPIC -o shared_src.o shared_src.f
My command for building shared_lib.so is of the form:
gcc -g -m32 -fPIC -shared -o shared_lib.so *.o
My command for building the executable is of the form:
gcc -g -m32 -rdynamic -o exec exec.o static_lib.a shared_lib.so -lm -ldl -lg2c
My need is to do something from the C code of the form:
handle1 = dlopen ("shared_lib.so", RTLD_LAZY | RTLD_NOLOAD);
dlclose (handle1);
handle2 = dlopen ("shared_lib2.so", RTLD_NOW | RTLD_GLOBAL);
...
The initial startup configuration does appear to function correctly with respect to the needed variables, but the result of subsequent dlclose() and dlopen() sequences do not. Perhaps the underlying issue is that dlopen() lacks some intelligence that gcc possesses when it is linking.
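A minimal sketch of the swap with error reporting, assuming glibc's dlopen(). Two details are worth checking against the sequence above. First, RTLD_NOLOAD is meant to be combined with RTLD_LAZY or RTLD_NOW, and a successful RTLD_NOLOAD call adds a reference, so dlopen(..., RTLD_NOLOAD) followed by a single dlclose() leaves the library exactly as loaded as before. Second, because shared_lib.so is named on the executable's link line, it is a load-time dependency of the program, and dlclose() will not unload it. The function names swap_library and is_resident are illustrative.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: drop the handle for the old library version and
 * load the new one with its symbols made globally visible.
 * `current` is the handle an earlier dlopen() returned (may be NULL).
 * Returns the new handle, or NULL on failure (the reason is printed). */
void *swap_library(void *current, const char *new_path)
{
    if (current != NULL && dlclose(current) != 0) {
        fprintf(stderr, "dlclose failed: %s\n", dlerror());
        return NULL;
    }
    void *next = dlopen(new_path, RTLD_NOW | RTLD_GLOBAL);
    if (next == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", new_path, dlerror());
    return next;
}

/* Hypothetical helper: 1 if `path` is still mapped in this process
 * (for example because the executable was linked against it, or other
 * handles are still open), 0 otherwise. */
int is_resident(const char *path)
{
    /* RTLD_NOLOAD never loads anything, but RTLD_LAZY or RTLD_NOW is still required. */
    void *h = dlopen(path, RTLD_LAZY | RTLD_NOLOAD);
    if (h == NULL)
        return 0;
    dlclose(h); /* undo the reference this probe just added */
    return 1;
}
```

After a swap, is_resident("shared_lib.so") shows whether the old version actually went away.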

Short answer
Did you (or can you) recompile the executable with -fPIC? I found that it was necessary to compile both the shared library AND the executable with -fPIC to get the COMMON blocks to be recognized properly.
Long answer
I ran into a slightly similar problem recently with COMMON blocks shared between an executable and a FORTRAN shared library. However, I'm using Intel compilers NOT the GNU compilers. The executable is mixed C/C++ and FORTRAN.
The existing (working) Windows version of the code works by sharing the common blocks between executable and DLL through DLLEXPORT/DLLIMPORT ATTRIBUTE directives. According to the Intel compiler documentation, these attribute directives are not recognized in Linux. Indeed, the Linux Intel compiler just produces warnings for these directives.
The main changes in converting the code from Windows to Linux were replacing the Windows LoadLibrary and GetProcAddress with Linux's dlopen and dlsym routines, respectively, using #ifdef sections. The shared library was compiled using -fpic and linked with -shared.
While the shared library was compiled with -fpic, the executable was NOT. When running the code compiled in this manner, variables passed to the shared library through subroutine calls were passed properly; however, the COMMON block variables were not set correctly (or were uninitialized).
In desperation, I finally tried compiling the executable itself with the -fpic compiler option, and then the COMMON blocks were recognized properly in the shared library.

This isn't really an answer, but might help you.
Here's what the 2008 standard says about COMMON:
5.7.2.4 Common association
1 Within a program, the common block storage sequences of all nonzero-sized common blocks with the same name have the same first storage unit, and the common block storage sequences of all zero-sized common blocks with the same name are storage associated with one another. Within a program, the common block storage sequences of all nonzero-sized blank common blocks have the same first storage unit and the storage sequences of all zero-sized blank common blocks are associated with one another and with the first storage unit of any nonzero-sized blank common blocks. This results in the association of objects in different scoping units. Use or host association may cause these associated objects to be accessible in the same scoping unit.
In short, COMMON sections with the same name in the same program occupy the same storage.
A program is defined as follows.
2.2.2 Program
1 A program shall consist of exactly one main program, any number (including zero) of other kinds of program units, any number (including zero) of external procedures, and any number (including zero) of other entities defined by means other than Fortran. The main program shall be defined by a Fortran main-program program-unit or by means other than Fortran, but not both.
The standard doesn't say anything about static vs dynamic linking and it doesn't restrict the previous statements to static linking. Therefore, it seems the dynamically loaded library should share the COMMON block with the main program (which I'm not sure is even technically possible) and thus the GNU implementation is incorrect.
On the other hand, the standard also doesn't say anything about being able to load libraries dynamically. Program units "defined by means other than Fortran" should include C libraries, but that doesn't tell us how these program units are connected to the main program. Fortran, in general, is not a very dynamic language.
Of course, you can work around all this by simply not using COMMON blocks. If a procedure needs to read/write some data, just pass it as a parameter with intent in/out. You can also group data together in a derived type and pass it around together as a unit. Nowadays (Fortran 2003+), you can even use object oriented programming, so there is really no need for global variables anymore.
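A C sketch of the same idea, with a hypothetical struct abc_state standing in for COMMON /ABC/ and its variable XYZ: the caller owns a single instance and passes a pointer to every routine that needs it, so whichever library version is currently loaded works on the same data, provided its routines take that argument. On the Fortran side the equivalent would be a derived type passed with intent(inout).

```c
/* Hypothetical C counterpart of COMMON /ABC/ XYZ: one explicitly owned
 * object instead of a global block that each library may duplicate. */
struct abc_state {
    double xyz;
};

/* Every routine that used to touch the COMMON now takes the state. */
void set_xyz(struct abc_state *s, double value)
{
    s->xyz = value;
}

double scaled_xyz(const struct abc_state *s, double factor)
{
    return s->xyz * factor;
}
```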

Related

Does CMake compile everything in the included headers into the executable or only the parts used in the main class?

I'm writing a C program where every bit of the executable size matters.
If, for example, only printf() from stdio.h is required in my program, would including the header actually cause everything in that library to be copied into the CMake-compiled executable?
CMake is just the build system generator. What ultimately goes into the final executable is decided by the linker and the options you pass to it. Typical linkers will only link into the executable what they can determine to be necessary, unless you ask them to link everything. However, there are limits to how much they can reduce the footprint.
The rule of thumb is that if you use a function found in foo.o, then all of foo.o gets linked; hence, if size optimization is your goal, it's a good idea to give each function its own compilation unit.
What headers you use has no effect whatsoever, because headers are processed at compilation time, not linkage time.
Last but not least: in most implementations of the standard library, the printf family of functions is among the most heavyweight, so don't use them if you're beancounting.
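A sketch of what beancounting can look like, assuming the goal is to avoid pulling in printf's formatting code: convert numbers by hand and write them with fputs(). Whether this actually shrinks the binary depends on the C library and on static vs. dynamic linking (GCC already turns some simple printf calls into puts), so measure before and after. utoa_dec and print_count are illustrative names.

```c
#include <stdio.h>

/* Format `value` in decimal into `buf`, which must hold at least 21 bytes
 * (20 digits for a 64-bit value plus the terminating NUL). Returns buf. */
char *utoa_dec(unsigned long long value, char *buf)
{
    char tmp[20];
    int n = 0;
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    /* digits were produced least-significant first; reverse them */
    for (int i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}

/* Print "label: value" and a newline without any printf-family call. */
void print_count(const char *label, unsigned long long value)
{
    char digits[21];
    fputs(label, stdout);
    fputs(": ", stdout);
    fputs(utoa_dec(value, digits), stdout);
    fputc('\n', stdout);
}
```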
As a principle, headers should be idempotent, that is, they should not affect the executable if the declarations are not used. stdlib.h should only have things like prototypes, pre-processor macro definitions and struct definitions, it should not contain executable code or variable declarations.
Standard library code is included by the linker as required. However, the C run-time library (RTL) might have this code in a DLL or shared object, depending on your platform. Using a DLL (or equivalent) does not affect the size of the executable file, but of course can affect the memory used. Since DLL code is shared between processes, it is not uncommon for the C RTL to remain in memory, but, assuming dynamic linking, there will only be one copy, regardless of the number of C processes running. Most C RTLs will have some memory allocated per process, but how much depends on the compiler/platform.

Dynamically change the running code by writing into the __FILE__?

I got to know of a way to print the source code of a running code in C using the __FILE__ macro. As such I can seek the location and use putchar() to alter the contents of the file.
Is it possible to dynamically change the running code using this method?
Is it possible to dynamically change the running code using this method?
No, because once a program is compiled it no longer depends on the source file.
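A small sketch of the trick from the question, assuming the source file is still at the path recorded at compile time: __FILE__ is just a string literal holding that path, so reading from it (or writing to it) touches the source file on disk, not the machine code that is running. print_file is an illustrative name.

```c
#include <stdio.h>

/* Copy the file at `path` to `out`. Called as print_file(__FILE__, stdout),
 * it prints whatever currently sits at the path this translation unit was
 * compiled from; editing that file does not change the running program.
 * Returns the number of bytes copied, or -1 if the file cannot be opened. */
long print_file(const char *path, FILE *out)
{
    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -1;
    long count = 0;
    int c;
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        count++;
    }
    fclose(in);
    return count;
}
```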
If you want to learn how to alter the behavior of a process that is already running from within the process itself, you need to learn about assembly for the architecture you're using, the executable file format on your system, and the process API on your system, at the very least.
As most other answers are explaining, in practical terms, most C implementations are compilers. So the executable that is running has only an indirect (and delayed) relation with the source code, because the source code had to be processed by the compiler to produce that executable.
Remember that a programming language is not software but a specification, written in some report. Read n1570, the draft specification of C11. Most implementations of C are command-line compilers (e.g. GCC and Clang/LLVM in the free software realm), even if you might find interpreters.
However, with some operating systems (notably POSIX ones, such as MacOSX and Linux), you could dynamically load some plugin. Or you could create, in some other way (such as JIT compilation libraries like libgccjit or LLVM or libjit or GNU lightning), a fresh function and dynamically get a pointer to it (and that is not stricto sensu conforming to the C standard, where a function pointer should point to some existing function of your program).
On Linux, you might generate (at runtime of your own program, linked with -rdynamic so that its names are usable from plugins, and with -ldl to get the dynamic loader) some C code in a temporary source file, e.g. /tmp/gencode.c, run a compilation of that emitted code (using e.g. system(3) or popen(3)) into a /tmp/gencode.so plugin through a command like gcc -O1 -g -Wall -fPIC -shared /tmp/gencode.c -o /tmp/gencode.so, then dynamically load that plugin using dlopen(3), find function pointers (by some conventional name) in that loaded plugin with dlsym(3), and call through those function pointers. My manydl.c program shows that this is possible for many hundreds of thousands of generated C files and loaded plugins. I'm using similar tricks in my GCC MELT. Notice that you don't really "self-modify" C code; more broadly, you generate additional C code, compile it (as a plugin, etc.), and then load it as an extension or plugin and use it.
(for pragmatical reasons including ease of debugging, I don't recommend overwriting some existing C file, but just emitting new C code in some fresh temporary .c file -from some internal AST-like representation- that you would later feed to the compiler)
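A compact sketch of the generate, compile, dlopen(), dlsym() cycle described above, assuming gcc is on the PATH and /tmp is writable. Following the advice to emit fresh files rather than overwrite existing ones, each call writes a new /tmp/gencode_<pid>_<n>.c; those file names and the function name gen_fn are illustrative, and the expression is pasted into the source unchecked, so this is only suitable for trusted input.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Writes a C file defining `long gen_fn(void) { return <expr>; }`,
 * builds it as a plugin with gcc, loads it and calls gen_fn.
 * On failure *ok is set to 0 and 0 is returned. */
long eval_generated(const char *expr, int *ok)
{
    static int counter = 0;
    char src[128], lib[128], cmd[512];
    *ok = 0;

    /* fresh file names for every generation */
    snprintf(src, sizeof src, "/tmp/gencode_%ld_%d.c", (long)getpid(), counter);
    snprintf(lib, sizeof lib, "/tmp/gencode_%ld_%d.so", (long)getpid(), counter);
    counter++;

    FILE *f = fopen(src, "w");
    if (f == NULL)
        return 0;
    fprintf(f, "long gen_fn(void) { return %s; }\n", expr);
    fclose(f);

    /* same kind of command as in the answer */
    snprintf(cmd, sizeof cmd, "gcc -O1 -g -Wall -fPIC -shared %s -o %s", src, lib);
    if (system(cmd) != 0)
        return 0;

    void *handle = dlopen(lib, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 0;
    }
    /* POSIX requires that dlsym's result can be converted to a function pointer. */
    long (*fn)(void) = (long (*)(void))dlsym(handle, "gen_fn");
    if (fn == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return 0;
    }
    long result = fn();
    dlclose(handle);
    *ok = 1;
    return result;
}
```

Calling eval_generated("6 * 7", &ok) compiles and loads a one-function plugin and returns its result. With older glibc versions the program must be linked with -ldl.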
Is it possible to dynamically change the running code?
In general (at least on Linux and most POSIX systems), the machine code sits in a read-only code segment of the virtual address space, so you cannot change or overwrite it; but you can use indirection through function pointers (in your C code) to call newly loaded code (e.g. from dlopen-ed plugins).
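A minimal sketch of that indirection: callers always go through a function pointer, so installing a new pointer (in practice one obtained with dlsym() from a freshly loaded plugin) changes behavior without touching existing machine code. handler_v1, handler_v2, dispatch and install_handler are illustrative names.

```c
#include <stddef.h>

/* All callers go through this pointer type. */
typedef int (*handler_fn)(int);

/* Illustrative stand-ins for the old code and for code that would, in
 * practice, come from dlsym() on a newly loaded plugin. */
int handler_v1(int x) { return x + 1; }
int handler_v2(int x) { return x * 2; }

static handler_fn current_handler = handler_v1;

/* Call sites use dispatch(), never handler_v1/handler_v2 directly. */
int dispatch(int x)
{
    return current_handler(x);
}

/* Switch implementations; a NULL pointer (e.g. a failed dlsym) is ignored. */
void install_handler(handler_fn fn)
{
    if (fn != NULL)
        current_handler = fn;
}
```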
However, you might also read about homoiconic languages, metaprogramming, multi-staged programming, and try to use Common Lisp (e.g. using its SBCL implementation, which compiles to machine code at every REPL interaction and at every eval). I also recommend reading SICP (an excellent and freely available introduction to programming, with some chapters related to metaprogramming approaches).
PS. Dynamic loading of plugins is also possible on Windows (which I don't know) with LoadLibrary, but with a very different (and incompatible) model. Read Levine's Linkers and Loaders.
A computer doesn't understand the code as we do. It compiles or interprets it and loads into memory. Our modification of code is just changing the file. One needs to compile it and link it with other libraries and load it into memory.
ptrace() is a syscall used to inject code into a running program. You can probably look into that and achieve whatever you are trying to do.
Inject hello world in a running program. I have tried and tested this sometime before.

how to make shared library an executable

I was searching for this question and found this link: https://hev.cc/2512.html, which does exactly what I want. But there is no explanation of what's going on. I am also confused whether a shared library without main() can be made executable, and if so, how? I guess I have to provide a global main(), but I don't know the details. Any further easy reference and guidance is much appreciated.
I am working on x86-64 64 bit Ubuntu with kernel 3.13
This is fundamentally not sensible.
A shared library generally has no task it performs that can be used as its equivalent of a main() function. The primary goal is to allow separate management and implementation of common code operations, and, on systems that operate that way, to allow a single code file to be loaded and shared, thereby reducing memory overhead for application code that uses it.
An executable file is designed to have a single point of entry from which it performs all the operations related to completing a well defined task. Different OSes have different requirements for that entry point. A shared library normally has no similar underlying function.
So in order to (usefully) convert a shared library to an executable, you must also define (and generate code for) a task which can be started from a single entry point.
The code you linked to starts with the source code of the library and explicitly codes a main() which it invokes via the entry-point function. If you did not have the source code for a library you could, in theory, hack a new file from a shared library (in the absence of security features to prevent this in any given OS), but it would be an odd thing to do.
But in practical terms you would not deploy code in this manner. Instead you would code a shared library as a shared library. If you wanted to perform some task you would code a separate executable that linked to that library and code. Trying to tie the two together defeats the purpose of writing the library and distorts the structure, implementation and maintenance of that library and the application. Keep the application and the library apart.
I don't see how this is useful for anything. You could always achieve the same functionality from having a main in a separate binary that links against that library. Making a single file that works as both is solidly in the realm of "silly computer tricks". There's no benefit I can see to having a main embedded in the library, even if it's a test harness or something.
There might possibly be some performance reasons, like not having function calls go through the indirection of the PLT.
In that example, the shared library is also a valid ELF executable, because it has a quick-and-dirty entry-point that grabs the args for main from where the ABI says they go (i.e. copies them from the stack into registers). It also arranges for the ELF interpreter to be set correctly. It will only work on x86-64, because no definition is provided for init_args for other platforms.
I'm surprised it actually works; I thought all the crap the usual CRT (startup) code does was actually needed for stdio to work properly. It looks like it doesn't initialize extern char **environ, since it only gets argc and argv from the stack, not envp.
Anyway, when run as an executable, it has everything needed to be a valid dynamically-linked executable: an entry-point which runs some code and exits, an interpreter, and a dependency on libc. (ELF shared libraries can depend on (i.e. link against) other ELF shared libraries, in the same way that executables can).
When used as a library, it just works as a normal library containing some function definitions. None of the stuff that lets it work as an executable (entry point and interpreter) is even looked at.
I'm not sure why you don't get an error for multiple definitions of main, since it isn't declared as a "weak" symbol. I guess shared-lib definitions are only looked for when there's a reference to an undefined symbol. So main() from call.c is used instead of main() from libtest.so because main already has a definition before the linker looks at libtest.
To create a shared (dynamic) library, for example:
Suppose there are three object files: sum.o, mul.o and print.o (each compiled with -fPIC).
Shared library name: libmno.so
cc -shared -o libmno.so sum.o mul.o print.o
and compile with
cc main.c ./libmno.so
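A sketch of what the three objects might contain; the original only gives the object file names, so the function names below (and the source file names in the comments) are assumptions. In real use each part lives in its own .c file.

```c
#include <stdio.h>

/* Build steps for the layout above (file names assumed):
 *   cc -fPIC -c sum.c mul.c print.c
 *   cc -shared -o libmno.so sum.o mul.o print.o
 *   cc main.c ./libmno.so
 */

/* sum.c */
int sum(int a, int b) { return a + b; }

/* mul.c */
int mul(int a, int b) { return a * b; }

/* print.c */
void print_result(const char *what, int value)
{
    printf("%s = %d\n", what, value);
}
```

A main.c would declare these prototypes (or include a header with them) and call, say, print_result("2 * 3", mul(2, 3)). Because libmno.so is passed by path and has no soname, the executable should record ./libmno.so as its dependency, so run it from the directory that contains the library.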

How to create static linked shared libraries

For my master's thesis i'm trying to adapt a shared library approach for an ARM Cortex-M3 embedded system. As our targeted board has no MMU I think that it would make no sense to use "normal" dynamic shared libraries. Because .text is executed directly from flash and .data is copied to RAM at boot time I can't address .data relative to the code thus GOT too. GOT would have to be accessed through an absolute address which has to be defined at link time. So why not assigning fixed absolute addresses to all symbols at link time...?
From the book "Linkers and Loaders" I became aware of "static linked shared libraries, that is, libraries where program and data addresses in libraries are bound to executables at link time". The linked chapter describes how such libraries could be created in general and gives references to Unix System V and BSD/OS, but also mentions Linux and its uselib() system call. Unfortunately the book gives no information on how to actually create such libraries, such as tools and/or compiler/linker switches. Apart from that book I hardly found any other information about such libraries "in the wild". The only thing I found in this regard was prelink for Linux. But as this operates on "normal" dynamic libraries, that's not really what I'm searching for.
I fear that the use of these kinds of libraries is very specific, so that no common tools exist to create them. Although the mentioned uselib() syscall in this context makes me wonder. But I wanted to make sure that I haven't overlooked anything before starting to hack my own linker... ;) So could anyone give me more information about such libraries?
Furthermore I'm wondering if there is any gcc/ld switch which links and relocates a file but keeps the relocation entries in the file - so that it could be re-relocated? I found the "-r" option, but that completely skips the relocation process. Does anyone have an idea?
edit:
Yes, I'm also aware of linker scripts. With gcc libfoo.c -o libfoo -nostdlib -e initLib -Ttext 0xdeadc0de I managed to get some sort of linked & relocated object file. But so far I haven't found any possibility to link a main program against this and use it as shared library. (The "normal way" of linking a dynamic shared library will be refused by the linker.)
Concepts
Minimum concept of what such a shared library may be about:
same code
different data
There are variations on this. Do you support linking between libraries? Are the references a DAG structure or fully cyclic? Do you want to put the code in ROM, or support code updates? Do you wish to load libraries after a process is initially run? The last one is generally the difference between static shared libraries and dynamic shared libraries, although many people will forbid references between libraries as well.
Facilities
Eventually, everything will come down to the addressing modes of the processor. In this case, the ARM thumb. The loader is generally coupled to the OS and the binary format in use. Your tool chain (compiler and linker) must also support the binary format and can generate the needed code.
Support for accessing data via a register is intrinsic in the APCS (the ARM Procedure calling standard). In this case, the data is accessed via the sb (for static base) which is register R9. The static base and stack checking are optional features. I believe you may need to configure/compile GCC to enable or disable these options.
The options -msingle-pic-base and -mpic-register are in the GCC manual. The idea is that an OS will initially allocate separate data for each library user and then load/reload the sb on a context switch. When code calls into a library, the data is accessed via the sb for that instance's data.
GCC's arm.c code has require_pic_register(), which does code generation for data references in a shared library. It may correspond to the ARM ATPCS shared library mechanics. See Sec. 5.5.
You may circumvent the tool chain by using macros and inline assembler and possibly function annotations, like naked and section. However, the library and possibly the process need code modification in this case, i.e., non-standard macros like EXPORT(myFunction), etc.
One possibility
If the system is fully specified (a ROM image), you can pre-generate data offsets that are unique for each library in the system. This is done fairly easily with a linker script: use NOLOAD and put the library data in some phony section. It is even possible to make the main program a static shared library. For instance, suppose you are making a network device with four Ethernet ports. The main application handles traffic on one port. You can spawn four instances of the application with different data to indicate which port is being handled.
If you have a large mix/match of library types, the foot print for the library data may become large. In this case you need to re-adjust the sb when calls are made through a wrapper function on the external API to the library.
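#include <stddef.h>

extern unsigned int glob_libc;    /* libc's static base, stored in the caller's data */
void *__real_malloc(size_t size); /* the original malloc, provided by ld --wrap=malloc */

void *__wrap_malloc(size_t size) /* Wrapped version. */
{
    /* Locals on stack */
    unsigned int new_sb = glob_libc; /* accessed via current sb. */
    void *rval;
    unsigned int old_sb;
    __asm__ volatile ("mov %0, sb" : "=r" (old_sb)); /* save the caller's static base */
    __asm__ volatile ("mov sb, %0" :: "r" (new_sb)); /* switch to libc's data */
    rval = __real_malloc(size);
    __asm__ volatile ("mov sb, %0" :: "r" (old_sb)); /* restore the caller's static base */
    return rval;
}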
See the GNU ld --wrap option. This complexity is needed if you have a larger homogeneous set of libraries. If your libraries consist of only 'libc/libsupc++', then you may not need to wrap anything.
The ARM ATPCS has veneers inserted by the compiler that do the equivalent,
LDR a4, [PC, #4] ; data address
MOV SB, a4
LDR a4, [PC, #4] ; function-entry
BX a4
DCD data-address
DCD function-entry
The size of the library data using this technique is limited to 4k (possibly 8k, but that might need compiler modification). The limit comes from ldr rN, [sb, #offset], where ARM limits the offset to 12 bits. Using the wrapping, each library has a 4k limit.
If you have multiple libraries that are not known when the original application builds, then you need to wrap each one and place a GOT type table via the OS loader at a fixed location in the main applications static base. Each application will require space for a pointer for each library. If the library is not used by the application, then the OS does not need to allocate the space and that pointer can be NULL.
The library table can be accessed via known locations in .text, via the original processes sb or via a mask of the stack. For instance, if all processes get a 2K stack, you can reserve the lower 16 words for a library table. sp & ~0x7ff will give an implicit anchor for all tasks. The OS will need to allocate task stacks as well.
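A sketch of the stack-mask anchor, assuming (as in the example above) that every task stack is exactly 2K and aligned on a 2K boundary, with its lowest 16 words reserved for the library table. The names are hypothetical; on the target, the sp value could be approximated by the address of a local variable. On the Cortex-M3 a word and a pointer are both 4 bytes, which is what lets the table be treated as an array of pointers.

```c
#include <stddef.h>
#include <stdint.h>

#define TASK_STACK_SIZE 0x800u  /* 2K per task, stacks 2K aligned */
#define LIB_TABLE_WORDS 16u     /* lowest 16 words of each stack */

/* Base of the current task's stack region: sp & ~0x7ff. */
static inline uintptr_t stack_base(uintptr_t sp)
{
    return sp & ~(uintptr_t)(TASK_STACK_SIZE - 1u);
}

/* Address of the table slot for library `index`, or NULL if out of range. */
static inline void **lib_table_slot(uintptr_t sp, unsigned index)
{
    if (index >= LIB_TABLE_WORDS)
        return NULL;
    return (void **)stack_base(sp) + index;
}
```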
Note, this mechanism is different than the ATPCS, which uses sb as a table to get offsets to the actual library data. As the memory is rather limited for the Cortex-M3 described it is unlikely that each individual library will need to use more than 4k of data. If the system supports an allocator this is a work around to this limitation.
References
Xflat technical overview - Technical discussion from the Xflat authors; Xflat is a uCLinux binary format that supports shared libraries. A very good read.
Linkage table and GOT - SO on PLT and GOT.
ARM EABI - The normal ARM binary format.
Assemblers and Loaders, by David Salomon. Especially, pg262 A.3 Base Registers
ARM ATPCS, especially Section 5.5, Shared Libraries, pg18.
bFLT is another uCLinux binary format that supports shared libraries.
How much RAM do you have attached? Cortex-M systems have only a few dozen kiB on-chip and for the rest they require external SRAM.
I can't address .data relative to the code
You don't have to. You can place the library symbol jump table in the .data segment (or a segment that behaves similarly) at a fixed position.
thus GOT too. GOT would have to be accessed through an absolute address which has to be defined at link time. So why not assigning fixed absolute addresses to all symbols at link time...?
Nothing prevents you from having a second GOT placed at a fixed location that's writable. You have to instruct your linker where and how to create it. For this you give the linker a so-called "linker script", which is a kind of template or blueprint for the memory layout of the final program.
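A rough sketch of such a writable table, assuming GCC's section attribute: the library exports a struct of function pointers in its own section, and a linker script would place that section at a fixed address known to client programs. The section name .libfoo_got, the functions and the address in the comment are hypothetical.

```c
/* Hypothetical exported table for a library "libfoo". It is writable and
 * lives in its own section, so a linker script can pin it to a fixed
 * address, e.g. with an output section statement along the lines of
 *     .libfoo_got 0x20000100 : { KEEP(*(.libfoo_got)) }
 * (the address is made up). */
struct libfoo_table {
    int (*add)(int, int);
    int (*negate)(int);
};

static int foo_add(int a, int b) { return a + b; }
static int foo_negate(int a) { return -a; }

__attribute__((section(".libfoo_got")))
struct libfoo_table libfoo_got = { foo_add, foo_negate };

/* Client code never calls foo_add directly; it always goes through the table. */
int call_add(const struct libfoo_table *t, int a, int b)
{
    return t->add(a, b);
}
```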
I'll try to answer your question before commenting about your intentions.
To compile a file in linux/solaris/any platform that uses ELF binaries:
gcc -o libFoo.so.1.0.0 -shared -fPIC foo1.c foo2.c foo3.c ... -Wl,-soname=libFoo.so.1
I'll explain all the options next:
-o libFoo.so.1.0.0
is the name we are going to give to the shared library file, once linked.
-shared
means that you get a shared object file at the end, so there can be unresolved references after compiling and linking, which will be resolved by late binding.
-fPIC
instructs the compiler to generate position independent code, so the library can be linked in a relocatable fashion.
-Wl,-soname=libFoo.so.1
has two parts: first, -Wl instructs the compiler driver to pass the next option (after the comma) to the linker. The option is -soname=libFoo.so.1, which tells the linker the soname to record for this library. The soname is a free-form string, but by convention it is the library name plus the major version number. This is important because when you do static linking of a shared library (linking against it at build time), the soname of the library is recorded in the executable, so only a library with that soname can be loaded for this executable. Traditionally, when only the implementation of a library changes, we change only the file name of the library and keep the soname, as the interface of the library doesn't change. But when you change the interface, you are building a new, incompatible library, so you must change the soname so that it doesn't conflict with other 'versions' of it.
Linking against a shared library is the same as linking against a static one (one with the .a extension): just put it on the command line, as in:
gcc -o bar bar.c libFoo.so.1.0.0
Normally, when you get some library in the system, you get one file and one or two symbolic links to it in /usr/lib directory:
/usr/lib/libFoo.so.1.0.0
/usr/lib/libFoo.so.1 --> /usr/lib/libFoo.so.1.0.0
/usr/lib/libFoo.so --> /usr/lib/libFoo.so.1
The first is the actual library called on executing your program. The second is a link with the soname as the name of the file, just to be able to do the late binding. The third is the one you must have to make
gcc -o bar bar.c -lFoo
work. (gcc and other ELF compilers search for libFoo.so, then for libFoo.a, in the /usr/lib directory.)
Finally, here is an explanation of the concept of shared libraries that may change your picture of statically linked shared code.
Dynamic libraries are a way for several programs to share the functionalities of them (that means the code, perhaps the data also). I think you are a little disoriented, as I feel you have someway misinterpreted what a statically linked shared library means.
static linking refers to the association of a program to the shared libraries it's going to use before even launching it, so there's a hardwired link between the program and all the symbols the library has. Once you launch the program, the linking process begins and you get a program running with all of its statically linked shared libraries. The references to the shared library are resolved, as the shared library is given a fixed place in the virtual memory map of the process. That's the reason the library has to be compiled with the -fPIC option (relocatable code) as it can be placed differently in the virtual space of each program.
By contrast, dynamic linking of shared libraries refers to the use of a library (libdl.so) that allows you to load (once the program is executing) a shared library (even one that was not known about before), search for its public symbols, resolve references, load more libraries related to this one (resolving recursively, as the linker would have done) and allow the program to call symbols in it. The program doesn't even need to know the library existed at compile or link time.
Shared libraries are a concept related to code sharing. A long time ago, UNIX made a great advance by sharing the text segment of a program among all its running instances (with the penalty that a program cannot modify its own code), so you only have to wait for it to load the first time. Nowadays, code sharing has been extended to libraries, and several programs can use the same library (perhaps libc, libdl or libm). The kernel keeps a reference count of all the programs that are using it, and it is only unloaded when no program is using it any more.
Using shared libraries has only one drawback: the compiler must generate relocatable code for a shared library, because the address range one program uses for it may be occupied by another library when it is linked into another program. This normally restricts the set of opcodes that can be generated or requires one or several registers to cope with the mobility of the code (the code doesn't actually move, but different linkings can place it at different addresses).
Believe me, using static code just leads to bigger executables, because code can only be shared effectively through a shared library.
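A short sketch of run-time loading in that sense, adapted from the classic dlopen(3) example: open a library that was never named at link time, look up a symbol by name and call it. It assumes a glibc-based Linux system, where the math library's soname is libm.so.6; call_cos is an illustrative name.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load libm at run time, find "cos" by name and call it.
 * Returns 0 and stores cos(x) in *result on success, -1 on failure. */
int call_cos(double x, double *result)
{
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    dlerror(); /* clear any stale error before dlsym */
    /* POSIX requires that dlsym's result can be converted to a function pointer. */
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym: %s\n", err);
        dlclose(handle);
        return -1;
    }
    *result = cosine(x);
    dlclose(handle);
    return 0;
}
```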

How do linkers decide what parts of libraries to include?

Assume library A has a() and b(). If I link my program B with A and call a(), does b() get included in the binary? Does the compiler see if any function in the program call b() (perhaps a() calls b() or another lib calls b())? If so, how does the compiler get this information? If not, isn't this a big waste of final compile size if I'm linking to a big library but only using a minor feature?
Take a look at link-time optimization. This is necessarily vendor dependent. It will also depend how you build your binaries. MS compilers (2005 onwards at least) provide something called Function Level Linking -- which is another way of stripping symbols you don't need. This post explains how the same can be achieved with GCC (this is old, GCC must've moved on but the content is relevant to your question).
Also take a look at the LLVM implementation (and the examples section).
I suggest you also take a look at Linkers and Loaders by John Levine -- an excellent read.
It depends.
If the library is a shared object or DLL, then everything in the library is loaded, but at run time. The cost in extra memory is (hopefully) offset by sharing the library (really, the code pages) between all the processes in memory that use that library. This is a big win for something like libc.so, less so for myreallyobscurelibrary.so. But you probably aren't asking about shared objects, really.
Static libraries are simply a collection of individual object files, each the result of a separate compilation (or assembly), and possibly not even written in the same source language. Each object file has a number of exported symbols, and almost always a number of imported symbols.
The linker's job is to create a finished executable that has no remaining undefined imported symbols. (I'm lying, of course, if dynamic linking is allowed, but bear with me.) To do that, it starts with the modules named explicitly on the link command line (and possibly implicitly in its configuration) and assumes that any module named explicitly must be part of the finished executable. It then attempts to find definitions for all of the undefined symbols.
Usually, the named object modules expect to get symbols from some library such as libc.a.
In your example, you have a single module that calls the function a(), which will result in the linker looking for a module that exports a().
You say that the library named A (on unix, probably libA.a) offers a() and b(), but you don't specify how. You implied that a() and b() do not call each other, which I will assume.
If libA.a was built from a.o and b.o where each defines the corresponding single function, then the linker will include a.o and ignore b.o.
However, if libA.a included ab.o that defined both a() and b() then it will include ab.o in the link, satisfying the need for a(), and including the unused function b().
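To make the object-file granularity concrete, here is a toy model of how a traditional Unix linker selects members from one archive. It is a sketch under stated assumptions: ObjFile, link_archive and the symbol lists are hypothetical names invented for illustration, not any real linker's API, and it models a single archive that follows a single object file on the command line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Each symbol list holds at most MAX_SYMS - 1 names and ends with NULL. */
#define MAX_SYMS 8

/* Hypothetical description of one object file. */
typedef struct {
    const char *name;               /* e.g. "a.o" (informational only) */
    const char *defines[MAX_SYMS];  /* exported symbols, NULL-terminated */
    const char *needs[MAX_SYMS];    /* imported symbols, NULL-terminated */
} ObjFile;

/* True if sym appears in the NULL-terminated list. */
static bool in_list(const char *const *list, const char *sym)
{
    for (size_t i = 0; list[i] != NULL; i++)
        if (strcmp(list[i], sym) == 0)
            return true;
    return false;
}

/* True if the program object or an already-included member defines sym. */
static bool is_defined(const ObjFile *prog, const ObjFile *ar, size_t n,
                       const bool *included, const char *sym)
{
    if (in_list(prog->defines, sym))
        return true;
    for (size_t i = 0; i < n; i++)
        if (included[i] && in_list(ar[i].defines, sym))
            return true;
    return false;
}

/* True if something already in the link needs sym and nothing defines it. */
static bool is_undefined(const ObjFile *prog, const ObjFile *ar, size_t n,
                         const bool *included, const char *sym)
{
    if (is_defined(prog, ar, n, included, sym))
        return false;
    if (in_list(prog->needs, sym))
        return true;
    for (size_t i = 0; i < n; i++)
        if (included[i] && in_list(ar[i].needs, sym))
            return true;
    return false;
}

/* Sets included[i] for every archive member the link pulls in. A member
 * is taken only if it defines a currently undefined symbol, and then all
 * of its definitions come along. The archive is rescanned until nothing
 * changes, so member order inside the archive does not matter. */
static void link_archive(const ObjFile *prog, const ObjFile *ar, size_t n,
                         bool *included)
{
    for (size_t i = 0; i < n; i++)
        included[i] = false;

    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (included[i])
                continue;
            for (size_t k = 0; ar[i].defines[k] != NULL; k++) {
                if (is_undefined(prog, ar, n, included, ar[i].defines[k])) {
                    included[i] = true;
                    changed = true;
                    break;
                }
            }
        }
    }
}
```

With main.o defining main and needing a, an archive made of {a.o, b.o} yields only a.o, while an archive made of {ab.o} pulls in all of ab.o, unused b() included.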
As others have mentioned, there are linkers that are capable of splitting individual functions out of modules, and including only those that are actually used. In many cases, that is a safe thing to do. But it is usually safest to assume that your linker does not do that unless you have specific documentation.
Something else to be aware of is that most linkers make as few passes as they can through the files and libraries that are named on the command line, and build up their symbol table as they go. As a practical matter, this means that it is good practice to always specify libraries after all of the object modules on the link command line.
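The effect of argument order can be modelled in a few lines as well. This hypothetical sketch (Member, Input and link_in_order are invented names) assumes GNU ld-style semantics: an archive is searched only at its position on the command line, and only for symbols that are undefined at that moment; within one archive it keeps rescanning until no new member is pulled in. Symbols are bits in a mask to keep it short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bit i set means "symbol i". */
typedef unsigned SymSet;

typedef struct {
    SymSet defines;   /* symbols this object file exports */
    SymSet needs;     /* symbols this object file imports */
} Member;

typedef struct {
    bool is_archive;        /* false: a plain .o, always linked in */
    const Member *members;  /* the .o itself, or the archive's members */
    size_t count;
} Input;

/* Processes inputs left to right, like a link command line, and returns
 * the set of symbols that are still undefined at the end. */
static SymSet link_in_order(const Input *inputs, size_t n)
{
    SymSet defined = 0, needed = 0;

    for (size_t i = 0; i < n; i++) {
        const Input *in = &inputs[i];

        if (!in->is_archive) {
            for (size_t m = 0; m < in->count; m++) {
                defined |= in->members[m].defines;
                needed |= in->members[m].needs;
            }
            continue;
        }

        /* Archive: take any member that defines a currently undefined
         * symbol and rescan until this archive adds nothing new. Earlier
         * archives are never revisited. A member already taken has all
         * its definitions in `defined`, so it cannot be taken twice. */
        bool progress = true;
        while (progress) {
            progress = false;
            for (size_t m = 0; m < in->count; m++) {
                SymSet undefined = needed & ~defined;
                if (in->members[m].defines & undefined) {
                    defined |= in->members[m].defines;
                    needed |= in->members[m].needs;
                    progress = true;
                }
            }
        }
    }
    return needed & ~defined;
}
```

With the library after main.o nothing is left undefined; with the library first, a stays undefined, which is the classic "undefined reference" caused by argument order. For archives that depend on each other, GNU ld's `-Wl,--start-group -lA -lB -Wl,--end-group` makes the linker rescan the whole group repeatedly.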
It depends on the linker.
For example, Microsoft Visual C++ has an option, "Enable Function-Level Linking" (the /Gy compiler switch), so you can enable it manually.
(I assume they have a reason for not just enabling it all the time; maybe linking is slower or something.)
Usually (static) libraries are composed of objects created from source files. What linkers usually do is include an object if a function provided by that object is referenced. If your source file contains only one function, then only that function will be brought in by the linker. There are more sophisticated linkers out there, but most C-based linkers still work as outlined. There are tools available that split C source files containing multiple functions into artificially smaller source files to make static linking more fine-grained.
If you are using shared libraries, then using more or less of them doesn't affect your compiled size. However, your runtime size will include them.
This lecture at Academic Earth gives a pretty good overview, linking is talked about near the later half of the talk, IIRC.
Without any optimization, yes, it'll be included. The linker, however, might be able to optimize it out by statically analyzing the code and removing unreachable code.
It depends on the linker, but in general only functions that are actually called get included in the final executable. The linker works by looking up the function name in the library and then using the code associated with the name.
There are very few books on linkers, which is strange when you think how important they are. The text for a good one can be found here.
It depends on the options passed to the linker, but typically the linker will leave out the object files in a library that are not referenced anywhere.
$ cat foo.c
int main(){}
$ gcc -static foo.c
$ size
text data bss dec hex filename
452659 1928 6880 461467 70a9b a.out
# force linking of libz.a even though it isn't used
$ gcc -static foo.c -Wl,-whole-archive -lz -Wl,-no-whole-archive
$ size
text data bss dec hex filename
517951 2180 6844 526975 80a7f a.out
It depends on the linker and how the library was built. Usually libraries are a combination of object files (import libraries are a major exception to this). Older linkers would pull things into the output file image at the granularity of the object files that were put into the library. So if function a() and function b() were both in the same object file, they would both be in the output file, even if only one of the two functions was actually referenced.
This is a reason why you'll often see library-oriented projects with a policy of a single C function per source file. That way each function is packaged in its own object file and linkers have no problem pulling in only what is referenced.
Note, however, that newer linkers (certainly newer Microsoft linkers) can pull in only the referenced parts of object files, so there's less need today to enforce a one-function-per-source-file policy, though there are reasonable arguments that it should be done anyway for maintainability.
