GNU linker and underscore

Linking two modules (using the GNU linker):
module A calls external function sprintf <-- no trailing underscore
module B provides sprintf_ <-- trailing underscore
The reference to sprintf gets resolved against the definition of sprintf_.
I was expecting an undefined reference. Are there rules about how the GNU linker treats underscores?

Related

Why is there a need to link with `printf.o` if it's already defined inside `stdio.h`?

As far as I understand, when we include stdio.h, the entire file is copied into the text of our program (basically, we prepend it). Then when we call printf(), why does it have to be linked first?
Is it that stdio.h contains just the definitions of those functions, and the compiler finds the compiled object file for the function that we invoke, for example printf()?
I've read about this in a few places, but it's still not clear to me.
Header files like stdio.h generally just contain a declaration that gives the name of the function, the types of its arguments, and its return type. This is enough for the compiler to generate calls to the function, but it is not a definition. The actual code that implements the function is in a library (with an extension like .a, .so, or .lib).
Nope. printf() lives in libc.so, the C standard library. The compiler automatically passes -lc when it calls the linker, so you don't need to add that option. Only if you link your object files by calling the linker ld directly do you need to add the option yourself, along with some other files that form the C runtime: the linker doesn't know that it is linking a C program, so it doesn't know which files to link with your modules, but the compiler does. See the documentation of the compiler you use, or just pass -v when building the executable to see the command line the compiler uses to call the linker; it shows all the objects and libraries the compiler requires for every C program.
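To make the declaration/definition split concrete, here is a minimal sketch in C. It deliberately skips #include <stdio.h> and writes the declaration of puts by hand; the helper name greet is made up for illustration. The compiler can generate the call from the declaration alone, and the definition is supplied by libc at link time, assuming the usual gcc driver that adds -lc for you.

```c
/* No #include <stdio.h> here: this hand-written declaration is all the
 * compiler needs in order to generate a call to puts. */
int puts(const char *s); /* declaration only; the definition lives in libc */

/* Hypothetical helper. Returns the result of puts:
 * non-negative on success, EOF on failure. */
int greet(void)
{
    return puts("hello from libc");
}
```

Calling greet() from main prints the line. If the final link is done without libc (for example by invoking ld directly with no -lc), the call is reported as an undefined reference to puts, even though compilation succeeded.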

Why is it possible to redefine C library functions?

I noticed that if I write a function named getline, this function will be used if I invoke it, even if I #include <stdio.h>, but if I don't write such a function, the one from stdio.h will be used.
I expected instead to get a linker error, the same as if I had done the following:
foo.c:
int f() { return 0; }
main.c:
int f() { return 1; }
int main() { return f(); }
Compile:
$ gcc -c foo.c
$ gcc -c main.c
$ gcc foo.o main.o
/usr/bin/ld: main.o: in function `f':
main.c:(.text+0x0): multiple definition of `f'; foo.o:foo.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
The linker error makes sense to me; when the linker attempts to combine the object files into a single binary, it doesn't know how to resolve the invocation of f(); should it use foo.o's f() or main.o's f()?
But then why don't I get such a linker error when I write my own versions of getline or other C library functions?
This came up because I noticed that when compiling with -std=c99, gcc gives me an implicit-function-declaration warning for using getline. I can make an explicit function prototype, and it works correctly, but this implies that glibc's getline is being linked, so I tested what happens if I write my own getline, and if I do, the linker uses it instead and produces no error... The same appears to be true for other C library functions. Why is this? Why don't I get a linker error instead?
Linkers process library files differently than object files. The following discusses typical behavior for linkers. Details may vary with specific linkers and command-line switches or other settings.
When a linker processes an object file, it includes the entire object file in the output file it is building. As it is doing this, it builds a list of symbols that the object files use (refer to) but that are not defined yet.
A library file consists of multiple object modules inside a containing file. When a linker processes a library file, it examines each module in the library file and compares the symbols that module defines to that list of symbols that are needed but not yet defined. When it finds such a module, the linker includes that module in the output file. (The linker may also go back to earlier modules in the same library file, in case a later module uses a symbol that an earlier one defines.)
Any modules in the library file that do not provide a needed symbol are not needed in the output file, so the linker does not include them.
A consequence of this is that, if the same symbol is defined more than once in the object files, there will be multiple definitions because they are both built into the output file. However, if a symbol is defined once in the object files and once in the library, the one in the library will not be used because, when the linker considers the module it is in, that symbol will not be on the list of needed symbols, and the linker will not include it in the output file. So the output file ends up with just one definition of the symbol, the one from the object modules.
There are some complications to this. Suppose a module in a library defines both sin and cos, and an object module defines sin and uses both sin and cos. When the linker processes the object module, it will note that sin and cos are both used. The reference to sin will be satisfied by the object module, but cos is still needed. Then, when the linker processes the library, it will find cos and include that module. But that module also defines sin, so there will be two definitions of sin in the output file, and the linker will complain. So you can get multiple-definition errors from library modules this way.
Another complication is that the order of processing matters. If the linker first processes an object module that needs getline, and then a library module that defines getline, and then an object module that defines getline, the library module will be included in the output file (because getline was needed when the linker processed the library), and the object module that defines getline will also be included (because the linker includes all object files). So the output will have multiple definitions of getline, and the linker will complain. This is one reason why libraries are generally processed last, so that all object modules are processed first, and only things that are needed from libraries are taken.
In spite of this linker behavior, you cannot rely on defining your own versions of standard C routines. Compilers may have built-in knowledge about how the routines are specified by the C standard, and they may replace calls to those routines with other code. If you do need to provide your own version of a standard routine, the compiler may have a switch to disable its special treatment of that routine. For example, GCC has -fno-builtin-function, where function is replaced with a particular name, to tell it to disable special knowledge of a function.
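As a rough illustration of the rule described above (a definition in an object file wins over one that sits in a library), the sketch below defines its own rand. It assumes a typical GCC and glibc setup on Linux. The C standard reserves library function names, so this is undefined behavior in principle and should not be relied on. The helper roll is hypothetical.

```c
/* Our own definition of a name that libc also defines.
 * The signature matches the standard one, so it stays compatible
 * with the declaration in <stdlib.h> if that header is included. */
int rand(void)
{
    return 4; /* deliberately not random, so the override is easy to spot */
}

/* Hypothetical helper: calls whichever rand the linker resolved. */
int roll(void)
{
    return rand();
}
```

On such a setup the link is expected to complete without a multiple-definition error and roll() to return 4: by the time the linker looks at libc, rand is no longer on its list of undefined symbols, so the library's copy is not pulled in.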

Where are declaration and definition stored?

For a predefined function, where are the declaration and the definition stored?
And is the declaration stored in libraries? If so, why is it called a library function?
This is an imprecise question. The best answers we can give are:
The declaration of standard library functions can best be thought of as being stored in their header files.
Depending on how you want to think about it, the definition of standard library functions is either in the source files for those libraries (which may be invisible to you, a trade secret of your compiler vendor), or in the library files themselves (.a, .so, .lib, or .dll).
These days, knowledge of standard library functions is typically built in to the compiler, also. For example, if I write the old classic int main() { printf("Hello, world!\n"); }, but without any #include directives, my compiler says, "warning: implicitly declaring library function 'printf'" and "include the header <stdio.h>".
There are two sides of this story:
The code that calls a library/external function: The compiler generates a reference in your compiled module that names the function you expect to exist elsewhere (in C, the reference is just the function's symbol name; the prototype itself is not recorded).
The pre-compiled library files against which your code must be linked: A library file contains both the symbol names of its functions and the actual compiled binary (i.e. the definition/implementation) of these functions.
When your code uses an external function, the compiler generates a reference to this function that it assumes will be resolved later during the linking phase.
During the linking process, lists of function references are built up. The linker expects to find the definition/implementation of each referenced function.
The header file contains the declaration of built-in functions and the library contains the definition of the functions.
As for the name: in my opinion, it is called a library because, just as an actual library contains books, these libraries contain classes, functions, variables, etc.

Why can gcc automatically tag a symbol as weak?

We have built our code using gcc4.1.2, and we have used function "lstat64" that is defined in the "sys/stat.h" system header file and also defined in a third party library that we use.
When we "nm" our executable, we find that:
W lstat64
My question is: why did gcc mark it as a weak function?
Also, when we ported our code to gcc4.4.4, we found that the new gcc did not mark the function as "weak"; it marked it as undefined.
Why this change in behavior?
As per the GCC documentation:
weak
The weak attribute causes the declaration to be emitted as a weak symbol rather than a global. This is primarily useful in defining library functions which can be overridden in user code, though it can also be used with non-function declarations. Weak symbols are supported for ELF targets, and also for a.out targets when using the GNU assembler and linker.
In your case lstat64 was probably marked as weak in GCC 4.1.2 because it would then not conflict with the third party library function. GCC probably wanted these external functions to have precedence.
But in a later version, GCC would have wanted its own version of lstat64 to have precedence.
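For reference, here is a minimal sketch of what the weak attribute quoted above looks like in source. It assumes GCC or Clang on an ELF target; the function names default_verbosity and current_verbosity are invented for this example.

```c
/* Weak definition: acts as a default. If another object file in the
 * link provides a normal (strong) definition of default_verbosity,
 * the linker picks that one and reports no multiple-definition error. */
__attribute__((weak)) int default_verbosity(void)
{
    return 1;
}

/* Calls whichever definition the linker selected. */
int current_verbosity(void)
{
    return default_verbosity();
}
```

With no strong definition present, current_verbosity() returns 1. Running nm on the resulting object file should list default_verbosity with type W, the same marker seen in the question, while current_verbosity shows up as an ordinary T symbol.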

What standard library function does libc.a contain?

When using gcc under Linux, one does not need to add command-line options to use standard library functions like printf. The book An Introduction to GCC explains: "The C standard library itself is stored in ‘/usr/lib/libc.a’ and contains functions specified in the ANSI/ISO C standard, such as ‘printf’—this library is linked by default for every C program."
But one has to add -lm on the command line to use the standard library functions declared in math.h, since libm.a is not linked by default.
So which standard library functions are included in libc.a and therefore do not require linking against other library files? And other than libm.a, are there any other standard library functions that require explicitly adding a library to link against, and what are the file names of those libraries?
Between them, libc and libm cover all ANSI/ISO functions. Beyond that, Linux and UNIX systems follow POSIX, which includes libpthread (usually linked in using the -pthread option rather than by explicitly naming the library), as well as libiconv, which may be included in libc. Additional libraries in POSIX include curses and libutil for miscellaneous functions.

Resources