GCC linked library for compile - c

Why do we have to tell gcc which library to link against when that information is already in source file in form of #include?
For example, if I have a code which uses threads and has:
#include <pthread.h>
I still have to compile it with -pthread option in gcc:
gcc -pthread test.c
If I don't give -pthread option it will give errors finding thread function definitions.
I am using this version:
gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4

This may be one of the most common things that trip up beginners to C.
In C there are two different steps to building a program, compilation and linking. For the purposes of your question, these steps connect your code to two different types of files, headers and libraries.
The #include <pthread.h> directive in your C code is handled by the compiler. The compiler (actually preprocessor) literally pastes in the contents of pthread.h into your code before turning your C file into an object file.
pthread.h is a header file, not a library. It contains a list of the functions that you can expect to find in the library, what arguments they take and what they return. A header can exist without a library and vice-versa. The header is a text file, often found in /usr/include on Unix-derived systems. You can open it just like any C file to read the contents.
The command line gcc -lpthread test.c does both compilation and linking. In the old days, you would first do something like cc test.c, then ld -lpthread test.o. As you can see, -lpthread is actually an option to the linker.
The linker does not know anything about text files like C code or headers. It only works with compiled object files and existing libraries. The -l flag tells it which libraries to look in to find the functions you are using.

The name of the header has nothing to do with the name of the library. Here it's really just by the accident. Most often there are many headers provided by the library.
Especially in C++ there is usually one header per class and the library usually provides classes implementations from the same namespace. In C the headers are organized that they contain some common subset of functions - math.h contains mathematical operations, stdio.h provides IO functions etc.

They are two separate things. .h files holds the declarations, sometimes the inline function also. As we all know, every functions should have an implementation/definition to work. These implementations are kept seperately. -lpthread, for example is the library which holds the implementation of the functions declared in headers in binary form.
Separating the implementation is what people want when you don't want to share your commercial code with others
So,
gcc -pthread test.c
tell gcc to look for definitions declared in pthread.h in the libpthread. -pthread is expanded to libpthread by linker automatically

there are/were compilers that you told it where the lib directory was and it simply scanned all the files hoping to find a match. then there are compilers that are the other extreme where you have to tell it everything to link in. the key here is include simply tells the compiler to look for some definitions or even simpler to include some external file into this file. this does not necessarily have any connection to a library or object, there are many includes that are not tied to such things and it is a bad assumption. next the linker is a different step and usually a different program from the compiler, so not only does the include not have a one to one relationship with an object or library, the linker is not the compiler.

Related

Why do some system libraries require a -l option while others do not?

I recently took a class where we used pthreads and when compiling we were told to add -lpthread. But how come when using other #include <> statements for system header files, it seems that the linking of the object implementation code happens automatically? For example, if I just want to get the header file #include <stdio.h>, I don't need a -l option in compilation, the linking of that .o implementation file file just happens.
For this file
#include <stdio.h>
int main() {
return 0;
}
run
gcc -v -o simple simple.c
and you will see what, actually, gcc does. You will see that it links with libraries behind your back. This is why you don't specify system libraries explicitly.
Basic Answer: -lpthreads tells the compiler/linker to link to the pthreads library.
Longer answer
Headers tell the compiler that a certain function is going to be available (sometimes that function is defined in the header, perhaps inline) when the code is later linked. So the compiler marks the function essentially available but later during the linking phase the linker has to connect the function call to an actual function. A lot of function in the system headers are part of the "C Runtime Library" your linker automatically uses but others are provided by external libraries (such as pthreads). In the case where it is provided by an external library you have to use the '-lxxx' so the compiler/linker knows which external library to include in the process so it gets the address of the functions correctly.
Hope that helps
A C header file does not contain the implementations of the functions. It only contains function prototypes, so that the compiler could generate proper function calls.
The actual implementation is stored in a library. The most commonly used functions (such as printf and malloc) are implemented in the Standard C Library (LibC), which is implicitly linked to any executable file unless you request not to link it. The threads support is implemented in a separate library that has to be linked explicitly by passing the -pthread option to the linker.
NB: You can pass the option -pthread to the compiler that will also link the appropriate library.
The compiler links the standard C library libc.a implicitly so a -lc argument is not necessary. Pthreads is a system library but not a part of the C standard library, so must be explicitly linked.
If you were to specify -nolibc you would then need to explicitly link libc (or some alternative C library).
If all system libraries were linked implicitly, gcc would have to be have a different implementation for each system (for example pthreads is not a system library on Windows), and if a system introduced a new library, gcc would have to change in lock step. Moreover the link time would increase as each library were searched in some unknown sequence to resolve symbols. The C standard library is the one library the compiler can rely on to be provided in any particular implementation, so implicit linking is generally safe and a simple convenience.
Some library files are are searched by default, simply for convenience, basically the C standard library. Linkers have options to disable this convenience and have no default libraries (for if you are doing something unusual and don't want them). Non-default libraries, you have to tell linker to use them.
There could be a mechanism to tell what libraries are needed in the source code (include files are plain text which is just "copy-pasted" at #include line), there's no technical difficulty. But there just isn't (as far as I know, not even as non-standard compiler extension).
There are some solutions to the problem though, such as pkg-config for unixy platforms, which tackle the problem of include and library files by providing the compiler and linker options for a library easily.
the -l option is for linking against an external library in this case libpthread, against system libraries is linked by default
http://www.network-theory.co.uk/docs/gccintro/gccintro_17.html
C++: How to add external libraries

(C) How are the implementations of the stdlib functions stored and linked to in header files if the source code does not have to be provided directly?

new to using C
Header files for libraries like stdlib do not contain the actual implementation code for the functions they provide access to. I understand that the actual source text for libraries like this aren't needed to compile, but how does this work specifically? Are the implementation details for these libraries contained within the compiler?
When you use a function like printf(), including the header file essentially pastes in code for the declaration of the function, but normally the implementation code would need to be available as well.
What form is it stored in? (and where?) Is this compiler specific? Would it be possible to write custom code and reference it in this way without modifying the behavior of the compiler?
I've been searching around and found some info that is relevant but nothing specific. This could be related to not formulating the question well. Thanks.
When you link a program, the compiler will implicitly add some extra libraries to your program:
$ ls
main.c
$ cc -c main.c
$ cc main.o
$ ls
main.c main.o a.out
You can discover the extra libraries a program uses with ldd. Here, there are three libraries linked into the program, and I didn't ask for any of them:
$ ldd a.out
linux-vdso.so => (0x00...)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00...)
/lib64/ld-linux-x86-64.so.2 (0x00...)
So, what happens if we link without these libraries? That's easy enough, just use the linker (ld) directly, instead of calling it through cc. When you use ld, it doesn't give you these extra libraries, so you get an error:
$ ld main.o
Undefined symbols:
"_printf", referenced from:
_main in main.o
The implementation for printf() is stored in the standard C library, which is usually just another library on your system... the only difference is that it gets automatically included into your program when you compile C.
You can use nm to find out what symbols are in a library, so I can use it to find printf() in libc:
$ nm -D /lib/x86_64-linux-gnu/libc-2.13.so | grep printf
...
000000000004e4b0 T printf
...
So, now that we know that libc has printf(), we can use -lc to tell the linker to include libc, and that will get rid of the errors about printf() being missing:
$ ld main.o -lc
There might be some other bits missing, and that's why we use cc to link our programs instead of ld: cc gives us all the default libraries.
When you compile a file you only need to promise the compiler that you have certain functions and symbols. A function call is in the compiled into a call [some_address]
The compiler will compile each C-file into object files that just have place holders for calls to functions declared in the headers. That is [some_address] does not need to be known at this point.
A number of oject files can be collected into what is known as a library.
After that it is the linkers job to look through all object files and libraries it know of and find out what the real value of all unknown [some_address] is and translate the call to, e.g. call 0x1234 if the particular function you are calling starts at 0x1234 (or it might be a relative offset from the current program pointer.
Stdlib and other library functions are implemented in an object library. A library is a collection of code that is linked with your program. By default C programs are linked against the stdlib library, which is usually provided by the operating system. Most modern operating systems use a dynamical linker. That is, your program is not linked against the library until it is executed. When it is being loaded, the linker-loader combines your code and the library code in your program's address space. You code and then make a call to the printf() code that is located in that library.
Usually a header file contains only a function prototype while the implementation is either in a separate source file or a precompiled library in the case of stdlib (and other libraries, both shipped with a compiler or available separately) the precompiled library gets linked at the end of the compilation process. (There's also a distinction between static and dynamic libraries, but I won't go into detail about that)
The implementation of standard libraries (which are shipped with a compiler) are usually compiler specific (there is a standard describing which functions have to be in a library, but the compiler programmer can decide how exactly he implements them) and it is (in theory) possible to exchange these libraries with some of your own without modifying the behaviour of the compiler (though not recommended as you would have to rewrite the entire library in order to ensure that all functions are contained).

How does a compiler find out which dynamic link library will be used in my code, if I only include headers-files, where is not describe it?

How does a compiler find out which dynamic link library will be used in my code, if I only include headers-files, where is not describe it?
#include <stdio.h>
void main()
{
printf("Hello world\n");
}
There is I only include
stdio.h
and my code is used
printf function
How it is known, in headers-files prototypes , macros and constant are described, but nothing about in which file "printf" is implement. How does then it works?
When you compile a runnable executable, you don't just specify the source code, but also a list of libraries from which undefined references are looked up. With the C standard library, this happens implicitly (unless you tell GCC -nostdinc), so you may not have been consciously aware of this.
The libraries are only consumed by the linker, not the compiler. The linker locates all the undefined references in the libraries. If the library is a static one, the linker just adds the actual machine code to your final executable. On the other hand, if the library is a shared one, the linker only records the name (and version?) of the library in the executable's header. It is then the job of the loader to find appropriate libraries at load time and resolve the missing dependencies on the fly.
On Linux, you can use ldd to list the load-time dependencies of a dynamically linked executable, e.g. try ldd /bin/ls. (On MacOS, you can use otool -L for the same purpose.)
As others have answered, the standard c library is implicitly linked. If you are using gcc you can use the -Wl,--trace option to see what the linker is doing.
I tested your example code:
gcc -Wl,--trace main.c
Gives:
/usr/bin/ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crt1.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/crtbegin.o
/tmp/ccCjfUFN.o
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/4.6/libgcc_s.so)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/4.6/libgcc_s.so)
/usr/lib/gcc/x86_64-linux-gnu/4.6/crtend.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crtn.o
This shows that the linker is using libc.so (and also ld-linux.so).
The library glibc is linked by default by GCC. There is no need to mention -l library when you are building your executable. Hence you find that the functions printf and others which are a part of glibc do not need any linking exclusively.
Technically your compiler does not figure out which libraries will be used. The linker (commonly ld) does this. The header files only tell the compiler what interface your library functions use and leaves it up to the linker to figure out where they are.
A source file goes a long path until it becomes an executable. Commonly
source.c -[preprocess]> source.i -[compile]> source.s -[assemble]> source.o -[link]> a.out
When you invoke cc source.c all those steps are done transparently for you in one go and the standard libraries (commonly libc.so) and executable loader (commonly crt0.o) are linked together.
Any additional libraries have to be passed as additional linker flags i.e. -lpthread.
I would say that depends on IDE or the compiler and system. Header file just contains interface information like name of function parameters it expects any attributes others and that's how compiler first convert your code to an intermediate object file.
After that comes linking where in code for printf is getting added to the executable either through static library or dynamic library.
Functions and other facilities like STL are part of C/C++ so they are either delivered by compiler or system. e.g on Solaris there is no debug version of C library unless you are using gcc. But on Visual Studio you have debug version msvcrt.dll and you can also link C library statically.
In short the answer is that code for printf and other functions in C library are added by compiler at link time.

Does program need additional symbols from .so shared library except those declared in header file?

In C programming, I thought that a object file can be successfully linked with a .so file as long as the .so file offers all symbols which have been declared in the header file.
Suppose I have foo.c, bar.h and two libraries libbar.so.1 and libbar.so.2. The implementation of libbar.so.1 and libbar.so.2 is totally different, but I think it's OK as long as they both offers functions declared in bar.h.
I linked foo.o with libbar.so.1 and produced an executable: foo.bin. This executable worked when libbar.so.1 is in LD_LIBRARY_PATH.(of course a symbolic link is made as libbar.so) However, when I change the symbolic link to libbar.so.2, foo.bin could not run and complainted this:
undefined symbol: _ZSt4cerr
libbar.so.1 is a c++ built library, while libbar.so.2 is a c built library. I don't understand why foo.bin needs those c++ related symbols only meaningful in libbar.so.1 itself, since foo.bin is built upon pure c code foo.c.
_ZSt4cerr is obviously a mangled C++ name. You may need to check if you are using the right compiler (gcc/g++, i know it sounds stupid, but i happened to run into such confusion ;) ), and if there are any macros in the bar.h file that could have referenced cerr.
You must demangle c++ name before searching. For gcc there is a c++filt utility:
$ c++filt
_ZSt4cerr
std::cerr
It is just standard error file stream.
You probably just forgot to link the whole program with the C++ standard library.
The statement in the question is just wrong. You say, 'the header file.' There is no such thing as 'the (one and only) header file.' If you mean 'the header file declaring a certain C++ class', well, that class might inherit from other classes. Or it might use exceptions. Or RTTI. In which case, by default, the .so containing the code that goes with it will contain 'hanging undefined symbols'. By default, the expectation is that the 'main' program is in C++, and it links to the C++ runtime.
It is possible to create a self-contained .so, but you have to do extra work to create it. You might need to use -Bsymbolic, or specify some -l libraries when linking it, or both. This area is not well-documented and generally required some archaeology.

Two basic question about compiling and libraries

I have two semi-related questions.
My first question: I can call functions in the standard library without compiling the entire library by just:
#include <stdio.h>
How would I go about doing the same thing with my header files? Just "including" my plaintext header files obviously does not work.
#include "nameofmyheader.h"
Basically, how can I create a library that other files can call?
Second question: Suppose I have a program that is split into 50 c files and a header file. What is the proper way to compile it besides:
cc main.c 1.h 1.c 2.c 3.c 4.c 5.c 6.c 7.c /*... and so on*/
Please correct any misconceptions I am having. I'm totally lost here.
First, you're a bit confused as to what happens with an #include. You never "compile" the standard library. The standard library is already compiled and is sitting in library files (.dll and .lib files on Windows, .a and .so on Linux). What the #include does is give you the declarations needed to link to the standard library.
The first thing to understand about #include directives is that they are very low-level. If you have programmed in Java or Python, #includes are much different from imports. Imports tell the compiler at a high level "this source file requires the use of this package" and the compiler figures out how to resolve that dependency. An #include in C directive says "take the entire contents of this file and literally paste it in right here when compiling." In particular, #include <stdio.h> brings in a file that has the forward declarations for all of the I/O functions in the standard library. Then, when you compile your code, the compiler knows how to make calls to those functions and check them for type-correctness.
Once your program is compiled, it is linked to the standard library. This means that your linker (which is automatically invoked by your compiler) will either cause your executable to make use of the shared standard library (.dll or .so), or will copy the needed parts of the static standard library (.lib or .a) into your executable. In neither case does your executable "contain" any part of the standard library that you do not use.
As for creating a library, that is a bit of a complicated topic and I will leave that to others, particularly since I don't think that's what you really want to do based on the next part of your question.
A header file is not always part of a library. It seems that what you have is multiple source files, and you want to be able to use functions from one source file in another source file. You can do that without creating a library. All you need to do is put the declarations for things foo.c that you want accessible from elsewhere into foo.h. Declarations are things like function prototypes and "extern" variable declarations. For example, if foo.c contains
int some_global;
void some_function(int a, char b)
{
/* Do some computation */
}
Then in order to make these accessible from other source files, foo.h needs to contain
extern int some_global;
void some_function(int, char);
Then, you #include "foo.h" wherever you want to use some_global or some_function. Since headers can include other headers, it is usual to wrap headers in "include guards" so that declarations are not duplicated. For example, foo.h should really read:
#ifndef FOO_H
#define FOO_H
extern int some_global;
void some_function(int, char);
#endif
This means that the header will only be processed once per compilation unit (source file).
As for how to compile them, never put .h files on the compiler command line, since they should not contain any compile-able code (only declarations). In most cases it is perfectly fine to compile as
cc main.c 1.c 2.c 3.c ... [etc]
However if you have 50 source files, it is probably a lot more convenient if you use a build system. On Linux, this is a Makefile. On windows, it depends what development environment you are using. You can google for that, or ask another SO question once you specify your platform (as this question is pretty broad already).
One of the advantages of a build system is that they compile each source file independently, and then link them all together, so that when you change only one source file, only that file needs to be re-compiled (and the program re-linked) rather than having everything re-compiled including the stuff that didn't get changed. This makes a big time difference when your program gets large.
You can combine several .c files to a library. Those libraries can be linked with other .c files to become the executable.
You can use a makefile to create a big project.
The makefile has a set of rules. Each rule describes the steps needed to create one piece of the program and their dependencies with other pieces or source files.
You need to create a shared library, the standard library is a shared library that is implicitly linked in your program.
Once you have your shared library you can use the .h files and just compile the program with -lyourlib wich is implicit for the libc
Create one using:
gcc -shared test.c -o libtest.so
And then compile your program like:
gcc myprogram.c -ltest -o myprogram
For your second question I advise you to use Makefiles
http://www.gnu.org/software/make/
The standard library is already compliled and placed on your machine ready to get dynamically linked. This means that the library is dynamically loaded when needed by a program. Compare this to a static library which gets compiled INTO your program when you run the compiler/linker.
This is why you need to compile your code and not the standard library code. You could build a dynamic (shared) library yourself.
For reference, #include <stdio.h> does not IMPORT the standard library. It just allows the compile and link to see the public interface of the library (To know what functions are used, what parameters they take, what types are defined, what sizes they are, etc).
Dynamic Loading
Shared Library
You could split your files up into modules, and create shared libraries. But generally as projects get bigger you tend to need a better mechanism to build your program (and libraries). Rather than directly calling the compiler when you need to do a rebuild you should use a make program or a complete build system like the GNU Build System.
If you really want it to be as simple as just including a .h file, all of your "library" code needs to be in the .h file. However, in this scenario, someone can only include your .h file into one and only one .c file. That may be ok, depending on how someone will use your "library".

Resources