Why create a .a file from .o for static linking? - c

Consider this code:
one.c:
#include <stdio.h>
int one() {
    printf("one!\n");
    return 1;
}
two.c:
#include <stdio.h>
int two() {
    printf("two!\n");
    return 2;
}
prog.c:
#include <stdio.h>
int one();
int two();
int main(int argc, char *argv[])
{
    one();
    two();
    return 0;
}
I want to link these programs together. So I do this:
gcc -c -o one.o one.c
gcc -c -o two.o two.c
gcc -o a.out prog.c one.o two.o
This works just fine.
Or I could create a static library:
ar rcs libone.a one.o
ar rcs libtwo.a two.o
gcc prog.c libone.a libtwo.a
or, equivalently:
gcc -L. prog.c -lone -ltwo
So my question is: why would I use the second version - the one where I created ".a" files - rather than linking my ".o" files directly? Both seem to be statically linked, so is there an advantage or architectural difference between one and the other?

Typically libraries are collections of object files that can be used in multiple programs.
In your example there is no advantage, but you might have done:
ar rcs liboneandtwo.a one.o two.o
Then linking your program becomes simpler:
gcc -L. prog.c -loneandtwo
It's really a matter of packaging. Do you have a set of object files that naturally form a set of related functionality that can be reused in multiple programs? If so, then they can sensibly be archived into a static library, otherwise there probably isn't any advantage.
There is one important difference in the final link step. Any object files that you linked will be included in the final program. Object files that are in libraries are only included if they help resolve any undefined symbols in other object files. If they don't, they won't be linked into the final executable.
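To see that last point in action, here is a sketch building on the files above (prog_one.c is a hypothetical variant of prog.c that calls only one(); the exact nm output format depends on your toolchain):
ar rcs liboneandtwo.a one.o two.o
gcc -L. prog_one.c -loneandtwo -o prog_lib   # only one.o is pulled in from the archive
gcc prog_one.c one.o two.o -o prog_obj       # both object files are included unconditionally
nm prog_obj | grep " T two"                  # "two" is present in this executable
nm prog_lib | grep " T two"                  # no output: two.o was never linked in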

The difference would be in the size of the executable, although maybe not for your example.
When linking to a library, only the bits that are used by your executable are incorporated. When linking an object file, you take the whole thing.
For example, if your executable had to include every math function in the math library when you only use one, it would be much bigger than it needed to be and contain a lot of unused code.
It is interesting to contrast this with the dynamic linking model of Windows. There, the OS has to load each DLL (dynamically linked library) your executable uses in its entirety, which can lead to bloat in RAM. The advantage of such a model is that your executable is itself smaller, and a linked DLL may already be in memory, in use by some other executable, so it doesn't need to be loaded again.
In static linking, the library functions are loaded separately for each executable.

Technically, the result is exactly the same. Usually, you create libraries for utility functions, so instead of feeding the linker with dozens of object files, you just have to link the library.
BTW, it makes absolutely no sense to create a .a file that contains just one .o file.

You can put a collection of files in an archive (.a) file for later reuse. The standard library is a good example.
Sometimes it makes sense to organize big projects into libraries.

The primary advantage is when you have to link, you can just specify one library instead of all the separate object files. There's also a minor advantage in managing the files, getting to deal with one library instead of a bunch of object files. At one time, this also gave a significant savings in disk space, but current hard drive prices make that less important.

Whenever I am asked this question (by freshers in my team), "why (or sometimes even a 'what is') a .a?", I use the answer below, which uses the .zip as an analogy.
"A dotAy is like a zip file of all the dotOhs which you would want to link while building your exe/lib. Savings on disk space, plus one need not type the names of all the dotOhs involved."
So far, this has seemed to make them understand. ;)

Related

why do we need the shared library during compile time

Why do we need the presence of the shared library during the compile time of my executable? My reasoning is that, since the shared library is not included in my executable and is loaded at runtime, it should not be needed at compile time. Or am I missing something?
#include <stdio.h>
int addNumbers(int, int); // prototype should be enough, no?
int main(int argc, char* argv[]) {
    int sum = addNumbers(1, 2);
    printf("sum is %d\n", sum);
    return 0;
}
I had libfoo.so in my current directory, but when I renamed it to libfar.so I found that the shared library is needed at compile (link) time, otherwise the build fails:
gcc -o main main.c -L. -lfoo gives main.c:(.text+0x28): undefined reference to `addNumbers'
I think it should be enough to only have the name of the shared library. The shared library itself should not be needed, since it is found via LD_LIBRARY_PATH and loaded dynamically at runtime. Is there something else needed besides the name of the shared lib?
Nothing is needed at compile time, because C has a notion of separate compilation of translation units. But once all the different sources have been compiled, it is time to link everything together. The notion of a shared library is not present in the standard, but it is now a common thing, so here is how a common linker proceeds:
it looks in all compiled modules for identifiers with external linkage, either defined or only declared
it looks in libraries (both static and dynamic) for identifiers already used and not defined. It then links in the modules from static libraries and stores references to dynamic libraries. But at least on Unix-likes, it needs to access the shared library itself, to make sure that any required (declared but not defined) identifiers are already defined there or can be found in other linked libraries, whether static or dynamic.
This produces the executable file. Then at load time, the dynamic loader knows all the dynamic modules that are required and loads them into memory (if they are not already there), along with the actual executable, and builds a (virtual) memory map.
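On Linux you can inspect the result of that link step: the executable records which shared libraries it needs, and ldd shows whether the loader can find them (a sketch, assuming the main/libfoo example from the question):
ldd ./main    # prints something like "libfoo.so => ./libfoo.so", or "not found" if the loader can't locate it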
gcc -o main main.c -L. -lfoo
This command does (at least) two steps: compile main.c into an object file and link all resources into the executable main. The error you see comes from the last step, the linker.
The linker is responsible for generating the final executable machine code. It requires the shared object library because it needs to generate the machine code that loads it and calls any functions used from it.
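Splitting the command into its two steps makes this visible (a sketch; libfoo.so is assumed to provide addNumbers):
gcc -c main.c -o main.o          # compiles fine: only the prototype is needed here
gcc main.o -L. -lfoo -o main     # this is the step that fails if libfoo.so is missing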

GCC linked library for compile

Why do we have to tell gcc which library to link against when that information is already in the source file in the form of an #include?
For example, if I have a code which uses threads and has:
#include <pthread.h>
I still have to compile it with the -pthread option in gcc:
gcc -pthread test.c
If I don't give the -pthread option, it gives errors about not finding the thread function definitions.
I am using this version:
gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
This may be one of the most common things that trip up beginners to C.
In C there are two different steps to building a program, compilation and linking. For the purposes of your question, these steps connect your code to two different types of files, headers and libraries.
The #include <pthread.h> directive in your C code is handled by the compiler. The compiler (actually preprocessor) literally pastes in the contents of pthread.h into your code before turning your C file into an object file.
pthread.h is a header file, not a library. It contains a list of the functions that you can expect to find in the library, what arguments they take and what they return. A header can exist without a library and vice-versa. The header is a text file, often found in /usr/include on Unix-derived systems. You can open it just like any C file to read the contents.
The command line gcc -lpthread test.c does both compilation and linking. In the old days, you would first do something like cc test.c, then ld -lpthread test.o. As you can see, -lpthread is actually an option to the linker.
The linker does not know anything about text files like C code or headers. It only works with compiled object files and existing libraries. The -l flag tells it which libraries to look in to find the functions you are using.
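Making the two steps explicit shows where each file is consulted (a sketch, assuming test.c calls functions such as pthread_create):
gcc -c test.c -o test.o       # compilation: only the header pthread.h is needed
gcc test.o -o test -pthread   # linking: this is where the pthread library is actually searched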
The name of the header has nothing to do with the name of the library. Here it is really just a coincidence. Most often there are many headers provided by a library.
Especially in C++ there is usually one header per class, and the library usually provides the implementations of the classes from the same namespace. In C the headers are organized so that each contains some common subset of functions - math.h contains mathematical operations, stdio.h provides I/O functions, etc.
They are two separate things. .h files hold the declarations, and sometimes inline functions as well. As we all know, every function needs an implementation/definition in order to work. These implementations are kept separately. libpthread, for example, is the library which holds the implementations of the functions declared in the headers, in binary form.
Separating the implementation is also what you want when you don't want to share your commercial source code with others.
So,
gcc -pthread test.c
tells gcc to look for the definitions of the functions declared in pthread.h in libpthread. The -pthread option is expanded to libpthread by gcc automatically.
There are (or were) compilers where you just told them where the lib directory was, and they simply scanned all the files hoping to find a match. Then there are compilers at the other extreme, where you have to tell them everything to link in. The key here is that #include simply tells the compiler to look for some definitions, or even more simply, to include some external file into this file. That does not necessarily have any connection to a library or object file; there are many includes that are not tied to such things, so it is a bad assumption. Next, the linker is a different step and usually a different program from the compiler, so not only does an include not have a one-to-one relationship with an object or library, the linker is not the compiler.

Modular programming and compiling a C program in linux

So I have been studying modular programming, where each file of the program is compiled separately. Say we have FILE.c and OTHER.c that are both part of the same program. To compile them, we do this at the prompt:
$ gcc FILE.c OTHER.c -c
using the -c flag to compile them into .o files (FILE.o and OTHER.o), and only when that is done do we link them into an executable using
$ gcc FILE.o OTHER.o -o prog
I know I can just do it in one step and skip the middle part, but as it is shown everywhere, people do the first step and only then produce the executable, which I can't understand at all.
May I know why?
If you are working on a project with several modules, you don't want to recompile all modules if only some of them have been modified. The final linking command is, however, always needed. Build tools such as make are used to keep track of which modules need to be compiled or recompiled.
Doing it in two steps allows you to separate the compiling and linking phases more clearly.
The output of the compiling step is object (.o) files, which are machine code but with the external references of each module (i.e. each .c file) still unresolved; for instance, file.c might use a function defined in other.c, but the compiler doesn't care about that dependency in that step.
The input of the linking step is the object files, and its output is the executable. The linking step binds the object files together by filling in the blanks (i.e. resolving dependencies between object files). That's also where you add the libraries to your executable.
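As a sketch of the workflow this enables (assuming only file.c was modified since the last build; prog is just an example output name):
gcc -c file.c -o file.o        # recompile only the changed module
gcc file.o other.o -o prog     # relink; other.o is reused unchanged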
This part of another answer responds to your question:
You might ask why there are separate compilation and linking steps.
First, it's probably easier to implement things that way. The compiler
does its thing, and the linker does its thing -- by keeping the
functions separate, the complexity of the program is reduced. Another
(more obvious) advantage is that this allows the creation of large
programs without having to redo the compilation step every time a file
is changed. Instead, using so called "conditional compilation", it is
necessary to compile only those source files that have changed; for
the rest, the object files are sufficient input for the linker.
Finally, this makes it simple to implement libraries of pre-compiled
code: just create object files and link them just like any other
object file. (The fact that each file is compiled separately from
information contained in other files, incidentally, is called the
"separate compilation model".)
It was too long to put in a comment, please give credit to the original answer.

How to link two files in C

I am currently working on a class assignment. The assignment is to create a linked list in C. But because it's a class assignment, we have some constraints:
We have a header file that we cannot modify.
We have a .c file that is the linked list.
We have a .c file that is just a main method, used to test the linked list.
The header file has a main method declared, so when I attempt to build the linked list on its own it fails because there is no main method. What should I do to resolve the issue? Import the test file (this causes another error)?
I'm assuming your three files are called header.h, main.c, and linkedlist.c
gcc main.c linkedlist.c -o executable
This will create an executable binary called "executable"
Note this also assumes you're using gcc as a compiler.
Like most languages, C supports modules. What I assume your assignment requires is compiling a module. Modules, unlike full programs, lack entry points. Roughly speaking, they are collections of functions, in the manner of a library. When compiling a module, no linking is done.
You would compile a module like this: gcc -c linkedlist.c. This actually produces linkedlist.o, which is a module. Try executing this linkedlist.o (after changing its mode to executable, since it won't be so by default). The reason you fail to execute this module is, in part, that it is not in the proper format to be executed: it lacks an entry point (what we know as 'main') and it has not been linked.
Your assignment seems to provide a test main.c; if you want to use it, you only have to link main.c (actually compiled into main.o) with linkedlist.o. To do that, simply type gcc -o name_of_your_program main.c linkedlist.o. What happens here is that the compiler first compiles main.c into a main.o module, then links the two modules together under the name you gave with the -o option; the compiler is smart enough to need nothing explicit about the steps it has to take. If you want to know more about this, you will have to learn how compilers do what they do. Google can help you with that more than I ever could. Good luck.

Two basic question about compiling and libraries

I have two semi-related questions.
My first question: I can call functions in the standard library without compiling the entire library by just:
#include <stdio.h>
How would I go about doing the same thing with my header files? Just "including" my plaintext header files obviously does not work.
#include "nameofmyheader.h"
Basically, how can I create a library that other files can call?
Second question: Suppose I have a program that is split into 50 .c files and a header file. What is the proper way to compile it besides:
cc main.c 1.h 1.c 2.c 3.c 4.c 5.c 6.c 7.c /*... and so on*/
Please correct any misconceptions I am having. I'm totally lost here.
First, you're a bit confused as to what happens with an #include. You never "compile" the standard library. The standard library is already compiled and is sitting in library files (.dll and .lib files on Windows, .a and .so on Linux). What the #include does is give you the declarations needed to link to the standard library.
The first thing to understand about #include directives is that they are very low-level. If you have programmed in Java or Python, #includes are much different from imports. Imports tell the compiler at a high level "this source file requires the use of this package" and the compiler figures out how to resolve that dependency. An #include directive in C says "take the entire contents of this file and literally paste it in right here when compiling." In particular, #include <stdio.h> brings in a file that has the forward declarations for all of the I/O functions in the standard library. Then, when you compile your code, the compiler knows how to make calls to those functions and check them for type-correctness.
Once your program is compiled, it is linked to the standard library. This means that your linker (which is automatically invoked by your compiler) will either cause your executable to make use of the shared standard library (.dll or .so), or will copy the needed parts of the static standard library (.lib or .a) into your executable. In neither case does your executable "contain" any part of the standard library that you do not use.
As for creating a library, that is a bit of a complicated topic and I will leave that to others, particularly since I don't think that's what you really want to do based on the next part of your question.
A header file is not always part of a library. It seems that what you have is multiple source files, and you want to be able to use functions from one source file in another source file. You can do that without creating a library. All you need to do is put the declarations for the things in foo.c that you want accessible from elsewhere into foo.h. Declarations are things like function prototypes and "extern" variable declarations. For example, if foo.c contains
int some_global;
void some_function(int a, char b)
{
    /* Do some computation */
}
Then in order to make these accessible from other source files, foo.h needs to contain
extern int some_global;
void some_function(int, char);
Then, you #include "foo.h" wherever you want to use some_global or some_function. Since headers can include other headers, it is usual to wrap headers in "include guards" so that declarations are not duplicated. For example, foo.h should really read:
#ifndef FOO_H
#define FOO_H
extern int some_global;
void some_function(int, char);
#endif
This means that the header will only be processed once per compilation unit (source file).
As for how to compile them, never put .h files on the compiler command line, since they should not contain any compilable code (only declarations). In most cases it is perfectly fine to compile as
cc main.c 1.c 2.c 3.c ... [etc]
However, if you have 50 source files, it is probably a lot more convenient to use a build system. On Linux, this is typically a Makefile. On Windows, it depends on what development environment you are using. You can google for that, or ask another SO question once you specify your platform (as this question is pretty broad already).
One of the advantages of a build system is that they compile each source file independently, and then link them all together, so that when you change only one source file, only that file needs to be re-compiled (and the program re-linked) rather than having everything re-compiled including the stuff that didn't get changed. This makes a big time difference when your program gets large.
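For completeness, here is a sketch of a second source file that uses the header above (the file name main.c and the values are just examples):
/* main.c */
#include "foo.h"

int main(void)
{
    some_global = 42;                   /* declared extern in foo.h, defined in foo.c */
    some_function(some_global, 'x');    /* prototype comes from foo.h */
    return 0;
}
Compile and link both source files together with:
cc main.c foo.c -o prog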
You can combine several .c files into a library. Those libraries can be linked with other .c files to become the executable.
You can use a makefile to create a big project.
The makefile has a set of rules. Each rule describes the steps needed to create one piece of the program and their dependencies with other pieces or source files.
You need to create a shared library; the standard library is a shared library that is implicitly linked into your program.
Once you have your shared library, you can use the .h files and just compile the program with -lyourlib (which is implicit in the case of libc).
Create one using:
gcc -shared -fPIC test.c -o libtest.so
And then compile your program like:
gcc myprogram.c -L. -ltest -o myprogram
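One caveat: -L. only tells the linker where to look at build time. At run time the dynamic loader still has to find libtest.so, so you either install it in a standard location or point the loader at it (a sketch, assuming the library stays in the current directory):
LD_LIBRARY_PATH=. ./myprogram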
For your second question I advise you to use Makefiles
http://www.gnu.org/software/make/
The standard library is already compiled and placed on your machine, ready to be dynamically linked. This means that the library is dynamically loaded when needed by a program. Compare this to a static library, which gets compiled INTO your program when you run the compiler/linker.
This is why you need to compile your code and not the standard library code. You could build a dynamic (shared) library yourself.
For reference, #include <stdio.h> does not IMPORT the standard library. It just allows the compiler and linker to see the public interface of the library (to know what functions are available, what parameters they take, what types are defined, what sizes they are, etc.).
Dynamic Loading
Shared Library
You could split your files up into modules, and create shared libraries. But generally as projects get bigger you tend to need a better mechanism to build your program (and libraries). Rather than directly calling the compiler when you need to do a rebuild you should use a make program or a complete build system like the GNU Build System.
If you really want it to be as simple as just including a .h file, all of your "library" code needs to be in the .h file. However, in this scenario, someone can only include your .h file into one and only one .c file. That may be ok, depending on how someone will use your "library".
