When we compile code, an object file is generated. From that object file, an executable file is generated in the linking process.
Why do we need an object file? What is it used for? Couldn't an executable file be generated directly? After all, we use the executable file to run the program.
Object files are what the linker uses to build complete executables (or libraries).
You can usually have your compiler output an executable "directly"; the syntax depends on the compiler. For instance, with GCC:
gcc foo.c bar.c ...
will produce an executable, and no intermediate object file will remain (though one will probably have been generated and subsequently deleted).
Object files are what make incremental builds possible. You compile each source file (or group of source files) into an object file, then link all of them together into an executable. This lets you recompile only the source files that have changed since the last build, potentially saving a lot of time.
Or you could use the same object files to link different outputs (reuse parts of your build to generate both an executable and a shared library, for instance), again saving time and resources compared to compiling everything from scratch every time.
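As a rough sketch with GCC (foo.c, bar.c, app, and libfoo.so are placeholder names), the split into compile and link steps might look like this; objects destined for a shared library generally also need -fPIC:

gcc -c -fPIC foo.c                      # compile only: produces foo.o
gcc -c -fPIC bar.c                      # compile only: produces bar.o
gcc foo.o bar.o -o app                  # link the objects into an executable
gcc -shared foo.o bar.o -o libfoo.so    # ...or reuse the same objects for a shared library

If only bar.c changes later, only its compile step has to be rerun before relinking.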
Object files aren't "needed" from a theoretical point of view. They're just very practical (and technically necessary with some, perhaps most, toolchains, since they are what the assembler knows how to produce and what the linker knows how to link).
Related
I'm building a shared library. I need only one function in it to be public.
The shared library is built from a few object files and several static libraries. The linker complains that everything should be built with -fPIC. All the object files and most of the static libraries were built without this option.
This makes me ask a number of questions:
Do I have to rebuild every object file and every static library I need for this dynamic lib with -fPIC? Is it the only way?
The linker must be able to relocate object files statically, during linking. Correct? Otherwise, if object files used hard-coded absolute addresses, they could overlap with each other. Shouldn't this mean that the linker has all the information needed to create the global offset table for each object file and everything else required to build a shared library?
Should I always use -fPIC for everything in the future as a default option, just in case something may be needed by a dynamic library some day?
I'm working on Linux on x86_64 currently, but I'm interested in answers about any platform.
On Linux it is a requirement that the object files that go into a shared library be compiled as position-independent code (PIC). In practice this includes the code pulled in from static libraries as well.
Yes. See Load-time relocation of shared libraries and Position Independent Code (PIC) in shared libraries.
I only use -fPIC when compiling object files that go into libraries, to avoid unnecessary overhead elsewhere.
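For example (a.c, b.c, and libmylib.so are made-up names), a shared library build on Linux might look like:

gcc -c -fPIC a.c b.c                # position-independent object files
gcc -shared a.o b.o -o libmylib.so  # link them into the shared library

On x86_64, objects built without -fPIC generally cannot be linked into a shared library, which is why the linker asks you to rebuild them with that flag.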
I am a little confused about how shared libraries and the OS work together.
1st question: how does the OS manage shared libraries? How are they identified uniquely? By file name, by some other identifier (say, an ID), or by full path?
2nd question: I know that when we compile and link code, the linker needs access to the shared library (.so) to perform the linking. After this stage, when we execute the compiled program, the OS loads the shared library, and these libraries may be in different locations (am I wrong?). But I do not understand how the OS knows where to look for the shared library. Is the library information (name? path? or what?) encoded in the executable?
When compiling a program, libraries (other than the language runtime) must be explicitly specified in the build; otherwise they will not be linked in. There are some standard library directories, so for example you can specify -lfoo and the linker will automatically look for libfoo.a or libfoo.so in the usual directories such as /usr/lib, /usr/local/lib, etc.
Note, however, that a name like libfoo.so is usually a symlink to the actual library file name, which might be something like libfoo.so.1. This way, if there needs to be a backward-incompatible change to the ABI (the layout of some structure might change, say), then the new version of the library becomes libfoo.so.2, and binaries linked against the old version remain unaffected.
So the linker follows the symlink, and inserts a reference to the versioned name libfoo.so.1 into the executable, instead of the unversioned name libfoo.so. It can also insert the full path, but this is usually not done. Instead, when the executable is run, there is a system search path, as configured in your systemwide /etc/ld.so.conf, that is used to find the library.
(Actually, ld.so.conf is just the human-readable source for your library search paths; this is compiled into binary form in /etc/ld.so.cache for speed, using the ldconfig command. This is why you need to run ldconfig every time you make changes to the shared libraries on your system.)
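For illustration (libfoo and /usr/local/lib are assumed names, not from the question), a typical build and a quick check of what the loader resolves might be:

gcc main.c -L/usr/local/lib -lfoo -o program   # link against libfoo.so (or libfoo.a)
ldd ./program                                  # list the shared libraries the loader finds, and where
sudo ldconfig                                  # rebuild /etc/ld.so.cache after installing a new library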
That’s a very simplified explanation of what is going on. There is a whole lot more that is not covered here. Here and here are some reference docs that might be useful on the build process. And here is a description of the system executable loader.
Part of my Go program relies, via import "C", on a very large C codebase that takes a few minutes to compile. Is there any way to precompile that C library, or to split the C-wrapping part of my Go program into a package that is compiled ahead of time, so that I don't have to wait for the entire C library to recompile each time I compile the main program?
Instead of importing the entire C source code, you can link against precompiled object files and header files. Refer to https://golang.org/cmd/cgo/, which covers how to use the LDFLAGS argument for cgo.
There are other documents online that cover how to compile C code into object files and static libraries (.o and .a files), such as this one. You should also refer to the documentation of the library you're using, or its Makefile, as it will likely already have rules to compile it into object files that can be linked.
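As a sketch (cfile1.c, cfile2.c, and libmylib.a are hypothetical names), the C side could be compiled once into a static archive:

gcc -c cfile1.c cfile2.c              # compile the C sources to .o files once
ar rcs libmylib.a cfile1.o cfile2.o   # bundle the objects into a static library

The resulting libmylib.a can then be pointed to from the cgo LDFLAGS directive (via -L and -l flags), so Go builds only link against it instead of recompiling the C sources.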
If the package that contains the import "C" code isn't being modified, you can also go get it (or perhaps go install it), which will store its compiled object files under $GOPATH/pkg, making compilation of other Go programs that import it faster.
I've been programming in C for a while and I wondered: why is it important to separate these processes (compiling and linking)?
Can someone explain please?
It is useful to decrease rebuild time. If you change just one source file, there is often no need to recompile the whole project, only one or a few files.
Compilation is responsible for transforming the source code of each individual source file into corresponding object code. That's it. The compiler doesn't have to resolve your external symbols (such as library functions and extern variables).
Linking is responsible for resolving those references and producing a single binary, as if your project had been written as one source file. (I also recommend the Wikipedia page on linkers to learn the difference between static and dynamic linking.)
If you happen to use the tool Make, you will see that it doesn't recompile every file whenever you invoke make; it finds which files have been modified since the last build and recompiles only those. Then the linking step is invoked. That is a big time saver when you deal with big projects (e.g. the Linux kernel).
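For instance (f1.c through f3.c are placeholders), if only f2.c has changed since the last build, make effectively does the equivalent of:

gcc -c f2.c                     # recompile just the changed file
gcc f1.o f2.o f3.o -o program   # relink with the existing object files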
It's probably less important these days than it was once.
But there was a time when compiling a project could take literally days - we used to do a "complete build" over a weekend back in the 1980s. Just parsing the source code of a single file was a fairly big deal requiring significant amounts of time and memory, so languages were designed so that their modules (source files) could be processed in isolation.
The result was "object files" - .obj (DOS/Windows/VMS) and .o (unix) files - which contain the relocatable code, the static data, and the lists of exports (objects we've defined) and the imports (objects we need). The linking stage glues all this together into an executable, or into an archive (.lib, .a, .so, .dll files etc) for further inclusion.
Making the expensive compilation task operate in isolation led the way to sophisticated incremental build tools like make, which produced a significant increase in programmer productivity - still critical for large C projects, like the Linux kernel.
It also, usefully, means that any language that can be compiled into an object file can be linked together. So, with a little effort, one can link C to Fortran to COBOL to C++ and so on.
Many languages developed since those days have pushed the boundaries of what can be stored in object files. The C++ template system requires special handling, and overloaded functions don't quite fit either, since plain .o files don't support multiple functions with the same name (see C++ name mangling). Java et al. use a completely different approach, with custom code file formats and a "native code" invocation mechanism that glues onto DLLs and shared object files.
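As a tiny illustration of name mangling (the function name add is made up), c++filt can turn a mangled symbol back into a readable signature:

c++filt _Z3addii    # prints: add(int, int)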
In practice, it is not important at all. Especially for simpler programs, both steps are performed with one program call, such as
gcc f1.c f2.c f3.c f4.c -o program
which creates the executable program from these source files.
But the fact remains that these are separate processes, and in some cases it is worth noticing this.
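Using the same placeholder file names as above, the equivalent explicit two-step build would be:

gcc -c f1.c f2.c f3.c f4.c           # compile each source file to an object file
gcc f1.o f2.o f3.o f4.o -o program   # link the object files into the executable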
I've worked on systems that take two days to compile. You don't want to make a small change and then have to wait two days to test it.
We know that when linking a static library, the linker copies the relevant code from the .a file into the executable binary. And when linking a dynamic library, the linker just copies the addresses of the functions it found in the .lib file (under Windows) into the binary, and the functions themselves are not copied. At runtime, the OS loads the DLL, and the program runs the code according to those addresses. The question is: could I use the .lib file and the DLL to link the dynamic library statically? The linker would read the addresses from the .lib and then copy the relevant code from the DLL into the binary. It should work, right?
I have no clue whether your idea could work, but do note that -- on Windows, with Visual Studio at least -- a static library is something very different from a DLL.
With VS, a static library is basically just an object file container, that is, you have a .lib file, but that .lib file is just a container for all the .obj files that the compiler produced for the project.
The important thing here is that the .obj code in the static library hasn't gone through the linking stage yet; no linker has been involved.
Whereas the DLL is (finally) produced by the linker (from object files).
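For illustration (mylib.lib and mylib.dll are placeholder names), the difference is easy to see with the Visual Studio command-line tools:

lib /LIST mylib.lib        # lists the .obj members stored inside the static library
dumpbin /EXPORTS mylib.dll # lists the functions the already-linked DLL exports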
So the question here is really one of toolchain support: since the DLL is already linker output, I doubt you could get the linker to re-link its PE code directly into the executable.
If you want to link against the .dll at build time instead of loading it at run time, then yes, it can be done using the .lib (import library) file that corresponds to the .dll. The exact method depends on what you are using to build your application.
In Visual Studio you start by adding the .lib file in Linker->Input on the project properties.
Although this is often loosely described as statically linking against the import library, it does not copy the .dll code into your executable; you still need the .dll to run the application.
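As a sketch (main.c and mylib.lib are placeholder names), the command-line equivalent with the MSVC toolchain is roughly:

cl main.c mylib.lib    # the import library is passed to the linker; mylib.dll is still required at run time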
Additionally, if the .dll is something you developed and/or you have its source code, it can be modified/rebuilt as a static library and linked into your executable (so you will not need a separate .dll file).