What is the difference between a .o file and a .lib file?

What is the difference between a .o file and a .lib file?

Conceptually, a compilation unit (the unit of code in a source file/object file) is either linked entirely or not at all. While some implementations, with significant cooperation between the compiler and linker, are able to remove unused code from object files at link time, that doesn't change the fact that including two compilation units with conflicting symbol names in a program is an error.
As a practical example, suppose your library has two functions foo and bar and they're in an object file together. If I want to use bar, but my program already has an external symbol named foo, I'm stuck with an error. Regardless of whether or how the implementation might resolve this conflict for me, the code is still incorrect.
On the other hand, if I have a library file containing two separate object files, one with foo and the other with bar, only the one containing bar will get pulled into my program.
When writing libraries, you should avoid putting multiple functions in the same object file unless it's essential that they be used together. Doing so bloats applications which link your library (statically) and increases the likelihood of symbol conflicts. Personally I prefer erring on the side of separate files when in doubt - it's even useful to put foo_create and foo_free in separate files if the latter is nontrivial, so that short one-off programs that don't need to call foo_free can avoid pulling in the code for deep freeing (and possibly even avoid pulling in the implementation of free itself).
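A minimal sketch of that layout, with hypothetical file and function names (the build commands in the comments assume a GNU-style toolchain):

/* foo.c - only foo() lives in this translation unit */
int foo(void)
{
    return 1;
}

/* bar.c - only bar() lives in this translation unit */
int bar(void)
{
    return 2;
}

/* Hypothetical build:
 *   cc -c foo.c bar.c
 *   ar rcs libfoobar.a foo.o bar.o
 * A program that calls only bar() and links against libfoobar.a pulls in
 * bar.o but not foo.o, so a conflicting external symbol named foo elsewhere
 * in that program no longer collides with the library. */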

A .LIB file is a collection of .OBJ files concatenated together with an index. There should be no difference in how the linker treats either.
Quoted from here:
What is the difference between .LIB and .OBJ files? (Visual Studio C++)

They are actually quite different, especially with older linkers.
The .o (or .obj) files are object files; they contain the output of the compiler's generated code. It is still in an intermediate format; for example, most references are still unresolved. Usually there is a one-to-one mapping between the source file and the object file.
The .a (or .lib) files are archives, also known as libraries, and are a set of object files.
All operating systems have tools that allow you to add/remove/list object files in library files.
Another difference, especially with older linkers, is how the files are dealt with when linking them. Some linkers will place the complete object file into the final binary, regardless of what is actually being used, while they will extract only the needed pieces out of library files.
Nowadays most linkers are smart enough to remove all stuff that is not being used.
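To illustrate that difference, here is a hedged sketch with hypothetical file names; the commands in the comments assume a traditional Unix toolchain without link-time dead-code elimination:

/* unused.c - a function nobody calls */
int unused_helper(void)
{
    return 42;
}

/* main.c */
int main(void)
{
    return 0;
}

/* Linking the object file directly:
 *   cc -c unused.c main.c
 *   cc -o direct main.o unused.o
 * unused_helper typically ends up in "direct" even though nothing calls it
 * (check with: nm direct | grep unused_helper).
 *
 * Linking through an archive:
 *   ar rcs libunused.a unused.o
 *   cc -o viaarchive main.o libunused.a
 * Nothing in main.o references unused.o, so that member is never pulled in
 * and unused_helper does not appear in "viaarchive". */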

Related

Apart from linking the object files of the current code and the other included header files, what other tasks does the linker do?

See the dummy code below:

#include <stdio.h>
#include "a.h"  /* other file which has the declaration of function fun() */

int main(void)
{
    int a = 100;  /* unused in this dummy example */
    printf("print fun = %d", fun());
    return 0;
}
This answer applies to linked programs (e.g., C, C++), as opposed to interpreted programs (e.g., shell scripts, most byte-code languages, interpreters). The shades of grey for byte-code languages are ignored here.
Linkers start by collecting the object modules (called .o files herein) and libraries the command line points them to, building a list of all provided and referenced global names therein. By this time the actual source code has been left behind. Ignoring debugging information, described later, effectively only the names of global variables and functions remain in .o files along with associated values.
The object code tracks what parts of .o files are variables and what parts are executable code along with the names and locations of external entries. Some languages (C++) track argument signatures though others (C) do not.
Keeping it simple, after including .o files on the command line in the build, the linker's major job is to look through those .o files to extract references to all externals and then hunt through the libraries to satisfy those externals. The whole lot is bundled up into your executable or, in the case of dynamic libraries loaded during execution, appropriate links to loadable modules are put in the executable.
The linking process assigns everything the linker puts into the executable a memory address to reside at. The linker puts all this into the executable along with additional metadata, such as file headings and optional debugging information, in a way loaders can nicely extract when the program runs. This is usually divided into program segments for executable code, data segments where the program's variables that have initial values associated with them live, and places for variables that have no specific initial values and get the default value of binary zeros. These segments provide the run-time layout of the executable.
A few basic things, like the names and locations of functions, are often kept in all builds, but this can be so minimal as to be worthless in debugging most failures, which leads developers to use -g during development with most POSIX/Linux type compilers.
Debugging information for .o modules compiled under -g (or whatever) will be discarded if the linker is not told to keep it in the build. This information includes all of the external variable names, functions, their addresses, and more; all that stuff you can see with debuggers is included here, and it adds considerably to the size of executables. This often includes associating the locations of functions, and the code therein, with the original source files.
Linkers also identify where execution is to start. This is a small but critical thing, and it is not the "main()"-type function that beginners tend to believe is the start of their code's run, but some place deep within the language library that is automagically included into the build in perhaps obscure ways. This startup code assures that all the stuff needed for successful execution happens, like getting ready to handle malloc(), using inherited open files, setting up environment variables, and setting up the stack and heap. Once all this is done, main() is called to kick off the user's code. While the linker has nothing to do with this initialization code, nor with the final completion code that runs when the program exits or main returns, the linker does assure this startup code is present, which also includes exiting properly when the user code is done.
Once the build is done the linker is done, with the big exception of possibly handling dynamic modules that get loaded when called. How those are handled is heavily dependent on the OS and the nature of the link. Sometimes a linker can be involved and sometimes it is "just" loading a module file.
In my view the "loader", which reads executables into the proper memory, zeros out some chunks of memory, and does other start-up handling, is a key partner of the linker. While a key partner, it is definitely a separate program.
Basically, the linker links object files (the result of compiling your various source code files (not headers), the standard library, the math library, the ncurses library, the big integer library, ...) and adds management data to make a recognizable executable (ELF format, EXE format, PE format, ...).
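As a minimal sketch of that flow (hypothetical file and function names; the commands in the comments assume a gcc-style driver):

/* util.c - defines the external symbol */
int add_one(int x)
{
    return x + 1;
}

/* main.c - references add_one(); the reference stays unresolved in main.o
 * until the linker finds the definition in util.o */
int add_one(int x);  /* declaration only */

int main(void)
{
    return add_one(41) == 42 ? 0 : 1;
}

/* Hypothetical build:
 *   cc -c util.c main.c       # produces util.o and main.o
 *   cc -o demo main.o util.o  # the link step resolves add_one
 * Running "nm main.o" would show add_one as undefined (U) before linking. */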

What is the point of linking 2 source files instead of using a header file?

Why should I use another source code file to share code or a function between many programs and use the linker, instead of using a header file only? (I read this in Head First C but I didn't understand what the point of it is.)
Generally, header files should only be used to declare your functions/structs/classes.
The actual implementation should be created in a separate .c file which then can be built and exported as a binary along with the header.
Keeping the implementation in the header has many drawbacks.
Bigger footprint - the header will be bigger since you have more symbols in it.
You cannot hide the implementation from the end-user.
Compile times will be a lot longer since all the code has to be processed every time the header is included.
Just to name a few; there might be many more reasons.
However, there are some cases when it is okay/better to include some logic in the header files.
For example, inline functions, which may improve the runtime of the application while maintaining good code quality, and/or templates in C++.
Generally, header files contain declarations, like function signatures, while the function definitions (the actual source code) are located in separate source files.
Multiple files can include the same header file and share function declarations at compile time. But they also must share the source files (the files must be linked together) in order to have access to the function code at run time.
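A minimal sketch of that split, with hypothetical file and function names:

/* util.h - declarations only, shared by every file that needs them */
#ifndef UTIL_H
#define UTIL_H
int square(int x);
#endif

/* util.c - the definition lives in exactly one source file */
#include "util.h"  /* lets the compiler check that declaration and definition agree */

int square(int x)
{
    return x * x;
}

/* main.c - sees only the declaration at compile time; the linker supplies
 * the code from util.o when the program is linked:
 *   cc -c util.c main.c && cc -o app main.o util.o */
#include <stdio.h>
#include "util.h"

int main(void)
{
    printf("%d\n", square(7));
    return 0;
}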

Included files, all or nothing?

If I #include a file in C, do I get the entire contents of the file linked in, or just the parts I use?
If it has 10 functions in it, and I only use one of the functions, does the code for the other nine functions get included in my executable? This is especially relevant for me right now as I am working on a microcontroller and memory is precious.
Firstly, header files do not get "linked in". #include is basically a textual copy-paste feature. Everything from your include file gets pasted by preprocessor into the final translation unit, which will later be seamlessly processed by the compiler proper. The compiler proper knows nothing about any header files or #include directives.
Secondly, it means that if in your code you declared or defined some function or variable that you do not use, it is completely irrelevant whether it came from a header file through #include or was written directly in source file. There's absolutely no difference.
Thirdly, the question is: what exactly do you have in your header file that you include? Typically, header files do not define objects and functions; they simply declare them. Declarations do not produce any code, regardless of whether you use the function or not. Declarations simply tell the compiler that the code (generated from the function definition) already exists elsewhere. Thus, as long as we are talking about typical header files, #include directives and header files by themselves have no effect on final code size.
Fourthly, if your header file is of some unusual kind that contains function (or object) definitions, then see "firstly" and "secondly" above. The compiler proper can see only one translation unit at a time, for which reason a typical strategy for the compiler proper is to completely discard unused entities with internal linkage (i.e. static objects and functions) and keep all entities with external linkage. Entities with external linkage cannot be discarded by compiler proper, since they might be needed in some other translation unit.
Fifthly, at the linking stage the linker can see the program in its entirety and, for that reason, can discard unused objects and functions, if it is advanced enough for that (and if you allow the linker to do it). Meanwhile, the inclusion-exclusion precision of a typical run-of-the-mill linker is limited to a single object file. Each object file is atomic to such a linker. This means that if you want to be able to exclude unused functions on a per-function basis, you might have to adopt a "one function per object file" strategy, i.e. write one and only one function per .c file. Of course, this is only possible when you write your own code. If some third-party library you want to use does not adhere to this convention, then you might not be able to exclude individual functions.
If you #include a file in C, the entire contents of that file are added to your source file and compiled by your compiler. A header file, though, usually only has declarations of functions and no definitions (so no actual code is compiled).
The linker, on the other hand, takes all the functions from all the libraries and compiled source code and merges them into the final output file. At this time, the linker will discard any functions that you aren't using.
So, to answer your question: only the functions you use (and indirectly depend on) will be included in your final program file, and this is independent of what files you #include. Happy hacking!
You have to distinguish between different scenarios:
What does the included header file contain? Declarations of external functions only, or also static function definitions?
How are the implementations of the external functions declared in the header file you include distributed? Are they all implemented in one .c file, or distributed across several .c files?
Regarding point 1: By #includeing external declarations only, no other code will become part of your object file. And definitions of static functions that are part of the header file, but which are not referenced by your code, may not become part of your object file - this is an optimization that is fairly common. It depends on your compiler, however.
Regarding point 2: Some linkers can only link whole object files, all or nothing. That means, if all the external functions declared in a header file are implemented in one .c file, and, if your code references at least one of these functions, chances are that you will get the whole object file, including all the other functions you don't use. Some linkers, however, can avoid this and remove unused parts when linking object files.
One brute-force approach to deal with non-optimizing linkers is to put every external function into a .c file of its own. You will, however, have to find a way to deal with the situation that some of these functions refer to a common static function that is part of the original .c file...
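One common workaround is sketched below (hypothetical names): move the shared helper into its own translation unit, give it external linkage under a name reserved for internal library use, and declare it in a private header that only the library's own .c files include.

/* internal.h - not shipped to library users */
int mylib_internal_scale(int x);

/* internal_scale.c */
#include "internal.h"

int mylib_internal_scale(int x)
{
    return x * 16;
}

/* foo.c and bar.c each #include "internal.h" and call mylib_internal_scale(),
 * so the linker pulls in internal_scale.o whenever either of them is used. */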
Include simply presents the compiler ultimately with what looks like a single file (and if you use -save-temps on GCC you will see that exactly a single file is presented to the actual compiler). It is no more complicated than that. So if you have some function prototypes or defines in your .c file, then having them come from an include makes no difference whatsoever; the end result is the same.
If the things you include contain code - functions, and not just prototypes - then it is the same as if you had those in the .c file itself. Whether or not those show up in the final binary has to do with whether you declared them as global or made them static, and then whether or not you optimized, etc. The same goes for variables and structures and other things.
Not all linkers are the same, but a common way to do it is that whatever the compiler left in the object goes into the final binary. But if you take those objects and make a library out of them, then some/many(?) linkers don't suck everything into the binary, only the portions that are required to resolve the dependencies.
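A small sketch of that static/global distinction (hypothetical names; the exact behaviour depends on the compiler and optimization level):

/* helpers.c */
static int internal_helper(int x)  /* internal linkage and unused here, so an
                                      optimizing compiler may discard it */
{
    return x * 2;
}

int exported_helper(int x)  /* external linkage: kept, since another .c file
                               might call it */
{
    return x + 1;
}

/* With something like "cc -O2 -c helpers.c", "nm helpers.o" would typically
 * show exported_helper but no internal_helper. */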

Why use object files in C?

When I compile a C program, for ease I've been including the source file for a certain header at the end. So, if main.c includes util.h, util.h will have all the headers util.c will use, outline types or structs, etc., and then at the very end it includes util.c. Then, when I compile I only have to use gcc main.c -o main, and the rest is all taken care of.
I've been looking up C coding standards, trying to figure out what the best way to do things is, and there are just so many, and so many conflicting opinions, that I don't know what to think. Why do so many places recommend compiling object files individually instead of including all of them in a web? util never touches anything but util.c, so the two are perfectly independent, and in theory (my theory) it would be fine, but I'm probably wrong since this is computer science and people are wrong even when they're right, so if I'm already wrong I'm probably wrong.
Some people say header files should ONLY be prototypes, and the source file should be the one that includes it and its necessary system headers. From a purely aesthetic point of view I much prefer having all the info (types, system headers used, prototypes) in the header (in this case util.h) and having ONLY function code in util.c (excluding one "#include "util.h"" at the very top).
I guess the point I'm getting at is, with all this stuff that works, selecting a method sounds arbitrary to someone who doesn't understand the background (me). Please tell me why and what.
While your program is small, this will work. At some point, however, your program will get large enough that recompiling the whole program every time you change one line is a pain in the rear.
This -- even more than avoiding editing huge files -- is the reason to split up your program. If main.c and util.c are separately compiled into object files, changing one line in a function in main.c will no longer require you to recompile all the code in util.c.
By the time your program is made up of a few dozen files, this will be a big win.
I think the point is that you want to include only what is needed for that file to be independent. This reduces overall compilation times by allowing the compiler to read only the headers that are necessary, rather than repeatedly reading every header when it might not need to. For example, if your util.c code utilises functions and/or types from <stdio.h> but your util.h doesn't, then you would want to include <stdio.h> only in util.c, so that when the compiler compiles util.c it includes <stdio.h> only then; but if you include <stdio.h> in your util.h instead, then every source file that includes util.h is also including <stdio.h>, whether it needs it or not.
This is very negligible for small projects with only a handful of files, but proper header inclusion can affect compilation times for larger projects.
With regards to the question about "object files": when you compile a source file into an object file, you create a shortcut that allows a build system to only recompile the source files that have outdated object files. This is an effective way to significantly reduce compilation times especially for large projects.
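A small sketch of that inclusion discipline, with hypothetical file names:

/* report.h - no #include <stdio.h> here, because nothing in the interface
 * needs it; files that include report.h don't pay for reading stdio.h. */
#ifndef REPORT_H
#define REPORT_H
void report_value(int value);
#endif

/* report.c - the implementation is what actually uses stdio, so the include
 * lives here. */
#include <stdio.h>
#include "report.h"

void report_value(int value)
{
    printf("value = %d\n", value);
}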
First, including a .c file from a .h file is completely bass-ackwards.
The "standard" way of doing it follows a line of thought roughly like this:
You have a library, containing dozens of functions. Keeping everything in one big source file means that anyone using your library would have to link the whole library, even if he uses only a single function of it. (Imagine linking the whole C standard library for a puts( "Hello" ).)
So you split things across multiple source files, which are compiled individually. Whenever you make changes to one of your functions, you have to re-translate only one small source file and update the library archive (or executable) - instead of re-translating the whole thing every time. (This is still an issue, because code sizes have somewhat kept up with CPU improvements. Compiling something like the Boost lib can still take several minutes on not-too-fancy hardware...)
Now you are in a pinch, however. The function is defined inside the .c file, and the corresponding .o file can conveniently be linked (via a .a archive if need be). However, to actually address the function (provided by the .o file) properly from another source file (a.k.a. "translation unit"), your compiler needs to know the function name, its parameter list, and its return type. This is why the declaration of the function (i.e., the function head without its body) is put in a separate header (.h) file.
Other source files can now #include the header file, address the function properly (without the compiler being aware of what the function actually does), and when all parts of your library / program are compiled into .o files, then everything is linked together.
The source file includes its own header basically to make sure the two files agree on the function declaration. ;-)
That's about it, as far as I can be bothered to write it up right now. Putting everything into one monolithic source file is barely acceptable (actually, no, it isn't, not for anything beyond about 200 lines), but including the .c file at the end of the .h file either means you learned your C coding by looking at god-awful code instead of a good book, or whoever tutored you should never tutor another person on C coding in his life. No offense intended. ;-)
PS: Header files also provide a good summary / oversight of a piece of code. Languages that don't provide headers - Java, for example - need IDE's or documentation tools to extract this kind of information. Personally, I found header files to be a benefit, not a liability.
Please use *.h and *.c files as customary: *.h files are #included in *.c files; *.h files contain only macro definitions, data type declarations, function declarations, and extern data declarations. All definitions go in *.c files. That is how everybody else organizes C programs; do your fellow humans (who some day might need to understand your program) a favor. If something in file.c is used outside, you'd write file.h containing the declarations of whatever in that file is to be used outside, and include that in file.c (to check that declarations and definitions agree) and in all using *.c files. If a bunch of *.h are always included together, it might mean that the split-up into *.c files isn't right (or at least that of the *.h; perhaps you should make one .h including all those declarations, and create *.h files for internal use where needed among the group of related *.c files).
[If a program written as you outline crosses my path, I can assure you I'll avoid it like the plague. The extra obfuscation might be welcome in the IOCCC, but not by me. It is a sure sign of somebody who doesn't know how to organize a program cleanly, and so the program probably isn't worth trying out.]
Re: Separate compilation: You break up a C program so the pieces are easier to understand, you can hide details of how things work in the C files (think static), this provides support for Parnas' modularity. It also means that if you change a file, you don't have to recompile everything.
Re: Differing C programming standards: Yes, there are lots of them around. Pick one you feel comfortable with, and stick to it. If you work on a project, adhere to its standards.
The "include in a single translation unit" approach becomes very inefficient for any significantly sized project, it is impractical for projects that are distributed amongst multiple developers.
Morover when creating static libraries, if everything in the library were from a single translation unit, any code linked to it would get all the library code regardless of whether it is referenced or not.
A project using a build manager such as make or the features available in most IDEs uses header file dependencies to allow an incremental build; only compiling those sources that are modified or dependent on modified files. The dependencies are determined by the file inclusions, so minimising redundant dependencies speeds build time.
A typical commercial project can comprise hundreds of thousands of lines of code and a few hundred source files; full rebuild times can vary from minutes to hours. If in your development cycle you have to wait that long between code changes and test, productivity would be very low!

Determining the space used by library functions in C

I have a bunch of code I need to analyse that I don't know how to do. I have a pile of code that here and there uses math functions from a header file I have included called math.h that came with my IDE. I am being asked to see how much space is used by including this. Specifically, is the compiler including all of the library functions or just the ones we use? There is no object file being created, so I think the library code is being compiled into the individual files. Any ideas of a slick way to figure this out? I can't just comment out the includes because then the code won't compile and I won't know a size, and if I comment out all the lines that use math functions it is not really representative.
You can use the objdump command to see the individual symbols inside your object files and the space they require.
Note that unless you're doing static compilation, library methods aren't generally copied into your produced binary, but only referenced (and brought in via the dynamic linker when your program is loaded).
As math.h is part of the standard C library, a copy of that library is basically guaranteed to always be in memory, so the additional memory and space requirements on dynamic linking are minimal. (During static linking, all symbols which aren't directly required by your program are discarded, and math functions don't tend to be very big, so usage should be fairly minimal there too).
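For example (a sketch assuming a GNU toolchain; file and function names are hypothetical), compiling a file that uses a math function and inspecting the resulting object file shows what actually takes up space:

/* usesqrt.c */
#include <math.h>

double hypotenuse(double a, double b)
{
    return sqrt(a * a + b * b);
}

/* Hypothetical inspection commands:
 *   cc -c usesqrt.c
 *   objdump -t usesqrt.o      # symbol table: hypotenuse defined, sqrt undefined
 *   nm --size-sort usesqrt.o  # sizes of the symbols this file contributes
 *   size usesqrt.o            # overall text/data/bss sizes of the object file
 * The undefined sqrt entry shows the math library's code is not copied into
 * this object file; it is resolved later by the (static or dynamic) linker. */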
The code in the header file is compiled into the object file of the .c you are using if your header has the definitions of the functions, and it is just referenced if they are simply declarations. The linker will then find a definition for each symbol and place it in your executable; if you are using dynamic linking, the OS will pull in the definition at run time.
