Are static libraries version independent? - c

Say you're using C11, and you need to use a library (either static or dynamic) written in C17. Can you compile the library to object files, and then just link those with the ones of your program? I mean, object files are just executable files (which are in machine-code, or binary?), except you need to link them before they're executable. Anything crazy in what I just wrote?
By the way, is an object file without any dependencies executable?

For one routine to be able to call another, they need to pass and receive arguments in a compatible way. Computing platforms typically have an application binary interface (ABI) that says how arguments are passed. Routines written in C, C++, FORTRAN, PL/I, or other languages can call each other as long as they use the same ABI. Routines compiled with different C standards can call each other as long as they use the same ABI.
There are compatibility issues other than passing arguments. One version of a library might have some feature that requires specifying a length. The next version might require specifying the length and a width, or it might have the length required and the width optional. A program written for one version of the library might not be able to use another version because it is not passing the arguments the library requires, even though the way it is passing the arguments conforms to the ABI.
If you have the source code for a library and for your own program and you compile them both using the same compiler, just with a C 2017/2018 switch for one and C 2011 for the other, they can work together if you call the library routines correctly.
Object files are generally not executable because they are in a different format than executable files. There are different object formats, and, as far as what is theoretically possible if not actually practiced, somebody could design an object file format that is executable if it contains no dependencies or could design a program loader that reads an object file format and loads it for execution, again if it has no dependencies. In that regard, though, you could execute a source file by making a program loader that compiled, linked, and loaded it.

Related

Are C libraries distributed with textual header files along with the binary files?

I'm learning about static and dynamic libraries in C and how to make them.
One thing that keeps bothering me is this:
Suppose a file is using the library mylibrary by doing #include <mylibrary.h>.
Does this mean that C libraries are distributed along with matching textual header files? Or is mylibrary.h somehow magically exported from the binary library file?
Does this vary between different approaches, or whether the library is static or dynamic?
Yes, and depending on the platform, you get even more files to distribute with it. It's a quite messy story. At least, it doesn't matter whether the library is static or dynamic (aside from linker parameters).
The header file is necessary because the compiled binary does not contain enough information to be usable by the compiler. With some platform-based variance, a C binary typically only has enough metadata to identify functions and global variables by their name. That metadata does not include the types (or count) of parameters, return types, structure or union definitions, the type or size of global variables, etc. All of this information typically is encoded in the headers that are distributed with the library. (Conveniently, it also means that anything that does not exist in the header is hidden from the developer; this is what allows you to create non-public functions in a library, that users shouldn't call directly.)
On some platforms, binaries don't even contain function names. Instead, functions are referenced by their position in an "ordinal table". On those platforms, the library has to ship a header, the executable binary, and an additional file that translates from the name of the function in the header to the index of the function in the ordinal table, such that "void hello(void)" might be "function at index 3 in ordinal table" to the linker.
Conversely, including a header does not (usually) link against the library that it accompanies. This is possible on some platforms, like Windows, where there are special compiler directives that you can put in a header to tell the linker to link against some library, but it is not standard behavior and you can't expect it to work on any other platform.
Up and coming are modules, which provide a better user experience to link against binaries. A module is yet another file that you can package with your binary and that says "here are all my headers and here are all my libraries". Using modules, it's possible to write something like "import MyLibrary;" and it'll get you all the headers and all the linker arguments that you need. I believe that there are no C-standard modules yet; C++ is getting there with C++20.

Linking object files from different C compilers

Say I have two compilers, or even a single compiler with two different option sets. Each compiler compiles some C code into an object and I try to link the two .o files with a common linker. Will this succeed?
My initial thought is: not always. If the compilers are using the same object file format and have compatible options, then it would succeed. But, if the compilers have conflicting options, or (and this is an easy one) are using two different object file formats, it would not work correctly.
Does anyone have more insight on this? What standards would the object files need to comply with to gain confidence that this will work?
Most flavors of *nix OSes have well defined and open ABI and mostly use ELF object file format, so it is not a problem at all for *nix.
Windows is less strictly defined, and different compilers may vary in some calling conventions (for example, __fastcall may not be supported by some compilers or may behave differently, see https://en.wikipedia.org/wiki/X86_calling_conventions). But the main set of calling conventions (__stdcall, __cdecl, etc.) is standard enough to ensure that a function compiled by one compiler can be called successfully from code compiled by another. Otherwise the program would not work at all, since, unlike on Linux, every system call in Windows is wrapped by a function in a DLL that you need to be able to call.
The other problem is that there is no standard common format for object files. Although most tools (MS, Intel, GCC (MinGW), Clang) use COFF format, some may use OMF (Watcom) or ELF (TinyC).
Another problem is so-called "name mangling". Although it was introduced to support overloading C++ functions with the same name, it was adopted by C compilers to prevent linkage of functions defined with different calling conventions. For example, the function int __cdecl fun(void); will get the compiled name _fun, whilst int __stdcall fun(void); will get the name _fun@0. For more information on name mangling, see https://en.wikipedia.org/wiki/Name_mangling.
Finally, default behavior may differ between compilers, so yes, options may prevent successful linking of object files produced by different compilers or even by the same compiler. For example, TinyC uses the __cdecl convention by default, whilst Clang uses __stdcall. TinyC with default options may produce code that cannot be linked with code from other compilers because it doesn't prepend an underscore to names. To make it cross-linkable, it needs the -fleading-underscore option.
But keeping in mind all said above the code may successfully be intermixed. For example, I successfully linked together code produced by Visual Studio, Intel Parallel Studio, GCC (MinGW), Clang, TinyC, NASM.

What is the difference between include and link when linking to a library?

What does include and link REALLY do? What are the differences? And why do I need to specify both of them?
When I write #include <math.h> and then compile with -lm, what do #include <math.h> and -lm do respectively?
In my understanding, when linking a library, you need its .h file and its .o file. Does this suggest that #include <math.h> means take in the .h file, while -lm means take in the .o file?
The reason that you need both a header (the interface description) and the library (the implementation) is that C separates the two more clearly than languages like C# or Java do. One can compile a C function (e.g. by invoking gcc -c <sourcefile>) which calls library code even when the called library is not present; the header, which contains the interface description, suffices. (This is not possible with C# or Java; the assemblies or class files/jars, respectively, must be present.) During the link stage, though, the library must be there, even when it's dynamic, afaik.
With C#, Java, or script languages, by contrast, the implementation contains all information necessary to define the interface. The compiler (which is not as clearly separated from the linker) looks in the jar file or the C# assembly which contain called implementations and obtains information about function signatures and types from there.
Theoretically, that information could probably be present in a library written in C as well; it's basically the debug information. But the classic C compiler (as opposed to the linker) is oblivious to libraries or object files and cannot parse them. (One should remember that the "compiler" executable you usually use to compile a C program, e.g. gcc, is a "compiler driver" which interprets the command line arguments and calls the programs which actually do stuff, e.g. the preprocessor, actual compiler and actual linker, to create the desired output.)
So in theory, if you have a properly annotated library in a known location, you could probably write a compiler which compiles a C function against it without having function declarations and type definitions; the compiler would have to produce the proper declarations. The compiler would have to know which library to parse (which corresponds to setting a C# project "Reference" in VS or having a class path and name/class correspondence in Java).
It would probably be easiest to use a well-known debugging format like stabs or dwarf and extract the interface definitions from it with a little helper program which uses the API for the debug format, extracts the information and produces a C header which is prepended to every source file. That would be the job of the compiler driver, and the actual compiler would still be oblivious to that.
It's because header files contain only declarations, while .o files (or .obj, .dll or .lib) contain the definitions of methods.
If you open an .h file, you will not see the code of the methods, because that is in the libraries.
One reason is commercial: you need to publish your library but keep the source code inside your company. Libraries are compiled, so you can publish them.
Header files only tell the compiler what classes and methods it can find in the library.
The header files are kind of a table-of-contents plus a kind of dictionary for the compiler. It tells the compiler what the library offers and gives special values readable names.
The library file itself contains the contents.
What you are asking about are two entirely different things.
Don't worry, I will explain them to you.
You use #include to instruct the preprocessor to include the math.h header file, which contains the function prototypes of fabs(), ceil(), etc.
And you use -lm to instruct the linker to include the pre-compiled definitions of fabs(), ceil(), etc. in the executable file.
Now, you may ask why we have to explicitly link the library file of math functions, unlike for other functions. The answer is that it is due to some unspecified historical reasons.

Checking type of variables in dynamically loaded shared libraries in C/C++

I'm working on a test environment for a C library. The library makes extensive use of global variables, which I want to check in the test code. Unfortunately, I have to load the library dynamically (using libdl) to be able to reset the function-static variables. This way I have to load every global using dlsym() and cast them one by one manually to the correct type. Is there any way to automate that and get the type info of the variables somehow?
As far as I can see, libdl has no such feature. I wondered whether I might be able to link to gdb and use it to access the shared library's globals, but I didn't manage to find any clue about that possibility either.
No, there is no way to get the type of some dlsym-ed symbol, because an ELF shared object doesn't (always) carry any type information (except for C++, using name mangling).
And in principle, an ELF shared object might be produced without any C compiler, so the very notion of type of a given symbol might not exist, or the type be incompatible with C conventions.
However, you could restrict yourself to shared libraries with debug information. The DWARF format does carry type (and even source location) information about symbols. You might parse it with e.g. libdwarf or some other library.
You may consider alternative ways: for instance, you could have your own GCC plugin or MELT extension (MELT is a domain specific language to extend GCC) which would be used when compiling (with GCC) the shared libraries and would register the type information somewhere.

Some general C questions

I am trying to fully understand the process from writing code in some language to its execution by the OS. In my case, the language would be C and the OS would be Windows. So far, I have read many different articles, but I am not sure whether I understand the process correctly, and I would like to ask whether you know some good articles on the subjects I couldn't find.
So, what I think I know about C (and basically other languages):
The C compiler itself handles only data types, basic math operations, pointer operations, and work with functions. By work with functions I mean how to pass arguments to a function and how to get output from it. During compilation, a function call is replaced by pushing the arguments onto the stack, and then, if the function is not inline, its call is replaced by some symbol for the linker. The linker then finds the function definition and replaces the symbol with a jump address to that function (and of course then a jump back to the program).
If the above is generally true and I understand it correctly, where in the final .exe file does the linker actually save the functions? After the main() function? And what creates the .exe header? The compiler or the linker?
Now, the additional capabilities of C, today known as the C standard library, are a set of functions and their declarations that other programmers wrote to extend and simplify the use of the C language. But functions like printf() were (or could be?) written in a different language, or in assembler. And here comes my next question: can a function such as printf() be written in pure C without the use of assembler?
I know this is quite a big question, but I mostly just want to know whether I am right or not. And trust me, I have read a lot of articles on the web, and I would not ask you if I could find this information together in one place, in one article. Instead I must gather the information piece by piece, so I am not sure if I am right. Thanks.
I think that you're exposed to some information that is less relevant as a beginning C programmer and that might be confusing you - part of the goal of using a higher level language like this is to not have to initially think about how this process works. Over time, however, it is important to understand the process. I think you generally have the right understanding of it.
The C compiler merely takes C code and generates object files that contain machine language. Most of the object file is taken by the content of the functions. A simple function call in C, for example, would be represented in the compiled form as low level operators to push things into the stack, change the instruction pointer, etc.
The C library and any other libraries you would use are already available in this compiled form.
The linker is the thing that combines all the relevant object files, resolves all the dependencies (e.g., one object file calling a function in the standard library), and then creates the executable.
As for the language libraries are written in: Think of every function as a black box. As long as the black box has a standard interface (the C calling convention; that is, it takes arguments in a certain way, returns values in a certain way, etc.), how it is written internally doesn't matter. Most typically, the functions would be written in C or directly in assembly. By the time they make it into an object file (or as a compiled library), it doesn't really matter how they were initially created, what matters is that they are now in the compiled machine form.
The format of an executable depends on the operating system, but much of the body of the executable in windows is very similar to that of the object files. Imagine as if someone merged together all the object files and then added some glue. The glue does loading related stuff and then invokes the main(). When I was a kid, for example, people got a kick out of "changing the glue" to add another function before the main() that would display a splash screen with their name.
One thing to note, though is that regardless of the language you use, eventually you have to make use of operating system services. For example, to display stuff on the screen, to manage processes, etc. Most operating systems have an API that is also callable in a similar way, but its contents are not included in your EXE. For example, when you run your browser, it is an executable, but at some point there is a call to the Windows API to create a window or to load a font. If this was part of your EXE, your EXE would be huge. So even in your executable, there are "missing references". Usually, these are addressed at load time or run time, depending on the operating system.
I am a new user and this system does not allow me to post more than one link. To get around that restriction, I have posted some idea at my blog http://zhinkaas.blogspot.com/2010/04/how-does-c-program-work.html. It took me some time to get all links, but in totality, those should get you started.
The compiler is responsible for translating all your functions written in C into machine code, which it saves in an object file; the linker then combines the object files into the final binary (a DLL or EXE, for example). So, if you write a .c file that has a main function and a few other functions, all of them are translated and end up together in the EXE file. Then, when you run the file, the loader (which is part of the OS) knows to start running the main function first. Otherwise, the main function is just like any other function for the compiler.
The linker is responsible for resolving any references between functions and variables in one object file with the definitions in other files. For example, if you call printf(), since you do not define the function printf() yourself, the linker is responsible for making sure that the call to printf() goes to the right system library where printf() is defined. This is done at build time, during the link step.
printf() can indeed be written in pure C. What it does is invoke a system call in the OS which knows how to actually send characters to the standard output (like a terminal window). When you call printf() in your program, the linker is responsible at build time for linking your call to the printf() function in the standard C library. When the function is called at run time, printf() formats the arguments properly and then calls the appropriate OS system call to actually display the characters.

Resources