When and how is the code for library functions added in C?

When we include header files in C, we actually add the declarations of functions such as printf, scanf, etc. But how does the code for the function (the function definition) get added to the program?

That's done by the process of linking. Individually compiled translation units have a way of referring to dependent names symbolically, so your code would only say "call a function with name 'printf'", and it is the job of the linking procedure to look up those symbols in one of the provided object or library files.
The standard library is usually linked against your code implicitly, so you may not be aware of the fact that you are linking your code with pre-existing library code. You would definitely be aware of this if you used your own libraries.
Note that there is no standard for linking, so you cannot generally compile one file with one compiler and another file with a different compiler and then link them together. The problem is not just to agree on how names are represented, but also on how to generate code for function calls. There are however several "informal" calling conventions and name mangling rules on popular platforms that offer a degree of interoperability.
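As a small illustration of that symbolic referencing (the file and function names here are invented for the example, not taken from the question):

/* greet.c - defines the function */
#include <stdio.h>
void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

/* main.c - only declares the function; the compiled main.o merely records
   an unresolved symbolic reference to "greet" */
void greet(const char *name);   /* declaration, normally placed in a header */
int main(void) {
    greet("world");
    return 0;
}

/* A typical build with GCC (other toolchains differ in spelling, not in idea):
     gcc -c greet.c      -> greet.o
     gcc -c main.c       -> main.o   (compiles fine without greet's code)
     gcc main.o greet.o  -> a.out    (the linker resolves "greet" here)
   The C standard library is linked in implicitly at the last step. */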

Related

Where are declaration and definition stored?

For a predefined function, where are the declaration and the definition stored?
And is the declaration stored in libraries? If so, then why is it called a library function?
This is an imprecise question. The best answers we can give are:
The declarations of standard library functions can best be thought of as being stored in their header files.
Depending on how you want to think about it, the definitions of standard library functions are either in the source files for those libraries (which may be invisible to you, a trade secret of your compiler vendor), or in the library files themselves (.a, .so, .lib, or .dll).
These days, knowledge of standard library functions is typically built in to the compiler, also. For example, if I write the old classic int main() { printf("Hello, world!\n"); }, but without any #include directives, my compiler says, "warning: implicitly declaring library function 'printf'" and "include the header <stdio.h>".
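For illustration, that example looks like this; the exact diagnostic wording varies by compiler and version:

/* hello.c - deliberately has no #include directives */
int main() {
    printf("Hello, world!\n");   /* no declaration of printf is in scope */
}
/* A compiler may report something along the lines of:
     warning: implicitly declaring library function 'printf'
     note: include the header <stdio.h>
   The program can still link, because the definition of printf lives in the
   C library that is linked in by default. */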
There are two sides of this story:
The code that calls a library/external function: The compiler generates a reference in your compiled module that encodes which function prototype you expect to exist elsewhere.
The pre-compiled library files against which your code must be linked: A library file contains both the coded prototypes of its functions as well as the actual compiled binary (i.e. the definition/implementation) of these functions.
When your code uses an external function, the compiler generates a reference to this function that it assumes will be resolved later during the linking phase.
During the linking process, lists of function references are built up. The linker expects to find the definition/implementation of each of the referenced functions.
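A small sketch of both sides; the nm tool and the exact output shown are platform-dependent illustrations, not guaranteed formats:

/* calls_printf.c - refers to printf but does not define it */
#include <stdio.h>

int main(void) {
    printf("value: %d\n", 42);   /* an unresolved reference is recorded in the object file */
    return 0;
}

/* After "gcc -c calls_printf.c", running "nm calls_printf.o" typically lists
   something like:
     0000000000000000 T main     (defined in this object file)
                      U printf   (undefined - to be supplied by a library at link time)
   Symbol spelling differs by platform (e.g. _printf on macOS). */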
The header file contains the declaration of built-in functions and the library contains the definition of the functions.
The name library comes, in my opinion, from the analogy with an actual library that contains books: these libraries contain the classes, functions, variables, etc.

What is the difference between include and link when linking to a library?

What do include and link REALLY do? What are the differences? And why do I need to specify both of them?
When I write #include <math.h> and then pass -lm when compiling, what do #include <math.h> and -lm do, respectively?
In my understanding, when linking a library, you need its .h file and its .o file. Does this suggest that #include <math.h> takes in the .h file while -lm takes in the .o file?
The reason that you need both a header (the interface description) and the library (the implementation) is that C separates the two more clearly than languages like C# or Java do. One can compile a C function (e.g. by invoking gcc -c <sourcefile>) which calls library code even when the called library is not present; the header, which contains the interface description, suffices. (This is not possible with C# or Java; the assemblies or class files/jars, respectively, must be present.) During the link stage though the library must be there, even when it's dynamic, as far as I know.
With C#, Java, or scripting languages, by contrast, the implementation contains all the information necessary to define the interface. The compiler (which is not as clearly separated from the linker) looks in the jar file or the C# assembly which contains the called implementations and obtains information about function signatures and types from there.
Theoretically, that information could probably be present in a library written in C as well; it's basically the debug information. But the classic C compiler (as opposed to the linker) is oblivious to libraries and object files and cannot parse them. (One should remember that the "compiler" executable you usually use to compile a C program, e.g. gcc, is a "compiler driver" which interprets the command line arguments and calls the programs that actually do the work, e.g. the preprocessor, the actual compiler and the actual linker, to create the desired output.)
So in theory, if you have a properly annotated library in a known location, you could probably write a compiler which compiles a C function against it without having function declarations and type definitions; the compiler would have to produce the proper declarations. The compiler would have to know which library to parse (which corresponds to setting a C# project "Reference" in VS or having a class path and name/class correspondence in Java).
It would probably be easiest to use a well-known debugging format like stabs or dwarf and extract the interface definitions from it with a little helper program which uses the API for the debug format, extracts the information and produces a C header which is prepended to every source file. That would be the job of the compiler driver, and the actual compiler would still be oblivious to that.
It's because header files contain only declarations, and the .o files (or .obj, .dll or .lib files) contain the definitions of the functions.
If you open a .h file, you will not see the code of the functions, because that is in the libraries.
One reason is commercial: you may need to publish your code while keeping the source inside your company. Libraries are compiled, so you can publish them without revealing the source.
Header files only tell the compiler what functions (and classes, in C++) it can find in the library.
The header files are kind of a table-of-contents plus a kind of dictionary for the compiler. It tells the compiler what the library offers and gives special values readable names.
The library file itself contains the contents.
What you are asking about are two entirely different things.
Don't worry, I will explain them to you.
You use the #include directive to instruct the preprocessor to include the math.h header file, which contains the function prototypes of fabs(), ceil(), etc.
And you use -lm to instruct the linker to include the pre-compiled function definitions of fabs(), ceil(), etc. in the executable file.
Now, you may ask why we have to explicitly link the library file of the math functions, unlike for other standard functions, and the answer is that it is largely due to historical reasons.
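A minimal sketch of the two steps; the file name is invented, and whether the link actually fails without -lm depends on the platform and on compiler optimizations (some systems fold the math functions into libc):

/* m_demo.c */
#include <math.h>    /* declarations (prototypes) of sin, fabs, ceil, ... */
#include <stdio.h>

int main(int argc, char **argv) {
    double x = argc * 1.5;    /* runtime value, so the call is not folded away */
    printf("%f\n", sin(x));   /* the definition lives in the math library, not here */
    return 0;
}

/* Typical build on GNU/Linux:
     gcc m_demo.c -o m_demo -lm
   Without -lm the linker usually reports "undefined reference to `sin'". */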

Determining the space used by library functions in C

I have a bunch of code I need to analyse, and I don't know how to do it. I have a pile of code that here and there uses math functions from the math.h header file that came with my IDE. I am being asked to see how much space is used by including this. Specifically, is the compiler including all of the library functions or just the ones we use? There is no object file being created, so I think the library code is being compiled into the individual files. Any ideas of a slick way to figure this out? I can't just comment out the includes, because then the code won't compile and I won't know a size, and if I comment out all the lines that use math functions, it is not really representative.
You can use the objdump command to see the individual symbols inside your object files and the space they require.
Note that unless you're doing static compilation, library methods aren't generally copied into your produced binary, but only referenced (and brought in via the dynamic linker when your program is loaded).
As math.h is part of the standard C library, a copy of that library is basically guaranteed to always be in memory, so the additional memory and space requirements on dynamic linking are minimal. (During static linking, all symbols which aren't directly required by your program are discarded, and math functions don't tend to be very big, so usage should be fairly minimal there too).
The code in the header file is compiled into the object file of the .c file you are using if your header contains the definitions of the functions, and is just referenced if the header contains only declarations. The linker will then find a definition for each symbol and place it in your executable; if you are using dynamic linking, the OS will pull in the definition at run time.

How to catch unintentional function interpositioning?

Reading through my book Expert C Programming, I came across the chapter on function interpositioning and how it can lead to some serious, hard-to-find bugs if done unintentionally.
The example given in the book is the following:
my_source.c
mktemp() { /* my own implementation */ }
main() {
    mktemp();   /* intended: calls the mktemp() defined above */
    getwd();    /* getwd() in libc ends up calling my mktemp() too */
}
libc
mktemp() { ... }   /* the library's own implementation */
getwd() { ...; mktemp(); ... }
According to the book, what happens in main() is that mktemp() (a standard C library function) is interposed by the implementation in my_source.c. Although having main() call my implementation of mktemp() is intended behavior, having getwd() (another C library function) also call my implementation of mktemp() is not.
Apparently, this example was a real life bug that existed in SunOS 4.0.3's version of lpr. The book goes on to explain the fix was to add the keyword static to the definition of mktemp() in my_source.c; although changing the name altogether should have fixed this problem as well.
This chapter leaves me with some unresolved questions that I hope you guys could answer:
Does GCC have a way to warn about function interposition? We certainly don't ever intend on this happening and I'd like to know about it if it does.
Should our software group adopt the practice of putting the keyword static in front of all functions that we don't want to be exposed?
Can interposition happen with functions introduced by static libraries?
Thanks for the help.
EDIT
I should note that my question is not just aimed at interposing over standard C library functions, but also functions contained in other libraries, perhaps 3rd party, perhaps ones created in-house. Essentially, I want to catch any instance of interpositioning regardless of where the interposed function resides.
This is really a linker issue.
When you compile a bunch of C source files, the compiler will create an object file for each one. Each .o file will contain a list of the public functions in this module, plus a list of functions that are called by code in the module but are not actually defined there, i.e. functions that this module is expecting some library to provide.
When you link a bunch of .o files together to make an executable, the linker must resolve all of these missing references. This is the point where interposing can happen. If there are unresolved references to a function called "mktemp" and several libraries provide a public function with that name, which version should it use? There's no easy answer to this, and yes, odd things can happen if the wrong one is chosen.
So yes, it's a good idea in C to "static" everything unless you really do need to use it from other source files. In fact in many other languages this is the default behavior and you have to mark things "public" if you want them accessible from outside.
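A minimal sketch of that practice (the function names are made up for the example):

/* helpers.c */
static int clamp(int v, int lo, int hi) {   /* file-local: this name is not exported,
                                               so it cannot clash with or interpose a
                                               symbol of the same name in any library */
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

int sanitize_percent(int v) {               /* the one symbol deliberately made public */
    return clamp(v, 0, 100);
}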
It sounds like what you want is for the tools to detect that there are name conflicts between functions, i.e. you don't want your externally accessible functions to accidentally have the same name as, and therefore 'override' or hide, functions with the same name in a library.
There was a recent SO question related to this problem: Linking Libraries with Duplicate Class Names using GCC
Using the --whole-archive option on all the libraries you link against may help (but as I mentioned in the answer over there, I really don't know how well this works or how easy it is to convince your build system to apply the option to all libraries).
Purely formally, the interpositioning you describe is a straightforward violation of C language definition rules (the ODR rule, in C++ parlance). Any decent compiler must either detect these situations or provide options for detecting them. It is simply illegal to define more than one function with the same name in the C language, regardless of where these functions are defined (standard library, another user library, etc.).
I understand that many platforms provide means to customize the [standard] library behavior by defining some standard functions as weak symbols. While this is indeed a useful feature, I believe the compilers must still provide the user with means to enforce the standard diagnostics (on per-function or per-library basis preferably).
So, again, you should not worry about interpositioning if you have no weak symbols in your libraries. If you do (or if you suspect that you do), you have to consult your compiler documentation to find out if it offers you a means to inspect the weak symbol resolution.
In GCC, for example, you can disable the weak symbol functionality by using -fno-weak, but this basically kills everything related to weak symbols, which is not always desirable.
If the function does not need to be accessed outside of the C file it lives in then yes, I would recommend making the function static.
One thing you can do to help catch this is to use an editor that has configurable syntax highlighting. I personally use SciTE, and I have configured it to display all standard library function names in red. That way, it's easy to spot if I am re-using a name I shouldn't be using (nothing is enforced by the compiler, though).
It's relatively easy to write a script that runs nm -o on all your .o files and your libraries and checks to see if an external name is defined both in your program and in a library. Just one of the many sane sensible services that the Unix linker doesn't provide because it's stuck in 1974, looking at one file at a time. (Try putting libraries in the wrong order and see if you get a useful error message!)
Interpositioning occurs when the linker is trying to link separate modules.
It cannot occur within a module. If there are duplicate symbols in a module the linker will report this as an error.
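For example (a sketch; the exact message depends on the linker), two definitions of the same external function within one module are rejected:

/* a.c */
int report_version(void) { return 1; }
int main(void) { return report_version(); }

/* b.c */
int report_version(void) { return 2; }

/* Linking both into one executable, e.g. "gcc a.c b.c", typically fails with
   something like "multiple definition of `report_version'". */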
For *nix linkers, unintended interpositioning is a problem, and it is difficult for the linker to guard against it.
For the purposes of this answer consider the two linking stages:
The linker links translation units into modules (basically applications or libraries).
The linker resolves any remaining unfound symbols by searching in other modules.
Consider the scenario described in 'Expert C Programming' and in SiegeX's question.
The linker first tries to build the application module.
It sees that the symbol mktemp() is an external and tries to find a function definition for the symbol. The linker finds the definition for the function in the object code of the application module and marks the symbol as found.
At this stage the symbol mktemp() is completely resolved. It is not considered tentative in any way that would allow for the possibility that another module might define the symbol.
In many ways this makes sense, since the linker should first try to resolve external symbols within the module it is currently linking. It is only unfound symbols that it searches for when linking in other modules.
Furthermore, since the symbol has been marked as resolved, the linker will use the application's mktemp() in any other case where it needs to resolve this symbol.
Thus the application's version of mktemp() will be used by the library.
A simple way to guard against the problem is to try to make all external symbols in your application or library unique.
For modules that are only going to be shared on a limited basis, this can fairly easily be done by making sure all external symbols in your module are unique, for example by appending a unique identifier.
For modules that are widely shared, making up unique names is a problem.
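A sketch of the unique-prefix approach mentioned above; the mylib_ prefix and the function names are invented for the example:

/* mylib.h - every external symbol carries the library's prefix */
#ifndef MYLIB_H
#define MYLIB_H

char *mylib_mktemp(char *template_name);   /* cannot collide with libc's mktemp() */
char *mylib_getwd(char *buf);              /* cannot collide with libc's getwd() */

#endif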

A Java programmer has questions regarding C header files

I have a fair amount of practice with Java as a programming language, but I am completely new to C. I understand that a header file contains forward declarations for methods and variables. How is this different from an abstract class in Java?
The short answer:
Abstract classes are a concept of object-oriented programming. Header files are a necessity due to the way the C language is constructed. The two cannot really be compared.
The long answer:
To understand the header file, and the need for header files, you must understand the concepts of "declaration" and "definition". In C and C++, a declaration means that you declare that something exists somewhere, for example a function:
void Test(int i);
We have now declared, that somewhere in the program, there exists a function Test, that takes a single int parameter. When you have a definition, you define what it is:
void Test(int i)
{
...
}
Here we have defined what the function void Test(int) actually is.
Global variables are declared using the extern keyword
extern int i;
They are defined without the extern keyword
int i;
When you compile a C program, you compile each source file (.c file) into an .obj file. Definitions will be compiled into the .obj file as actual code. When all these have been compiled, they are linked into the final executable. Therefore, a function should only be defined in one .c file; otherwise, the same function will end up multiple times in the executable. This is not really critical if the function definitions are identical. It is more problematic if a global variable is linked into the same executable twice: that will leave half the code using the one instance, and the other half of the code using the other instance.
But functions defined in one .c file cannot see functions defined in other .c files. So if from file1.c you need to access the function Test(int) defined in file2.c, you need to have a declaration of Test(int) present when compiling file1.c. When file1.c is compiled into file1.obj, the resulting .obj file will contain information that it needs Test(int) to be defined somewhere. When the program is linked, the linker will identify that file2.obj contains the function that file1.obj depends on.
If there is no .obj file containing the definition for this function, you will get a linker error, not a compiler error (linker errors are considerably more difficult to find and correct than compiler errors, because you get no filename and line number pointing at the problem).
So you use the header file to store declarations for the definitions stored in the corresponding source file.
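Putting the pieces together, a sketch using the file and function names from this answer:

/* file2.h - the header: declarations only */
#ifndef FILE2_H
#define FILE2_H
void Test(int i);
#endif

/* file2.c - the single definition, compiled into file2.obj */
#include "file2.h"
void Test(int i) { /* ... */ }

/* file1.c - includes the header so the declaration is visible at compile time;
   the linker later matches the call to the definition found in file2.obj */
#include "file2.h"
int main(void) { Test(42); return 0; }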
IMO it's mainly because many C programmers seem to think that Java programmers don't know how to program “for real”, e.g. handling pointers, memory and so on.
I would rather compare headers to Java interfaces, in the sense that they generally define how the API must be used.
Headers are basically just a way to avoid copy-pasting: the preprocessor simply includes the content of the header in the source file when it encounters an #include directive.
You put in a header every declaration that the user will commonly use.
Here are the answers:
Java has had a bad reputation among some hardcore C programmers mainly because they think:
it's "too easy" (no memory-management, segfaults)
"can't be used for serious work"
"just for the web" or,
"slow".
Java is hardly the easiest language in the world these days, compared to some languages like Python, etc.
It is used in many desktop apps - applets aren't even used that often. Finally, Java will always be slower than C, because it is not compiled directly to machine code. Sometimes, though, extreme speed isn't needed. Anyway, the JVM isn't the slowest language VM ever.
When you're working in C, there aren't abstract classes.
All a header file does is contain code which is pasted into other files. The main reason you put it in a header file is so that it is at the top of the file - this way, you don't need to care where you put your functions in the actual implementation file.
While you can kind of use OO concepts in C, it doesn't have built-in support for classes and similar fundamentals of OO. It is nigh-impossible to implement inheritance in plain C, so you can never really have OO, or abstract classes for that matter. I would suggest sticking to plain old structs.
If it makes it easier for you to learn, by all means think of them as abstract classes (with the implementation file being the inheriting class), but IMHO it is a difficult mindset to use when working in a language without explicit support for said features.
I'm not sure if Java has them, but I think a closer analogue could be partial classes in C#.
If you forward declare something, you have to actually deliver and implement it, else the compiler will complain. The header allows you to display a "module"'s public API and make the declarations available (for type checking and so) to other parts of the program.
Comprehensive reading: Learning C from Java. Recommended reading for developers who are coming from Java to C.
I think that there is much derision (mockery, laughter, contempt, ridicule) for Java simply because it's popular.
Abstract classes and interfaces specify a contract or a set of functions that can be invoked on an object of a certain type. Function prototypes in C only really do compile time type checking of function arguments/return values.
While your first question seems subjective to me, I will answer to the second one:
A header file contains the declarations which are then made available to other files via #inclusion by the preprocessor.
For instance, you will declare a function in a header, and you will implement it in a .c file. Other files will be able to use the function so long as they can see the declaration (by including the header file).
At linking time the linker will look among the object files, or the various libraries linked, for some object which provides the code for the function.
A typical pattern is: you distribute the header files for your library, and a dll (for instance) which contains the object code. Then in your application you include the header, and the compiler will be able to compile because it will find the declaration in the header. No need to provide the actual implementation of the code, which will be available for the linker through the dll.
C programs run directly on the machine, while Java programs run inside the JVM, so a common belief is that Java programs are slow. Also, in Java you are hidden from some low-level constructs (pointers, direct memory access), memory management, etc.
In C the declaration and the definition of a function are separated. The declaration "declares" that there exists a function which, called with those arguments, returns something. The definition "defines" what the function actually does. The former is done in header files, the latter in the actual code. When you are compiling your code, you must use the header files to tell your compiler that there is such a function, and link in a binary that contains the binary code for the function.
In Java, the binary code itself also contains the declaration of the functions, so it is enough for the compiler to look at the class files to get both the definition and declaration of the available functions.

Resources