Programming Language : C
At work, we have a project with a header file, say header1.h. This file contains some functions that are declared with external scope (via extern) and also defined as inline in the same header file (header1.h).
Now this file is included in several places across different C files.
From my past experience with GCC, my understanding is that this should produce multiple-definition errors, and that is what I expect. But at work we do not get these errors. The only difference is that we are using a different compiler driver.
My best guess is that the symbols are generated as weak symbols at compile time, and the linker uses that information to choose one of them.
Could functions defined as inline result in weak symbols? Is that possible, or might there be some other reason?
Also, if inline can result in the creation of weak symbols, is there a feature to turn that on or off?
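For concreteness, here is a minimal sketch of the pattern being described (the function name is hypothetical):

/* header1.h */
extern int add(int a, int b);  /* declared with external scope */

inline int add(int a, int b)   /* ...and defined inline in the same header */
{
    return a + b;
}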
If a function is inline, the entire function body is copied in at every point where the function is used (instead of the normal call/return semantics).
(Modern compilers use inline only as a hint, and the actual result might just be a static function, with a unique copy in every compiled file where it is used.)
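If you want to verify the weak-symbol guess on a GNU toolchain, nm can show it (object file and function names here are hypothetical):

nm file1.o | grep add

GNU nm marks weak symbols with 'W' (or 'w'/'V'), whereas an ordinary global function defined in the code section shows up as 'T'.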
Related
If I #include a file in C, do I get the entire contents of the file linked in, or just the parts I use?
If it has 10 functions in it, and I only use one of the functions, does the code for the other nine functions get included in my executable? This is especially relevant for me right now as I am working on a microcontroller and memory is precious.
Firstly, header files do not get "linked in". #include is basically a textual copy-paste feature. Everything from your include file gets pasted by preprocessor into the final translation unit, which will later be seamlessly processed by the compiler proper. The compiler proper knows nothing about any header files or #include directives.
Secondly, it means that if in your code you declared or defined some function or variable that you do not use, it is completely irrelevant whether it came from a header file through #include or was written directly in source file. There's absolutely no difference.
Thirdly, the question is: what exactly do you have in your header file that you include? Typically, header files do not define objects and functions; they simply declare them. Declarations do not produce any code, regardless of whether you use the function or not. Declarations simply tell the compiler that the code (generated from the function definition) already exists elsewhere. Thus, as long as we are talking about typical header files, #include directives and header files by themselves have no effect on final code size.
Fourthly, if your header file is of some unusual kind that contains function (or object) definitions, then see "firstly" and "secondly" above. The compiler proper can see only one translation unit at a time, for which reason a typical strategy for the compiler proper is to completely discard unused entities with internal linkage (i.e. static objects and functions) and keep all entities with external linkage. Entities with external linkage cannot be discarded by compiler proper, since they might be needed in some other translation unit.
Fifthly, at linking stage linker can see the program in its entirety and, for that reason, can discard unused objects and functions, if it is advanced enough for that (and if you allow linker to do it). Meanwhile, inclusion-exclusion precision of a typical run-of-the-mill linker is limited to a single object file. Each object file is atomic to such linker. This means that if you want to be able to exclude unused functions on per-function basis, you might have to adopt "one function per object file" strategy, i.e. write one and only one function per .c file. Of course, this is only possible when you write your own code. If some third-party library you want to use does not adhere to this convention, then you might not be able to exclude individual functions.
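If you happen to be using GCC with the GNU linker, there is a common alternative to the one-function-per-file strategy (a sketch; whether it actually helps depends on your toolchain): place each function in its own section and let the linker garbage-collect the unused ones.

gcc -ffunction-sections -fdata-sections -c file.c
gcc -Wl,--gc-sections file.o -o program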
If you #include a file in C, the entire contents of that file are added to your source file and compiled by your compiler. A header file, though, usually only has declarations of functions and no definitions (so no actual code is compiled).
The linker, on the other hand, takes all the functions from all the libraries and compiled source code and merges them into the final output file. At this time, the linker will discard any functions that you aren't using.
So, to answer your question: only the functions you use (and indirectly depend on) will be included in your final program file, and this is independent of what files you #include. Happy hacking!
You have to distinguish between different scenarios:
What does the included header file contain? Declarations of external functions only, or also static function definitions?
How are the implementations of the external functions declared in the header file distributed? Are they all implemented in one .c file, or spread across several .c files?
Regarding point 1: merely #include-ing external declarations will not make any other code part of your object file. And definitions of static functions that are part of the header file, but which are not referenced by your code, may not become part of your object file either - this is a fairly common optimization. It depends on your compiler, however.
Regarding point 2: Some linkers can only link whole object files, all or nothing. That means, if all the external functions declared in a header file are implemented in one .c file, and, if your code references at least one of these functions, chances are that you will get the whole object file, including all the other functions you don't use. Some linkers, however, can avoid this and remove unused parts when linking object files.
One brute-force approach to deal with non-optimizing linkers is to put every external function into a .c file of its own. You will, however, have to find a way to deal with the situation where some of these functions refer to a common static function that was part of the original .c file...
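One hedged sketch of how that situation could be handled (file and function names hypothetical): promote the shared helper into its own .c file, with a declaration in an internal header, accepting that the helper then gets an external symbol of its own.

/* internal.h - not shipped to library users */
int shared_helper(int x);    /* was a static function in the original .c file */

/* shared_helper.c */
#include "internal.h"
int shared_helper(int x)
{
    return x * 2;            /* placeholder body */
}

/* copy_file.c */
#include "internal.h"
int copy_file(int x)
{
    return shared_helper(x); /* both split-out files can still share it */
}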
#include simply presents the compiler, ultimately, with what looks like a single file (and if you use -save-temps on GCC you will see that exactly one file is handed to the actual compiler). It is no more complicated than that. So if you have some function prototypes or defines in your .c file, having them come from an include makes no difference whatsoever; the end result is the same.
If the things you include contain code - functions, not just prototypes - then it is the same as if you had written those in the .c file itself. Whether or not they show up in the final binary depends on whether you declared them as global or as static, whether you optimized, and so on. The same goes for variables, structures, and other things.
Not all linkers are the same, but a common approach is that whatever the compiler left in the object goes into the final binary. If you take those objects and make a library out of them, though, some (many?) linkers don't pull everything into the binary, only the portions that are required to resolve the dependencies.
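To see that single-file view for yourself on GCC (the file name is hypothetical):

gcc -save-temps -c file.c

This leaves behind file.i (the preprocessed source, i.e. the one file the compiler proper actually sees), file.s (the generated assembly), and file.o.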
When we include header files in C, we actually add the declarations of functions such as printf and scanf. But how does the code for the function (the function definition) get added to the program?
That's done by the process of linking. Individually compiled translation units have a way of referring to dependent names symbolically, so your code would only say "call a function with name 'printf'", and it is the job of the linking procedure to look up those symbols in one of the provided object or library files.
The standard library is usually linked against your code implicitly, so you may not be aware of the fact that you are linking your code with pre-existing library code. You would definitely be aware of this if you used your own libraries.
Note that there is no standard for linking, so you cannot generally compile one file with one compiler and another file with a different compiler and then link them together. The problem is not just to agree on how names are represented, but also on how to generate code for function calls. There are however several "informal" calling conventions and name mangling rules on popular platforms that offer a degree of interoperability.
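As a minimal sketch of that symbolic reference being resolved at link time (names hypothetical):

/* greet.c - compiled separately; provides the definition */
#include <stdio.h>

void greet(void)
{
    puts("hello");
}

/* main.c - the compiler records only a symbolic reference to 'greet' */
void greet(void);            /* declaration; no code is generated for it */

int main(void)
{
    greet();                 /* resolved by the linker, not the compiler */
    return 0;
}

Building with "gcc -c main.c greet.c" followed by "gcc main.o greet.o" makes the two steps explicit: compilation leaves 'greet' undefined in main.o, and the link step resolves it.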
Let's say you are writing a library, and you have a bunch of utility functions you have written just for yourself. Of course, you wouldn't want these functions to have external linkage, so that they won't clash with symbols belonging to your library's users (mostly because you are not going to tell the outside world of their existence).
On the other hand, these functions may be used in different translation units, so you want them to be shared internally.
Let's give an example. You have a library that does some stuff and in different source files you may need to copy_file and create_directory, so you would implement them as utility functions.
To make sure the user of your library doesn't accidentally get a linkage error because of having a function with the same name, I can think of the following solutions:
Terrible way: copy-paste the functions into every file that uses them, adding static to their declarations.
Not a good way: Write them as macros. I like macros, but this is just not right here.
Give them such weird names that the chances of a user producing the same name are small enough. This might work, but it makes the code that uses them very ugly.
What I do currently: Write them as static functions in an internal utils.h file and include that file in the source files.
Now the last option works almost fine, except it has one issue: if you don't use one of the functions, you get at the very least a warning about it (saying the function is defined but not used). Call me crazy, but I keep my code warning-free.
What I resorted to do was something like this:
utils.h:
...
#ifdef USE_COPY_FILE
static int copy_file(/* args */)
{...}
#endif
#ifdef USE_CREATE_DIR
static int create_dir(/* args */)
{...}
#endif
...
file1.c:
#define USE_COPY_FILE
#define USE_CREATE_DIR
#include "utils.h"
/* use both functions */
file2.c:
#define USE_COPY_FILE
#include "utils.h"
/* use only copy_file */
The problem with this method, however, is that it gets uglier as more utilities are introduced. Imagine if you have 10 such functions: you need 7-8 lines of defines before the include if you need 7-8 of these functions!
Of course, another way would be to use DONT_USE_* type of macros that exclude functions, but then again you need a lot of defines for a file that uses few of these utility functions.
Either way, it doesn't look elegant.
My question is, how can you have functions that are internal to your own library, used by multiple translation units, and avoid external linkage?
Marking the functions static inline instead of static will make the warnings go away. It will do nothing about the code bloat of your current solution -- you're putting at least one copy of the function into each TU that uses it, and this will still be the case. Oli says in a comment that the linker might be smart enough to merge them. I'm not saying it isn't, but don't count on it :-)
It might even make the bloat worse, by encouraging the compiler to actually inline calls to the functions so that you get multiple copies per TU. But that's unlikely; GCC mostly ignores that aspect of the inline keyword and inlines calls (or not) according to its own rules.
That's basically the best you can do portably. There's no way in standard C to define a symbol that's external from the POV of certain TUs (yours), but not from the POV of others (your users'). Standard C doesn't really care what libraries are, or the fact that TUs might be linked in several steps, or the difference between static and dynamic linking. So if you want the functions to be actually shared between your TUs, without any external symbol that could interfere with users of the library, then you need to do something specific to GCC and/or your static library or dll format to remove the symbols once the library is built but before the user links against it.
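So the portable version of your utils.h would simply be (a sketch, reusing the hypothetical names from the question):

/* utils.h */
static inline int copy_file(const char *src, const char *dst)
{
    /* ... implementation ... */
    return 0;
}

static inline int create_dir(const char *path)
{
    /* ... implementation ... */
    return 0;
}

No USE_* macros are needed: GCC does not emit the "defined but not used" warning for unused static inline functions, so the warning that motivated them goes away.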
You can link your library normally, having these functions global, and localize them later.
objcopy can take global symbols and make them local, so they can't be linked against. It can also delete a symbol entirely (the function stays, and resolved references to it remain resolved; just the name is gone).
objcopy -L symbol localizes symbol. You can repeat -L multiple times.
objcopy -G symbol keeps symbol global, but localizes all others. You can repeat it also, and it will keep global all those you specified.
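For example (library and symbol names hypothetical), either of these would hide the helpers from your users after the library is built:

objcopy -L copy_file -L create_dir libutils.a
objcopy -G lib_public_entry libutils.a

The first call localizes the two helpers by name; the second keeps only lib_public_entry global and localizes everything else.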
And I just found that I'm repeating the answer to this question, which Oli Charlesworth referenced in his comment.
Is it the C preprocessor, compiler, or linkage editor?
To tell you the truth, it is the programmer.
The answer you are looking for is... it depends. Sometimes it's the compiler, sometimes it's the linker, and sometimes it doesn't happen until the program is loaded.
The preprocessor:
handles directives for source file inclusion (#include), macro definitions (#define), and conditional inclusion (#if).
...
The language of preprocessor directives is agnostic to the grammar of C, so the C preprocessor can also be used independently to process other kinds of text files.
The linker:
takes one or more objects generated by a compiler and combines them into a single executable program.
...
Computer programs typically comprise several parts or modules; all these parts/modules need not be contained within a single object file, and in such cases they refer to each other by means of symbols. Typically, an object file can contain three kinds of symbols:
defined symbols, which allow it to be called by other modules,
undefined symbols, which call the other modules where these symbols are defined, and
local symbols, used internally within the object file to facilitate relocation.
When a program comprises multiple object files, the linker combines these files into a unified executable program, resolving the symbols as it goes along.
In environments which allow dynamic linking, it is possible that executable code still contains undefined symbols, plus a list of objects or libraries that will provide definitions for these.
The programmer must make sure everything is defined somewhere. The programmer is RESPONSIBLE for doing so.
Various tools will complain along the way if they notice anything missing:
The compiler will notice certain things missing, and will error out if it can tell that something isn't there.
The linker will error out if it can't fix up a reference that's not in a library somewhere.
At run time there is a loader that pulls the relevant shared libraries into the process's memory space. The loader is the last thing that gets a crack at fixing up symbols before the program gets to run any code, and it will throw errors if it can't find a shared library/dll, or if the interface for the library that was used at link-time doesn't match up correctly with the available library.
None of these tools is RESPONSIBLE for making sure everything is defined. They are just the things that will notice if things are NOT defined, and will be the ones throwing the error message.
For symbols with internal linkage or no linkage: the compiler.
For symbols with external linkage: the linker, either the "traditional" one, or the runtime linker.
Note that the dynamic/runtime linker may choose to do its job lazily, resolving symbols only when they are used (e.g. when a function is called for the first time).
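A minimal illustration of that division of labor (names hypothetical): the following compiles cleanly, because the declaration satisfies the compiler, but the link fails unless some object or library defines helper.

/* main.c */
void helper(void);   /* declaration only - the compiler is satisfied */

int main(void)
{
    helper();        /* GNU ld would report: undefined reference to `helper' */
    return 0;
}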
I was reading some source code files in C and C++ (mainly C)...
I know the meaning of the 'static' keyword: static functions are functions that are visible only to other functions in the same file. Elsewhere I read that it's good to use static functions in cases where we don't want them to be used outside the file they are written in...
I was reading one source code file, as I mentioned before, and I saw that ALL the functions (except main) were static... Because no other files are linked with the main source code .c file (not even headers), logically, why should I put static before all the functions? From WHAT should they be protected when there's only one source file?!
EDIT: IMHO I think those keywords are put there just to make the code look bigger and heavier.
If a function is extern (default), the compiler must ensure that it is always callable through its externally visible symbol.
If a function is static, that gives the compiler more flexibility. For example, the optimizer may decide to inline the function; with static, the compiler does not need to generate an additional out-of-line copy. Also, the symbol table will be smaller, possibly speeding up the linking process too.
Also, it's just a good habit to get into.
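A small sketch of the flexibility static buys (a hypothetical example; actual inlining depends on the optimizer):

static int square(int x)     /* internal linkage: nothing outside this file can call it */
{
    return x * x;
}

int compute(int y)
{
    return square(y) + 1;    /* e.g. GCC at -O2 will typically inline this call
                                and emit no out-of-line copy of square() at all */
}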
It is hard to guess in isolation, but my assumption would be that it was written by someone who assumes that more files might be added at some point (or this file included in another project), so gives the least necessary access for the code to function. Essentially limiting the public API to the minimum.
But there are other files linked with your main module.
In fact, there are hundreds or even thousands, in the libraries. Most of these won't be selected for a small program, but all the symbols are scanned by the linker. A collision between a symbol in main and an exported symbol from a library won't by itself cause any harm, but think of the trouble accidentally naming something strcpy() could cause.
Also, it probably doesn't hurt to get used to the best-practice styles.
As a coding rule that I follow, any function (other than main()) that is visible outside its source file needs a declaration, and that declaration should be in a header. I avoid writing 'extern' declarations for functions in my source files if at all possible, and it almost always is possible.
If a function is only used inside a single source file, it should be static. That makes it much easier to modify; you know that the only place you need to look to see how it is used is the source file you have in front of you now (unless you make a habit of including '.c' files in other '.c' files - which is also a bad habit that should be broken now).
I use GCC to help me enforce the coding rule:
gcc -m64 -Wall -Wextra -std=c99 -Wmissing-prototypes -Wstrict-prototypes
That's a fairly typical collection of flags; I sometimes use -std=c89 instead of -std=c99; I sometimes use -m32 instead of -m64; I don't always use -Wextra (but my code is moving in that direction). I always use -Wmissing-prototypes and -Wstrict-prototypes to ensure that each external function is declared before it is defined or used (and each static function is either declared or defined before it is used). I occasionally use -Werror (so if the compile emits a warning, the compilation fails). I could use it more than I do since my code does compile without warnings - or gets fixed so that it does.
So, you could easily have been looking at my code. In my code, the only functions that are exposed - even in single source file programs - are the functions that are declared in a header, which means that they are part of the external interface to the module that the source file represents.
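A sketch of the layout that rule produces (the module name is hypothetical):

/* module.h - the external interface; everything declared here is public */
int module_process(int value);

/* module.c */
#include "module.h"

static int helper(int v)        /* used only in this file, so it is static */
{
    return v * 2;
}

int module_process(int value)   /* declared in module.h, so it is exposed */
{
    return helper(value);
}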
It may be that the author is taking precautions. For example, if someone else is using this file as a source by including it into his main file.
Because there are no other additional files linked with the main source code .c file (not even headers), logically why should I put static before all functions? From WHAT should they be protected when there's only 1 source file?!
You honestly don't need the static keyword in this case.
EDIT: IMHO I think those keywords are put there just to make the code look bigger and heavier.
However, if you really want to read more about the static keyword, you can start with a book.
Some more info on the keyword static at #2216239 -- may help!