I was wondering if there is a way to remove ALL the unused functions listed in the map file for an embedded project developed in C using the IAR Embedded Workbench for ARM IDE, which uses its own compiler and linker:
IAR C/C++ Compiler for ARM 8.30
IAR ELF Linker for ARM 8.30
IAR Assembler for ARM 8.30
I have noticed that not all the functions listed in the map file are actually used at run time. Is there any optimization that removes all unused functions?
For example, a third-party library is used in the project and FuncA() is part of it. Inside FuncA() there might be a switch, and for each case a different static function is called, let's say FuncA1(), FuncA2(), ... FuncAn(). Which cases are entered depends on how FuncA() is used, so it is obvious that not all of FuncA1(), FuncA2(), ... FuncAn() would be called in the project; however, all of them are listed in the map file.
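A minimal sketch of that situation (using the hypothetical names above):

/* Inside the third-party library: FuncA() has external linkage and
   explicitly references every one of its static helpers, so all of them
   end up in the image even if only some cases are ever taken. */
static void FuncA1(void) { /* ... */ }
static void FuncA2(void) { /* ... */ }
static void FuncAn(void) { /* ... */ }

void FuncA(int selector)
{
    switch (selector) {
    case 1:  FuncA1(); break;
    case 2:  FuncA2(); break;
    /* ... */
    default: FuncAn(); break;
    }
}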
Is it possible to remove such functions from the map file? If yes, how?
Removal of unused functions with external linkage is necessarily a function of the linker rather than the compiler. However, a linker is not required to support that; any such support is toolchain dependent and may require specific link-time optimisation switches to be applied.
Unused functions with static linkage could be removed by the compiler.
We could enter each case based on the code and the function that calls FuncA(), so it is obvious that not all of the FuncA1(), FuncA2(), ... FuncAn() functions would be called
If the functions FuncAx() have static linkage but are explicitly referenced in the function FuncA(), which has external linkage, then neither the compiler nor the linker should be able to remove them: the compiler has no a priori knowledge of how FuncA() will be called, and the linker has no reference to functions with static linkage, nor necessarily any understanding of the language semantics that would make it apparent that the switch cases in question are never invoked.
It is possible, I guess, that a sophisticated toolchain with a C-language-aware linker and link-time whole-program optimisation might remove dead code more aggressively, but that is certainly toolchain specific. Most linkers are source-language agnostic and merely resolve symbols in the object code, in some cases removing code to which no link has been made.
Related
In GCC 10, gcc defaults to -fno-common. That means all tentatively defined symbols are no longer common. I think gcc conforms to the C specification, but it seems there are no common symbols in a native C program. Are common symbols only for extension syntax?
Does native C have common symbols?
Read the C11 standard n1570. Its index doesn't even mention common symbols.
Also read carefully the documentation of GCC and this draft report.
Perhaps you are referring to the ELF file format used on Linux for object files and executables. There you can find a mention of common symbols, which tend to be deprecated. Read the Linux ABI specification, etc. here.
My recommendation is to declare all your public symbols as extern in some header file (#include-d in most of your *.c files), and define them once (without extern) in a single translation unit. You could use simple preprocessor tricks (such as X-macros).
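A minimal sketch of that pattern (file and symbol names are made up):

/* globals.h -- declare each public symbol exactly once */
#ifndef GLOBALS_H
#define GLOBALS_H
extern int counter;                 /* declaration only: allocates no storage */
void bump_counter(void);
#endif

/* globals.c -- the single translation unit that defines them */
#include "globals.h"
int counter = 0;                    /* the one definition */
void bump_counter(void) { counter++; }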
You might be interested in using C code generators such as lemon or SWIG, or in developing your own script (with GNU awk, Guile, Python, GPP, etc.) for simple metaprogramming techniques that generate some C code (autoconf could be inspirational). Configure your build automation tool (GNU make, ninja, ...) suitably.
You might be interested in using the static analyzer options and precompiled headers of recent GCC. Look also into the Clang static analyzer, clang-tidy, and Frama-C.
You surely want to pass -Wall -Wextra -g -H to gcc and read How to debug small programs and Modern C.
No, it has nothing to do with "extension syntax", and it has nothing to do with "common symbols" as a language construct. It simply refers to the behavior of variable declarations at file scope.
C says that if you place a declaration like int i; in a file, and don't elaborate on it anywhere else, then it will have external linkage and it will be considered to be defined to have a value of 0. This is called a "tentative definition". Declarations with the same name in different files, if they have external linkage, all refer to the same variable. Generally the way to use external linkage is to define a variable in one file, and use an extern declaration in any other files that make use of it.
In GCC with -fcommon, tentative definitions for the same variable can appear in more than one file. GCC will resolve this at link time, and allocate storage (initialized to zero) for the variable once.
In GCC with -fno-common, tentative definitions are resolved to definitions as soon as the file is compiled. If more than one file contains a tentative definition for a variable, this will cause a multiple definition error at link time.
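A minimal two-file sketch of the difference (hypothetical variable name):

/* a.c */
int shared;                         /* tentative definition */

/* b.c */
int shared;                         /* another tentative definition of the same name */
int main(void) { return shared; }

With -fcommon, gcc a.c b.c links and allocates a single zero-initialized shared. With -fno-common (the default since GCC 10), the link fails with a multiple-definition error.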
As far as I can tell, the C standard doesn't require or prohibit either behavior. In particular, C does not have C++'s "one definition rule". However, the -fno-common behavior is generally less surprising, catches a forgotten extern sooner, and allows the compiler to optimize better (because it knows exactly where the variable lives when compiling, instead of waiting to find out later). For these reasons the default was changed in GCC.
I'm writing a C program where every bit of the executable size matters.
If, for example, only printf() from stdio.h is required in my program, would including the header actually cause everything in that library to be copied into the CMake-compiled executable?
CMake is just the build-system generator. What ultimately goes into the final executable is decided by the linker and which options you use with it. Typical linkers will only link into the executable what they can determine to be necessary – unless you ask them to link everything. However, there are some limits on how much they can reduce the footprint.
The rule of thumb is that if you use a function found in foo.o, then the whole of foo.o gets linked; hence, if size optimization is your goal, it's a good idea to give each function its own compilation unit.
What headers you use has no effect whatsoever, because headers are processed at compilation time, not linkage time.
Last but not least: In most implementation of the standard library, the printf family of functions is among the most heavyweight ones, so don't use them if you're beancounting.
As a principle, headers should contain only declarations; they should not affect the executable if those declarations are not used. stdio.h should only have things like prototypes, preprocessor macro definitions and struct definitions; it should not contain executable code or variable definitions.
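For illustration, a sketch of such a declaration-only header (names are made up); nothing in it allocates storage or emits code by itself:

/* mylib.h */
#ifndef MYLIB_H
#define MYLIB_H
#define MYLIB_VERSION 2                  /* preprocessor macro definition          */
struct point { int x; int y; };          /* type definition                        */
int point_norm(struct point p);          /* function prototype                     */
extern int mylib_debug_level;            /* variable declaration, not a definition */
#endif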
Standard library code is included by the linker as required. However, the C runtime library (RTL) might have this code in a DLL or shared object, depending on your platform. Using a DLL (or equivalent) does not affect the size of the executable file, but of course it can affect the memory used. Since DLL code is shared between processes, it is not uncommon for the C RTL to remain in memory, but, assuming dynamic linking, there will only be one copy, regardless of the number of C processes running. Most C RTLs will have some memory allocated per process, but how much depends on the compiler/platform.
Programming language: C
At our work, we have a project which has a header file, say header1.h. This file contains some functions which are declared with external scope (via extern) and also defined as inline in the same header file (header1.h).
Now this file is included at several places in different C files.
My understanding, from my past experience with GCC, is that this would produce multiple-definition errors, and that is what I expect. But at our work we do not get these errors. The only difference is that we are using a different compiler driver.
From my past experience, my best guess is that the symbols are generated as weak symbols at compilation time and the linker is using that information to choose one of them.
Could functions defined as inline result in weak symbols? Is that possible, or might there be some other reason?
Also, if inline can result in the creation of weak symbols, is there a feature to turn that on or off?
If a function is inlined, the entire function body is copied in every time the function is used (instead of the normal assembler call/return semantics).
(Modern compilers use inline as a hint, and the actual result might just be a static function, with a unique copy in every translation unit in which it was used.)
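For reference, a minimal sketch of the standard C99 way to keep an inline definition in a header without multiple-definition errors (file and function names are hypothetical):

/* header1.h */
#ifndef HEADER1_H
#define HEADER1_H
inline int add_one(int x)    /* inline definition; by itself it provides no external definition */
{
    return x + 1;
}
#endif

/* one_file.c -- exactly one translation unit requests the external definition */
#include "header1.h"
extern inline int add_one(int x);

Whether such definitions are emitted as weak symbols, and whether that can be toggled, is compiler and toolchain specific; check the documentation of your compiler driver.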
Is it the C preprocessor, compiler, or linkage editor?
To tell you the truth, it is the programmer.
The answer you are looking for is... it depends. Sometimes it's the compiler, sometimes it's the linker, and sometimes it doesn't happen until the program is loaded.
The preprocessor:
handles directives for source file inclusion (#include), macro definitions (#define), and conditional inclusion (#if).
...
The language of preprocessor directives is agnostic to the grammar of C, so the C preprocessor can also be used independently to process other kinds of text files.
The linker:
takes one or more objects generated by a compiler and combines them into a single executable program.
...
Computer programs typically comprise several parts or modules; all these parts/modules need not be contained within a single object file, and in such case refer to each other by means of symbols. Typically, an object file can contain three kinds of symbols:
defined symbols, which allow it to be called by other modules,
undefined symbols, which call the other modules where these symbols are defined, and
local symbols, used internally within the object file to facilitate relocation.
When a program comprises multiple object files, the linker combines these files into a unified executable program, resolving the symbols as it goes along.
In environments which allow dynamic linking, it is possible that executable code still contains undefined symbols, plus a list of objects or libraries that will provide definitions for these.
The programmer must make sure everything is defined somewhere. The programmer is RESPONSIBLE for doing so.
Various tools will complain along the way if they notice anything missing:
The compiler will notice certain things missing, and will error out if it can realize that something's not there.
The linker will error out if it can't fix up a reference that's not in a library somewhere.
At run time there is a loader that pulls the relevant shared libraries into the process's memory space. The loader is the last thing that gets a crack at fixing up symbols before the program gets to run any code, and it will throw errors if it can't find a shared library/dll, or if the interface for the library that was used at link-time doesn't match up correctly with the available library.
None of these tools is RESPONSIBLE for making sure everything is defined. They are just the things that will notice if things are NOT defined, and will be the ones throwing the error message.
For symbols with internal linkage or no linkage: the compiler.
For symbols with external linkage: the linker, either the "traditional" one, or the runtime linker.
Note that the dynamic/runtime linker may choose to do its job lazily, resolving symbols only when they are used (e.g: when a function is called for the first time).
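For illustration, a minimal two-file sketch of who notices what (file and function names are made up):

/* lib.c -- provides a defined symbol with external linkage */
int answer(void) { return 42; }

/* main.c -- references one symbol the linker can resolve and one it cannot */
int answer(void);    /* defined in lib.c: resolved at link time */
int missing(void);   /* defined nowhere: the compiler accepts the call, the linker complains */

int main(void)
{
    return answer() + missing();
}

Compiling main.c by itself succeeds; linking the program (e.g. gcc main.c lib.c) fails with an undefined reference to missing. Had missing() lived in a shared library, the reference could instead be resolved by the dynamic/runtime linker at load time.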
Is there a way with gcc and GNU binutils to mark some functions such that they will generate an error at link-time if used? My situation is that I have some library functions which I am not removing for the sake of compatibility with existing binaries, but I want to ensure that no newly-compiled binary tries to make use of the functions. I can't just use compile-time gcc attributes because the offending code is ignoring my headers and detecting the presence of the functions with a configure script and prototyping them itself. My goal is to generate a link-time error for the bad configure scripts so that they stop detecting the existence of the functions.
Edit: An idea: would using assembly to specify the wrong .type for the entry points be compatible with the dynamic linker, but generate link errors when trying to link new programs?
FreeBSD 9.x does something very close to what you want with the ttyslot() function. This function is meaningless with utmpx. The trick is that there are only non-default versions of this symbol. Therefore, ld will not find it, but rtld will find the versioned definition when an old binary is run. I don't know what happens if an old binary has an unversioned reference, but it is probably sensible if there is only one definition.
For example,
__asm__(".symver hidden_badfunc, badfunc#MYLIB_1.0");
Normally, there would also be a default version, like
__asm__(".symver new_badfunc, badfunc##MYLIB_1.1");
or via a Solaris-compatible version script, but the trick is not to add one.
Typically, the asm directive is wrapped into a macro.
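For example, a hypothetical wrapper relying on string-literal concatenation:

#define SYMVER(mapping)  __asm__(".symver " mapping)

/* equivalent to the directive shown above */
SYMVER("hidden_badfunc, badfunc@MYLIB_1.0");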
The trick depends on the GNU extensions to define symbol versions with the .symver assembler directive, so it will probably only work on Linux and FreeBSD. The Solaris-compatible version scripts can only express one definition per symbol.
More information: .symver directive in info gas, Ulrich Drepper's "How to write shared libraries", the commit that deprecated ttyslot() at http://gitorious.org/freebsd/freebsd/commit/3f59ed0d571ac62355fc2bde3edbfe9a4e722845
One idea could be to generate a stub library that has these symbols but with unexpected properties.
perhaps create objects that have the name of the functions, so the linker in the configuration phase might complain that the symbols are not compatible
create functions that have a dependency "dont_use_symbol_XXX" that is never resolved
or fake a .a file with a global index that would have your functions but where the .o members in the archive have the wrong format
The best way to generate a link-time error for deprecated functions that you do not want people to use is to make sure the deprecated functions are not present in the libraries - which makes them one stage beyond 'deprecated'.
Maybe you can provide an auxiliary library with the deprecated function in it; the reprobates who won't pay attention can link with the auxiliary library, but people in the mainstream won't use the auxiliary library and therefore won't use the functions. However, it is still taking it beyond the 'deprecated' stage.
Getting a link-time warning is tricky. Clearly, GCC does that for some functions (mktemp() et al.), and Apple has GCC warn if you run a program that uses gets(). I don't know what they do to make that happen.
In the light of the comments, I think you need to head the problem off at compile time, rather than waiting until link time or run time.
The GCC attributes include (from the GCC 4.4.1 manual):
error ("message")
If this attribute is used on a function declaration and a call to such a function is
not eliminated through dead code elimination or other optimizations, an error
which will include message will be diagnosed. This is useful for compile time
checking, especially together with __builtin_constant_p and inline functions
where checking the inline function arguments is not possible through extern
char [(condition) ? 1 : -1]; tricks. While it is possible to leave the function
undefined and thus invoke a link failure, when using this attribute the problem
will be diagnosed earlier and with exact location of the call even in presence of
inline functions or when not emitting debugging information.
warning ("message")
If this attribute is used on a function declaration and a call to such a function is not eliminated through dead code elimination or other optimizations, a warning which will include message will be diagnosed. This is useful for compile time checking, especially together with __builtin_constant_p and inline functions. While it is possible to define the function with a message in .gnu.warning* section, when using this attribute the problem will be diagnosed earlier and with exact location of the call even in presence of inline functions or when not emitting debugging information.
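For illustration, a hypothetical declaration using the error attribute described above:

/* Any call to old_api() that survives optimization is rejected at compile time,
   provided the caller actually sees this declaration. */
void old_api(void) __attribute__((error("old_api() is no longer supported; use new_api() instead")));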
If the configuration programs ignore the errors, they're simply broken. This approach means that new code cannot be compiled using the functions, but existing code can continue to use the deprecated functions in the libraries (up until it needs to be recompiled).