Resolve undefined reference by stripping unused code - c

Assume we have the following C code:
void undefined_reference(void);
void bad(void) {
undefined_reference();
}
int main(void) {}
The call in bad produces the linker error undefined reference to 'undefined_reference', as expected. This function is not actually used anywhere in the code, though, so the undefined reference doesn't matter for the execution of the program.
Is it possible to compile this code successfully, such that bad simply gets removed as it is never called (similar to tree-shaking in JavaScript)?

This function is not actually used anywhere in the code!
You know that, I know that, but the compiler doesn't. It deals with one translation unit at a time. It cannot divine that there are no other translation units.
But main doesn't call anything, so there cannot be other translation units!
There can be code that runs before and after main (in an implementation-defined manner).
OK what about the linker? It sees the whole program!
Not really. Code can be loaded dynamically at run time (also by code that the linker cannot see).
So neither the compiler nor the linker even tries to find unused functions by default.
On some systems it is possible to instruct the compiler and the linker to try and garbage-collect unused code (and assume a whole-program view when doing so), but this is not usually the default mode of operation.
With gcc and gnu ld, you can use these options:
gcc -ffunction-sections -Wl,--gc-sections main.c -o main
Other systems may have different ways of doing this.

Many compilers (for example gcc) will compile and link it correctly if you:
enable optimizations, and
make function bad static (otherwise, it will have external linkage).
https://godbolt.org/z/KrvfrYYdn
Another way is to add a stub version of this function (with a pragma that displays a warning).
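As a rough sketch of the stub approach (not taken from the linked example): the file below supplies a placeholder definition of undefined_reference so that the link succeeds, and uses #pragma message, which GCC, Clang and MSVC accept, to leave a reminder in the build output. The stub_calls counter is a hypothetical addition for illustration only.

```c
#include <stdio.h>

/* Reminder printed at compile time so the stub is not forgotten. */
#pragma message("undefined_reference() is a stub; replace it with the real implementation")

/* Hypothetical counter, only here so the stub has an observable effect. */
int stub_calls = 0;

/* Stub definition: satisfies the linker until the real function exists. */
void undefined_reference(void) {
    stub_calls++;
    fprintf(stderr, "warning: stub undefined_reference() called\n");
}

void bad(void) {
    undefined_reference();
}
```

Unlike the garbage-collection options, this keeps bad in the program, so the stub has to be removed again once a real definition is linked in.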

Related

Using compiler builtins without linking the c standard library

I've seen this question, whose answers conclude that builtin math functions (like __builtin_sin, __builtin_fmod, etc.) can be substituted for functions from the C standard library.
I wrote the following program:
float fmod_test(float arg1, float arg2) {
return __builtin_fmod(arg1, arg2);
}
void _start() {}
And compiled it as follows:
gcc -nostdlib test.c -o test
Unfortunately, I received the following error:
/tmp/ccuHpvCP.o: In function `fmod_test':
test.c:(.text+0x1d): undefined reference to `fmod'
collect2: error: ld returned 1 exit status
It seems that __builtin_fmod uses fmod in the background and needs to link to it, rather than producing an inline version as might be expected of a "built in" function.
Is there any way of using these builtin functions without linking to external libraries?
The answer to this question depends on exactly which C compiler you are using. You appear to be using GCC; the answer for that compiler is no.
These functions are "built in" in the sense that GCC knows their names and can optimize away some calls to them, for instance fmod(7.0, 2.0) may well be evaluated at compile time. But GCC does not provide runtime definitions of these functions. It relies on the C library, which is a separate project, to provide them.
As the gcc manual says:
Many of these functions are only optimized in certain cases; if they are not optimized in a particular case, a call to the library function is emitted.
So there is no guaranteed way to avoid the possibility of a call to the library function.
However, you can experiment with how you call the function and your optimization options, in hopes of finding a combination that does get inlined. In particular, with floating point builtins, gcc will usually only inline them if -ffast-math is in effect, because its inline code may not attain as much precision or handle all corner cases (NaN, infinity, denormals, setting errno, etc) as the carefully-written library function would. That's the case here, and indeed if you enable -ffast-math you do get inlined code: see on godbolt. (It will look better if you turn on optimization.)
Of course, if you later change your compiler options, or call the function in a different way, or switch to a different compiler version, the compiler might again emit a library call. You'll know if this happens because your program won't link, so at least it won't break silently, and you can then try to readjust your code and/or compilation options, or if necessary, write or import your own version of the function.
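If it comes to writing your own version, a minimal sketch could look like the following. The name my_fmodf is made up for this example, and the code assumes IEEE 754 single precision evaluated without excess precision and without -ffast-math (which may break the NaN checks). It uses only operators and the freestanding header <float.h>, so on targets with hardware floating point it should not pull in any library function. It does not set errno and does not preserve the sign of a negative-zero argument.

```c
#include <float.h>  /* freestanding header: usable even with -nostdlib */

/* Hypothetical replacement for fmodf; not part of GCC or any C library. */
float my_fmodf(float x, float y) {
    float ax = x < 0.0f ? -x : x;
    float ay = y < 0.0f ? -y : y;

    /* NaN operands, y == 0, or infinite x: the result is NaN. */
    if (x != x || y != y || ay == 0.0f || ax > FLT_MAX) {
        float zero = 0.0f;
        return zero / zero;
    }
    /* Infinite y: x itself is the remainder. */
    if (ay > FLT_MAX) {
        return x;
    }

    /* Subtract the largest power-of-two multiple of |y| that still fits.
       With t <= ax < 2t the subtraction ax - t is exact, so no rounding
       error accumulates. */
    while (ax >= ay) {
        float t = ay;
        while (ax - t >= t) {
            t *= 2.0f;
        }
        ax -= t;
    }

    /* The result takes the sign of x, as with fmod. */
    return x < 0.0f ? -ax : ax;
}
```

For example, my_fmodf(7.0f, 2.0f) yields 1.0f and my_fmodf(-7.5f, 2.0f) yields -1.5f.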

C compiles with an undefined symbol

I am using an older version of the Diab C compiler.
In my code I have taken a function name and redefined it as a function pointer with the same signature. Before making this change the code worked. After the change, it caused the embedded system to lock up.
The function pointer was declared extern in a header, defined in one .c file, and used in another .c file. When it was called from the second .c file it would cause the system to lock up. When I attempted to add debug information using sprintf it finally told me that it was an undefined symbol. I realized that the header file was not included in the second .c file. When I #included it everything compiled and worked correctly.
My question is, is there some C rule that allowed the compiler to deduce the function signature even though the symbol was undefined at the call location? To my understanding there should have been an error long before I made any changes.
If no declaration is available, the compiler uses a default declaration of a function taking an unknown number of arguments and returning an int. If you turn up compiler warnings (eg -Wall -Wextra -Werror with gcc, check the documentation for your compiler), you should get a compile time warning.
Most likely, the code at first worked because it was compiled in the C89 or similar mode. The C standard from 1989 allows calling functions without first declaring them.
When you changed the code to use a pointer but didn't include the declaration of the pointer, the compiler assumed that your pointer was in fact a function and generated code to call into the pointer, as if the pointer had executable code inside. As the result, the program understandably stopped working.
What you should do is enable all possible warnings (for gcc: -Wall and -Wextra, and make sure optimization is enabled (-O2 is good) because it enables more code analysis), especially those for calling functions without prototypes. A better option might be to switch the compiler into C99 mode (-std=c99 in gcc) or switch to a C99 compiler. The C standard from 1999 prohibits calling a function that has not been declared and comes with some useful features absent in C89.
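To make the failure mode concrete, here is a sketch of the layout described in the question, collapsed into one listing. The file names and the identifiers handler, default_handler and use_handler are hypothetical. The point is that the calling file must see the extern declaration: with it, the compiler loads the pointer and calls through it; without it, a C89 compiler would implicitly declare handler as a function and jump to the address of the pointer variable itself.

```c
/* --- handler.h (hypothetical): the one shared declaration --- */
extern int (*handler)(int);

/* --- handler.c: defines the pointer and its initial target --- */
static int default_handler(int value) {
    return value + 1;
}

int (*handler)(int) = default_handler;

/* --- user.c: must #include "handler.h" before the call below --- */
int use_handler(int value) {
    /* With the declaration visible, this is an indirect call through
       the pointer stored in the variable handler. */
    return handler(value);
}
```

With the declaration in scope, use_handler(41) returns 42.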

Detection of unused function and code in C

I am writing a program in C. Is there any way (gcc options) to identify unused code and functions at compile time?
If you use -Wunused-function, you will get warnings about unused functions. (Note that this is also enabled when you use -Wall).
See http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html for more details.
gcc -Wall will warn you about unused static functions and some other types of unreachable code. It will not warn about unused functions with external linkage, though, since that would make it impossible to write a library.
No, there is no way to do this at compile time. All the compiler does is create object code - it does not know about external code that may or may not call functions you write. Have you ever written a program that calls main? It is the linker that determines if a function (specifically, a symbol) is used in the application. Note, though, that GNU ld does not discard unused functions by default; it has to be asked to, for example with -ffunction-sections and -Wl,--gc-sections.

Can GCC not complain about undefined references?

Under what situation is it possible for GCC to not throw an "undefined reference" link error message when trying to call made-up functions?
For example, a situation in which this C code is compiled and linked by GCC:
void function()
{
made_up_function_name();
return;
}
...even though made_up_function_name is not present anywhere in the code (not headers, source files, declarations, nor any third party library).
Can that kind of code be accepted and compiled by GCC under certain conditions, without touching the actual code? If so, which?
Thanks.
EDIT: no previous declarations or mentions to made_up_function_name are present anywhere else. Meaning that a grep -R of the whole filesystem will only show that exact single line of code.
Yes, it is possible to avoid reporting undefined references - using --unresolved-symbols linker option.
g++ mm.cpp -Wl,--unresolved-symbols=ignore-in-object-files
From man ld
--unresolved-symbols=method
    Determine how to handle unresolved symbols. There are four
    possible values for method:
    ignore-all
        Do not report any unresolved symbols.
    report-all
        Report all unresolved symbols. This is the default.
    ignore-in-object-files
        Report unresolved symbols that are contained in shared
        libraries, but ignore them if they come from regular object
        files.
    ignore-in-shared-libs
        Report unresolved symbols that come from regular object
        files, but ignore them if they come from shared libraries. This
        can be useful when creating a dynamic binary and it is known
        that all the shared libraries that it should be referencing
        are included on the linker's command line.
    The behaviour for shared libraries on their own can also be
    controlled by the --[no-]allow-shlib-undefined option.
    Normally the linker will generate an error message for each
    reported unresolved symbol but the option --warn-unresolved-symbols can
    change this to a warning.
TL;DR It can not complain, but you don't want that. Your code will crash if you force the linker to ignore the problem. It'd be counterproductive.
Your code relies on the ancient C (pre-C99) allowing functions to be implicitly declared at their point of use. Your code is semantically equivalent to the following code:
void function()
{
extern int made_up_function_name(); // The implicit declaration
made_up_function_name(); // Call the function
return;
}
The linker rightfully complains that the object file that contains the compiled function() refers to a symbol that wasn't found anywhere else. You have to fix it by providing the implementation for made_up_function_name() or by removing the nonsensical call. That's all there's to it. No linker-fiddling involved.
If you declare the function before using it, it should compile. The link error will remain, though.
void made_up_function_name();
void function()
{
made_up_function_name();
return;
}
When you build with the linker flag -r or --relocatable it will also not produce any "undefined reference" link error messages.
This is because -r will link different objects in a new object file to be linked at a later stage.
And then there is this nastiness with the -D flag passed to GCC.
$cat undefined.c
void function()
{
made_up_function_name();
return;
}
int main(){
}
$gcc undefined.c -Dmade_up_function_name=atexit
$
Just imagine looking for the definition of made_up_function_name: it appears nowhere, yet "does things" in the code.
I can't think of a nice reason to do this exact thing in code.
The -D flag is a powerful tool for changing code at compile time.
If function() is never called, it might not be included in the executable, and the function called from it is not searched for either.
The "standard" algorithm according to which POSIX linkers operate leaves open the possibility that the code will compile and link without any errors. See here for details: https://stackoverflow.com/a/11894098/187690
To exploit that possibility, the object file that contains your function (let's call it f.o) should be placed into a library. That library should be mentioned on the command line of the compiler (and/or linker), but by that point no other object file (mentioned earlier on the command line) should have made any calls to function or to any other function present in f.o. Under such circumstances the linker will see no reason to retrieve f.o from the library. It will completely ignore f.o and function and, therefore, remain oblivious of the call to made_up_function_name. The code will compile and link even though made_up_function_name is not defined anywhere.

Why before ALL functions (except for main()) there is a 'static' keyword?

I was reading some source code files in C and C++ (mainly C)...
I know that the 'static' keyword means a function is only visible to other functions in the same file. In another context I read that it's nice to use static functions in cases where we don't want them to be used outside the file they are written in...
I was reading one source code file as I mentioned before, and I saw that ALL the functions (except the main) were static...Because there are not other additional files linked with the main source code .c file (not even headers), logically why should I put static before all functions? From WHAT should they be protected when there's only 1 source file?!
EDIT: IMHO I think those keywords are put just to make the code look bigger and heavier..
If a function is extern (default), the compiler must ensure that it is always callable through its externally visible symbol.
If a function is static, then that gives the compiler more flexibility. For example, the optimizer may decide to inline a function; with static, the compiler does not need to generate an additional out-of-line copy. Also, the symbol table will be smaller, possibly speeding up the linking process too.
Also, it's just a good habit to get into.
It is hard to guess in isolation, but my assumption would be that it was written by someone who assumes that more files might be added at some point (or this file included in another project), so gives the least necessary access for the code to function. Essentially limiting the public API to the minimum.
But there are other files linked with your main module.
In fact, there are hundreds or even thousands, in the libraries. Most of these won't be selected for a small program but all the symbols are scanned by the linker. A collision between a symbol in main and an exported symbol from a library won't by itself cause any harm, but think of the trouble accidentally naming something strcpy() could cause.
Also, it probably doesn't hurt to get used to the best-practice styles.
As a coding rule that I follow, any function (other than main()) that is visible outside its source file needs a declaration, and that declaration should be in a header. I avoid writing 'extern' declarations for functions in my source files if at all possible, and it almost always is possible.
If a function is only used inside a single source file, it should be static. That makes it much easier to modify; you know that the only place you need to look to see how it is used is the source file you have in front of you now (unless you make a habit of including '.c' files in other '.c' files - which is also a bad habit that should be broken now).
I use GCC to help me enforce the coding rule:
gcc -m64 -Wall -Wextra -std=c99 -Wmissing-prototypes -Wstrict-prototypes
That's a fairly typical collection of flags; I sometimes use -std=c89 instead of -std=c99; I sometimes use -m32 instead of -m64; I don't always use -Wextra (but my code is moving in that direction). I always use -Wmissing-prototypes and -Wstrict-prototypes to ensure that each external function is declared before it is defined or used (and each static function is either declared or defined before it is used). I occasionally use -Werror (so if the compile emits a warning, the compilation fails). I could use it more than I do since my code does compile without warnings - or gets fixed so that it does.
So, you could easily have been looking at my code. In my code, the only functions that are exposed - even in single source file programs - are the functions that are declared in a header, which means that they are part of the external interface to the module that the source file represents.
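A small sketch of that convention, with a hypothetical counter module shown in one listing: the two functions declared in the header are the only external symbols, while the helper and the state are static and therefore invisible to the linker. All names and the wrap-around limit are made up for illustration.

```c
/* --- counter.h (hypothetical): the module's entire public interface --- */
int counter_next(void);
void counter_reset(void);

/* --- counter.c --- */

/* Internal state: static, so no other file can touch it. */
static int current_value = 0;

/* Internal helper: static, so it adds no external symbol and can be
   changed freely by looking at this file alone. */
static int wrap(int value) {
    return value > 3 ? 1 : value;  /* hypothetical limit of 3 */
}

int counter_next(void) {
    current_value = wrap(current_value + 1);
    return current_value;
}

void counter_reset(void) {
    current_value = 0;
}
```

Compiled with -Wmissing-prototypes, removing either declaration from the header part would produce a warning for the corresponding function, while wrap needs no prototype because it is static.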
It may be that the author is taking precautions. For example, if someone else is using this file as a source by including it into his main file.
Because there are not other additional files linked with the main source code .c file (not even headers), logically why should I put static before all functions? From WHAT should they be protected when there's only 1 source file?!
You honestly don't need the static keyword in this case.
EDIT: IMHO i think those keywords are put just to make the code looks bigger and heavier..
However, if you really want to read more about static keyword you can start with a book.
Some more info on the keyword static at #2216239 -- may help!

Resources