How to find dead code in a C language project with the gcc compiler

I need to find the dead code (functions that are never used) in my C project, which consists of multiple C files and is built with gcc. Please let me know which gcc options can find the dead code. Your help will be greatly appreciated.

For unused static functions, see Ed King's answer.
For global functions you could try this: Build the project twice, once as usual and once with -ffunction-sections -Wl,--gc-sections (the first is a compiler flag, the second a linker flag). Then you can run nm on the generated binaries to obtain a list of symbols for both runs. The linker will have removed unused functions in the second run, so that is your list of dead functions.
This assumes a common target like ELF, the binutils linker, and that the final binaries are not stripped of their symbol table.
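To try the two-build comparison on something small, a sketch like the following may help. The file name deadcode_lib.c and both function names are made up for illustration, and main() is assumed to live in a separate file (main.c here) that calls only used_helper().

```c
/* deadcode_lib.c (hypothetical file name).
 * Assumption: main() lives in main.c and calls only used_helper(). */

/* Referenced from main(), so it survives --gc-sections. */
int used_helper(int x)
{
    return x * 2;
}

/* Never referenced anywhere. With -ffunction-sections its code is placed
 * in its own section, which --gc-sections is then free to discard. */
int unused_helper(int x)
{
    return x + 1;
}

/* Build twice and compare the symbol lists:
 *   gcc main.c deadcode_lib.c -o normal
 *   gcc -ffunction-sections -Wl,--gc-sections main.c deadcode_lib.c -o pruned
 *   nm normal > normal.syms
 *   nm pruned > pruned.syms
 * Function names that appear only in normal.syms (unused_helper in this
 * sketch) are the candidates for dead code. */
```

With the GNU linker, adding -Wl,--print-gc-sections to the second build should also list the discarded sections directly, which can save the manual comparison.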

You can use the GCC compiler option -Wunused-function to warn you of unused static functions. I'm not sure how you would detect unused 'public' functions though, save for looking at the map file for functions that haven't been linked.

Related

GCC linked library for compile

Why do we have to tell gcc which library to link against when that information is already in source file in form of #include?
For example, if I have a code which uses threads and has:
#include <pthread.h>
I still have to compile it with -pthread option in gcc:
gcc -pthread test.c
If I don't give the -pthread option, the build fails with errors about missing definitions of the thread functions.
I am using this version:
gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
This may be one of the most common things that trip up beginners to C.
In C there are two different steps to building a program, compilation and linking. For the purposes of your question, these steps connect your code to two different types of files, headers and libraries.
The #include <pthread.h> directive in your C code is handled by the compiler. The compiler (actually preprocessor) literally pastes in the contents of pthread.h into your code before turning your C file into an object file.
pthread.h is a header file, not a library. It contains a list of the functions that you can expect to find in the library, what arguments they take and what they return. A header can exist without a library and vice-versa. The header is a text file, often found in /usr/include on Unix-derived systems. You can open it just like any C file to read the contents.
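The command line gcc test.c -lpthread does both compilation and linking. In the old days, you would first do something like cc -c test.c, then ld test.o -lpthread. As you can see, -lpthread is actually an option to the linker. (Libraries are best listed after the files that use them, because the linker resolves symbols from left to right.)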
The linker does not know anything about text files like C code or headers. It only works with compiled object files and existing libraries. The -l flag tells it which libraries to look in to find the functions you are using.
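The split between header and library can be imitated inside a single file. This is only a sketch: square() and sum_of_squares() are invented names, and in a real project the declaration would sit in a .h file and the definition in a separately compiled library.

```c
/* What a header contributes: a declaration only. It tells the compiler
 * the name, the argument types and the return type, nothing more. */
int square(int x);

/* Code that uses the function compiles with just the declaration.
 * The object file records an unresolved reference to square. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* What a library contributes: the definition (the actual machine code).
 * Normally this lives in another object file or in libsomething.a/.so,
 * and the linker connects the reference above to it. */
int square(int x)
{
    return x * x;
}
```

If the definition at the bottom is deleted, the file should still compile with gcc -c, and the error would only appear at the link step as an undefined reference to square.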
The name of the header has nothing to do with the name of the library. Here the match is really just a coincidence. Most often a library provides many headers.
Especially in C++, there is usually one header per class, and the library usually provides the implementations of the classes from the same namespace. In C, headers are organized so that each contains some common subset of functions: math.h declares mathematical operations, stdio.h declares I/O functions, and so on.
They are two separate things. .h files hold the declarations, and sometimes inline functions as well. Every function must have an implementation (definition) to work, and these implementations are kept separately. libpthread (-lpthread), for example, is the library that holds, in binary form, the implementations of the functions declared in pthread.h.
Separating the implementation is also what you want when you don't want to share your commercial source code with others.
So,
gcc -pthread test.c
tells gcc to link against libpthread, which is where the functions declared in pthread.h are defined. -pthread is an option to the gcc driver, not to the linker: it defines the macros needed for threaded code during compilation and adds the POSIX threads library at link time.
There are (or were) compilers where you told the compiler where the library directory was, and it simply scanned all the files hoping to find a match. Then there are compilers at the other extreme, where you have to tell them everything to link in. The key here is that #include simply tells the compiler to look for some declarations, or even more simply, to paste some external file into this file. This does not necessarily have any connection to a library or object file; many includes are not tied to such things, and assuming otherwise is a mistake. Next, linking is a different step and usually done by a different program from the compiler. So not only does an include not have a one-to-one relationship with an object or library, the linker is not the compiler.

(C) How are the implementations of the stdlib functions stored and linked to in header files if the source code does not have to be provided directly?

new to using C
Header files for libraries like stdlib do not contain the actual implementation code for the functions they provide access to. I understand that the actual source text for libraries like this aren't needed to compile, but how does this work specifically? Are the implementation details for these libraries contained within the compiler?
When you use a function like printf(), including the header file essentially pastes in code for the declaration of the function, but normally the implementation code would need to be available as well.
What form is it stored in? (and where?) Is this compiler specific? Would it be possible to write custom code and reference it in this way without modifying the behavior of the compiler?
I've been searching around and found some info that is relevant but nothing specific. This could be related to not formulating the question well. Thanks.
When you link a program, the compiler will implicitly add some extra libraries to your program:
$ ls
main.c
$ cc -c main.c
$ cc main.o
$ ls
main.c main.o a.out
You can discover the extra libraries a program uses with ldd. Here, there are three libraries linked into the program, and I didn't ask for any of them:
$ ldd a.out
linux-vdso.so => (0x00...)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00...)
/lib64/ld-linux-x86-64.so.2 (0x00...)
So, what happens if we link without these libraries? That's easy enough, just use the linker (ld) directly, instead of calling it through cc. When you use ld, it doesn't give you these extra libraries, so you get an error:
$ ld main.o
Undefined symbols:
"_printf", referenced from:
_main in main.o
The implementation for printf() is stored in the standard C library, which is usually just another library on your system... the only difference is that it gets automatically included into your program when you compile C.
You can use nm to find out what symbols are in a library, so I can use it to find printf() in libc:
$ nm -D /lib/x86_64-linux-gnu/libc-2.13.so | grep printf
...
000000000004e4b0 T printf
...
So, now that we know that libc has printf(), we can use -lc to tell the linker to include libc, and that will get rid of the errors about printf() being missing:
$ ld main.o -lc
There might be some other bits missing, and that's why we use cc to link our programs instead of ld: cc gives us all the default libraries.
When you compile a file, you only need to promise the compiler that certain functions and symbols exist. A function call is compiled into a call [some_address] instruction.
The compiler compiles each C file into an object file that just has placeholders for calls to functions declared in the headers. That is, [some_address] does not need to be known at this point.
A number of object files can be collected into what is known as a library.
After that, it is the linker's job to look through all the object files and libraries it knows of, find out the real value of each unknown [some_address], and translate the call to, e.g., call 0x1234 if the particular function you are calling starts at 0x1234 (or it might be a relative offset from the current program counter).
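One way to convince yourself that the placeholder has been filled in is to look at the function's address from inside the finished program. The helper name printf_is_resolved() below is made up for this sketch; it only checks that the name printf refers to a real, non-null address once the program is linked and loaded.

```c
#include <stdio.h>

/* While this file is being compiled, printf is only a declared name.
 * By the time the function runs, the linker (or the dynamic loader) has
 * replaced the placeholder with the real address of the code in libc. */
int printf_is_resolved(void)
{
    /* Take the address through a function pointer of matching type. */
    int (*fp)(const char *, ...) = printf;

    return fp != NULL;
}
```

Running nm on the object file produced by gcc -c should show printf marked U (undefined), which is the placeholder described above as seen from the outside.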
Stdlib and other library functions are implemented in an object library. A library is a collection of code that is linked with your program. By default, C programs are linked against the C standard library, which is usually provided by the operating system. Most modern operating systems use a dynamic linker. That is, your program is not fully linked against the library until it is executed. When it is being loaded, the linker-loader combines your code and the library code in your program's address space. Your code can then make a call to the printf() code that is located in that library.
Usually a header file contains only function prototypes, while the implementation is either in a separate source file or in a precompiled library. In the case of stdlib (and other libraries, whether shipped with a compiler or available separately), the precompiled library gets linked in at the end of the build process. (There's also a distinction between static and dynamic libraries, but I won't go into detail about that.)
The implementations of the standard libraries shipped with a compiler are usually compiler specific: there is a standard describing which functions have to be in the library, but the implementer decides exactly how to implement them. It is (in theory) possible to replace these libraries with your own without modifying the behaviour of the compiler, though this is not recommended, as you would have to rewrite the entire library to ensure that all functions are present.

How does a compiler find out which dynamic link library my code will use, if I only include header files, which don't say?

How does a compiler find out which dynamic link library my code will use, if I only include header files, which don't say?
#include <stdio.h>
int main(void)
{
    printf("Hello world\n");
    return 0;
}
Here I only include stdio.h, and my code uses the printf function.
As far as I know, header files describe prototypes, macros and constants, but say nothing about which file printf is implemented in. How does it work, then?
When you build a runnable executable, you don't just specify the source code, but also a list of libraries from which undefined references are looked up. With the C standard library, this happens implicitly (unless you tell GCC -nostdlib), so you may not have been consciously aware of this.
The libraries are only consumed by the linker, not the compiler. The linker locates all the undefined references in the libraries. If the library is a static one, the linker just adds the actual machine code to your final executable. On the other hand, if the library is a shared one, the linker only records the name of the library in the executable's header (on ELF systems this is the library's soname, which typically carries a major version number). It is then the job of the loader to find appropriate libraries at load time and resolve the missing dependencies on the fly.
On Linux, you can use ldd to list the load-time dependencies of a dynamically linked executable, e.g. try ldd /bin/ls. (On MacOS, you can use otool -L for the same purpose.)
As others have answered, the standard c library is implicitly linked. If you are using gcc you can use the -Wl,--trace option to see what the linker is doing.
I tested your example code:
gcc -Wl,--trace main.c
Gives:
/usr/bin/ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crt1.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/crtbegin.o
/tmp/ccCjfUFN.o
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/4.6/libgcc_s.so)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/4.6/libgcc_s.so)
/usr/lib/gcc/x86_64-linux-gnu/4.6/crtend.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crtn.o
This shows that the linker is using libc.so (and also ld-linux.so).
The glibc library is linked by default by GCC. There is no need to pass a -l option for it when you build your executable. Hence functions such as printf, which are part of glibc, do not need to be linked explicitly.
Technically your compiler does not figure out which libraries will be used. The linker (commonly ld) does this. The header files only tell the compiler what interface your library functions use and leaves it up to the linker to figure out where they are.
A source file goes a long path until it becomes an executable. Commonly
source.c -[preprocess]> source.i -[compile]> source.s -[assemble]> source.o -[link]> a.out
When you invoke cc source.c, all those steps are done transparently for you in one go, and the standard libraries (commonly libc.so) and the C runtime startup code (commonly crt0.o or crt1.o) are linked in.
Any additional libraries have to be passed as additional linker flags, e.g. -lpthread.
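The pipeline can also be run one stage at a time. In this sketch the file name source.c, the SQUARE macro and square_plus_one() are placeholders, and main() is assumed to be in a separate main.o; the options -E, -S and -c are gcc's standard stop-after-this-stage switches.

```c
/* source.c (hypothetical file name). */

/* Handled by the preprocessor: after gcc -E the macro is gone and the
 * function body reads ((n) * (n)) + 1. */
#define SQUARE(x) ((x) * (x))

int square_plus_one(int n)
{
    return SQUARE(n) + 1;
}

/* Running each stage by hand:
 *   gcc -E source.c -o source.i    preprocess only
 *   gcc -S source.i -o source.s    compile to assembly
 *   gcc -c source.s -o source.o    assemble to an object file
 *   gcc source.o main.o -o a.out   link (main.o is assumed to provide main)
 */
```

Opening source.i and source.s in a text editor shows what each stage produced; source.o and a.out are binary and are better inspected with nm or objdump.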
I would say that depends on the IDE or the compiler and the system. A header file just contains interface information, such as the name of a function, the parameters it expects and any attributes, and that is what the compiler uses when it first converts your code to an intermediate object file.
After that comes linking, where the code for printf gets added to the executable through either a static library or a dynamic library.
Library functions and facilities such as the STL are part of the C and C++ implementations, so they are delivered with either the compiler or the system. For example, on Solaris there is no debug version of the C library unless you are using gcc, whereas Visual Studio ships a debug version of msvcrt.dll and also lets you link the C library statically.
In short, the code for printf and the other C library functions is brought in at link time by the linker, not by the compiler proper.

Renaming symbols at compile time without changing the code in a cross platform way

When creating a static object, is it possible to rename the symbols at compile time (without changing the code) in a cross-platform way? objcopy was recently recommended to me, but Linux is not the only target platform; it must also work on a Mac. I am compiling with gcc, so I was hoping that there was a gcc option of some sort.
I have heard about .def files, but that may have been misleading, as the information I have found about them seems to be for Windows.
Edit:
I'm trying to change the names of C and Fortran functions, specifically prepending the word "wrap" to them, in order to avoid symbol conflicts at link time.
is it possible to rename the symbols at compile time
You might be able to achieve it with preprocessor:
gcc -c foo.c -Dfoo=foo_renamed
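What the -D option does can be reproduced with a #define at the top of the file, which makes the effect easier to see. This is a sketch only: foo and wrap_foo are illustrative names, and the #define stands in for passing -Dfoo=wrap_foo on the command line.

```c
/* Equivalent to compiling with -Dfoo=wrap_foo; on a real build the
 * source file itself would stay untouched. */
#define foo wrap_foo

/* The preprocessor rewrites every later occurrence of the token foo,
 * so the symbol that ends up in the object file is wrap_foo. */
int foo(int x)
{
    return x + 1;
}

/* Callers compiled with the same macro are rewritten the same way. */
int call_foo_twice(int x)
{
    return foo(foo(x));
}
```

Because this is plain token replacement, it also renames any variable, struct member or other identifier spelled foo, and it only reaches code that goes through the C preprocessor, so Fortran sources would need separate handling.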
You can use the gcc alias attribute to make multiple symbols that point to the same function.
void name1() __attribute__((alias ("name2")));
I'm not sure if the alias attribute works for other types of symbols (e.g. variables).
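A slightly fuller version of the alias declaration might look like the sketch below. It assumes GCC or a compatible compiler on an ELF target such as Linux; the attribute may be rejected on other object formats (macOS in particular), so it is not a fully cross-platform answer. The target of the alias has to be defined in the same translation unit.

```c
/* The real implementation. It must be defined in this file for the
 * alias below to be accepted. */
int name2(int x)
{
    return x * 2;
}

/* name1 becomes a second symbol for the same code. Both names show up
 * in the object file's symbol table with the same address. */
int name1(int x) __attribute__((alias("name2")));
```

Note that this adds a name rather than renaming: the original symbol name2 is still exported, so on its own it does not remove a conflict on that name.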

Linking LAPACK/BLAS libraries

Background:
I am working on a project written in a mix of C and Fortran 77 and now need to link the LAPACK/BLAS libraries to the project (all in a Linux environment). The LAPACK in question is version 3.2.1 (including BLAS) from netlib.org. The libraries were compiled using the top level Makefile (make lapacklib and make blaslib).
Problem:
During linking, error messages claimed that certain (not all) BLAS-routines called from LAPACK-routines were undefined. This gave me some headache but the problem was eventually solved when (in the Makefile) the order of appearance of the libraries to be linked was changed.
Code:
In the following, (a) gives errors while (b) does not. The linking is performed by (c).
(a) LIBS = $(LAPACK)/blas_LINUX.a $(LAPACK)/lapack_LINUX.a
(b) LIBS = $(LAPACK)/lapack_LINUX.a $(LAPACK)/blas_LINUX.a
(c) gcc -Wall -O -o $@ project.o project.a $(LIBS)
Question:
What could be the reason for the undefined references of only some routines and what makes the order of appearance relevant?
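The LAPACK library needs stuff from BLAS, and the linker searches from left to right. So, putting BLAS after LAPACK (option (b)) worked. That also explains why only some routines were undefined: when the linker reaches a static library, it pulls in only the members that define symbols still unresolved at that point. With BLAS listed first, the BLAS routines your own code already referenced were extracted, but the ones needed solely by LAPACK were requested only after BLAS had been scanned.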
If you want it to always work, regardless of the order, you can use linker groups:
-Wl,--start-group $(LAPACK)/blas_LINUX.a $(LAPACK)/lapack_LINUX.a -Wl,--end-group
That tells the linker to loop through the libraries until all symbols get resolved (or until it notices that looping again won't help).
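The dependency that makes the order matter can be shown with two stand-in routines. The names demo_dot() and demo_norm_squared() are invented for this sketch; in the real setup the first would sit in blas_LINUX.a and the second in lapack_LINUX.a, each compiled separately.

```c
/* Stand-in for a BLAS routine. In the real case only this declaration
 * would be visible to the LAPACK-side code. */
double demo_dot(const double *x, const double *y, int n);

/* Stand-in for a LAPACK routine. Its object file contains an undefined
 * reference to demo_dot, so the archive providing demo_dot has to be
 * scanned AFTER the archive containing this function. */
double demo_norm_squared(const double *x, int n)
{
    return demo_dot(x, x, n);
}

/* The BLAS-side definition that satisfies the reference. */
double demo_dot(const double *x, const double *y, int n)
{
    double sum = 0.0;

    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

If the program itself calls only demo_norm_squared, then demo_dot is not yet wanted when a BLAS-first command line scans the BLAS archive, which is the situation in (a).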
Typically one always puts the "more fundamental/basic" library to the right of the "less fundamental/basic" one, i.e., the linker will look to the right of a file for the definition of a function appearing in that file. This is supposedly not necessary any more with some modern linkers, but it's always a good idea (as in your case). I'm not sure why it only mattered for some of the routines.
Are you using CLAPACK as your LAPACK implementation? If not, you could try it.
