Let's say I have a file code.c with two ordinary functions, outer and inner.
outer calls inner.
I use GCC 11.2 on Linux, x86-64.
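The question doesn't show code.c. A minimal hypothetical version for following along might look like this. The function bodies are invented; only the shape matters: two global functions with default visibility, where outer calls inner.

```c
/* code.c: hypothetical minimal version of the two functions in the question.
 * Both have external linkage and default visibility, so in a -shared -fPIC
 * build GCC must assume inner() may be interposed (e.g. via LD_PRELOAD) and
 * calls it through inner@plt instead of inlining it. */
int inner(int x)
{
    return 2 * x + 1;
}

int outer(int x)
{
    return inner(x) + 1;   /* the call whose code generation is compared below */
}
```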
If I compile a shared lib with
gcc -shared -fPIC -O3 code.c
and look at the disassembly with
objdump -d a.out
I can see that the inner call within outer uses the PLT and is not inlined.
That's fine, and it's how it should be: that way even intra-library calls can be replaced, e.g. by LD_PRELOAD.
If I add a main function and compile an executable instead
gcc -fPIE -O3 code.c
the call is inlined (if small enough, etc.) and doesn't use the PLT.
Fine too.
My problem is this case: a non-library executable compiled with -fPIC:
gcc -fPIC -O3 code.c
Now the inner call does not use the PLT (but is not inlined either). The unlinked asm (gcc -S) still uses the PLT; only the fully linked binary and its disassembly no longer do. Adding an explicit -fsemantic-interposition does not help.
Questions
How can I have PLT calls in a program that is not a library, so that things like LD_PRELOAD work even for the functions there?
And what's the point of the non-shared -fPIC behaviour, which prevents inlining without using the PLT?
I misunderstood something about ld.so before. Functions in the main executable (i.e. the startable program, not shared libraries) are never replaceable, so using the PLT for them is useless.
How ld.so searches for functions
For PLT function calls, it always uses this search order and takes the first function it finds:
If a function with that name is available in the main executable, that one wins.
If an LD_PRELOAD shared lib was specified, it is checked next.
Then all normal shared libs are checked, in the order that was specified during linking.
(In each case, only functions in dynsym with global/weak visibility are considered).
Neither LD_PRELOAD nor linking with -lsomething before the main object files changes anything about this; the main program always comes first.
This implies that a PLT lookup from the main program, for a function that exists there already, will always find its own function. Therefore no PLT lookup is necessary, and not doing it improves performance.
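The search order above can be sketched as a toy model. This is not glibc's real code (which uses hash tables and symbol versioning); the object names and symbol tables below are hypothetical. The executable's entry for inner is assumed to be present in its dynsym, which in practice usually means it was linked with something like -rdynamic.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of the ld.so lookup order described above. */
typedef enum { BIND_LOCAL, BIND_GLOBAL, BIND_WEAK } Binding;

typedef struct { const char *name; Binding bind; } DynSym;

typedef struct {
    const char *object;      /* e.g. "a.out", "libc.so.6" */
    const DynSym *dynsym;    /* the object's dynamic symbol table */
    size_t nsyms;
} LoadedObject;

/* `scope` must be in search order: main executable, then LD_PRELOAD
 * libraries, then the other shared libraries in link order. The first
 * global or weak definition wins; returns the providing object, or NULL. */
static const char *resolve(const LoadedObject *scope, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < scope[i].nsyms; j++) {
            const DynSym *s = &scope[i].dynsym[j];
            if (s->bind != BIND_LOCAL && strcmp(s->name, name) == 0)
                return scope[i].object;
        }
    return NULL;
}

/* Hypothetical process: the executable exports inner, a preloaded library
 * also defines inner and malloc, and libc defines malloc and printf. */
static const DynSym exe_syms[]     = { { "inner", BIND_GLOBAL } };
static const DynSym preload_syms[] = { { "inner", BIND_GLOBAL }, { "malloc", BIND_GLOBAL } };
static const DynSym libc_syms[]    = { { "malloc", BIND_GLOBAL }, { "printf", BIND_GLOBAL } };

static const LoadedObject demo_scope[] = {
    { "a.out",      exe_syms,     1 },
    { "preload.so", preload_syms, 2 },
    { "libc.so.6",  libc_syms,    2 },
};
```

With this scope, resolving inner finds a.out's own copy even though preload.so defines one too, while malloc is taken from preload.so ahead of libc. That is exactly why a PLT lookup from the executable for its own function is pointless.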
Dynsym availability
Unlike shared libraries, startable programs don't need all of their global functions listed in dynsym. There are several reasons why a function might be listed, but usually some are missing too.
As long as this behaviour holds, a PLT lookup from the main program might not even find its own function, so again, not using the PLT is better.
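So ld.so can't redirect the executable's own calls. The usual way around that is to move the functions you want to be replaceable into a shared library. If that is not an option, one hypothetical alternative (an explicit hook you control, not LD_PRELOAD and not the PLT) is to route the call through a function pointer. The names inner_default and inner_hook are invented for this sketch.

```c
/* Hypothetical explicit hook: the executable's own calls are bound at link
 * time, so make the indirection visible in the source instead. */
static int inner_default(int x)
{
    return 2 * x + 1;
}

/* Code elsewhere (a plugin, a test, a debug option) can point this at a
 * replacement at run time. */
int (*inner_hook)(int) = inner_default;

int outer(int x)
{
    return inner_hook(x) + 1;   /* always an indirect call; never inlined */
}
```

The cost is the same as a PLT call: an indirect jump that prevents inlining, but only for the functions you choose.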
And what about the missing inline optimizations?
Turned out to be sort of trivial:
When compiling with -fPIC, it is not yet clear whether that file will later be linked into a startable program or a shared library. Therefore the compiler plays it safe and makes the code library-suitable: PLT calls and no inlining.
If it is then linked into a library, that's fine.
For an executable, the linker then removes the PLT indirection again, but it doesn't revisit the decision not to inline.
Meanwhile with -fPIE, the compiler already knows that this will not become a library, and can do inlining and calls without PLT (at least some of them, and the linker reconverts the rest).
To get inlining, either take care to use -fPIC only for libraries and -fPIE only for executables, or turn on LTO (-flto), which can add the "missing" inlining after the fact.
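For code that really does go into a shared library, a different technique (not mentioned in the answer above) also lets GCC inline: tell it the callee cannot be interposed. A sketch, reusing the hypothetical code.c from the question, is to give inner hidden visibility. Declaring it static would have the same effect if nothing outside the file needs it, and GCC's -fno-semantic-interposition option is another way to allow inlining of exported functions.

```c
/* Hypothetical variant of code.c for a -shared -fPIC build.
 * Hidden visibility keeps inner() out of the library's dynamic symbol table,
 * so it cannot be interposed and GCC may call it directly or inline it into
 * outer(). Trade-off: LD_PRELOAD can no longer replace inner(). */
__attribute__((visibility("hidden"))) int inner(int x)
{
    return 2 * x + 1;
}

int outer(int x)
{
    return inner(x) + 1;
}
```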
So essentially I want to compile a C program statically with gcc, and I want it to be able to link C stdlib functions, but I want it to start at main and not include the _start function or the libc init code that runs before main. Normally when you want to compile a program without _start, you run gcc with the -nostdlib flag, but I also want to be able to include code from stdlib, just not the libc init. Is there any way to do this?
I know that this could cause a lot of problems, but for my use case I'm not actually running the C program itself, so it makes sense to do this.
Thanks in advance
The option -nostdlib tells the linker not to use the startup files (i.e. the code that is executed before main).
-nostdlib
Do not use the standard system startup files or libraries when linking. No startup files and only the libraries you specify are passed to the linker, and options specifying linkage of the system libraries, such as -static-libgcc or -shared-libgcc, are ignored.
The compiler may generate calls to memcmp, memset, memcpy and memmove. These entries are usually resolved by entries in libc. These entry points should be supplied through some other mechanism when this option is specified.
This option is frequently used in low-level bare-metal programming to control exactly what is going on.
You can still use the functions of your libc by adding -lc. However, keep in mind that some libc functions depend on the startup code. For example, in some implementations printf requires dynamic memory allocation, and the heap is initialized by the startup code.
I need to find the dead code (functions that are never used) in my C project, which has multiple C files, using the gcc compiler. Please let me know which gcc options can find dead code. Your help will be greatly appreciated.
For unused static functions, see Ed King's answer.
For global functions you could try this: Build the project twice, once as usual and once with -ffunction-sections -Wl,--gc-sections (the first is a compiler flag, the second a linker flag). Then you can run nm on the generated binaries to obtain a list of symbols for both runs. The linker will have removed unused functions in the second run, so that is your list of dead functions.
This assumes a common target like ELF, the binutils linker, and that the final binaries are not stripped of their symbol table.
You can use the GCC compiler option -Wunused-function to warn you of unused static functions. I'm not sure how you would detect unused 'public' functions though, save for looking at the map file for functions that haven't been linked.
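To make the difference between the two approaches concrete, here is a hypothetical file annotated with what each one would report. The file and function names are invented for illustration.

```c
/* deadcode_demo.c (hypothetical)
 *
 * gcc -Wall -c deadcode_demo.c
 *     -Wunused-function (enabled by -Wall) warns about unused_helper only,
 *     because the compiler can see every possible caller of a static function.
 *
 * gcc -ffunction-sections ... -Wl,--gc-sections
 *     each function gets its own section; the linker discards sections nobody
 *     references, so unused_global vanishes from the nm output of this build
 *     but not from the normal build.
 */
static int unused_helper(int x)      /* static and never called in this file */
{
    return x + 1;
}

int unused_global(int x)             /* external linkage, never called anywhere */
{
    return 3 * x;
}

int used_global(int x)               /* assumed to be called from another file */
{
    return x - 1;
}
```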
I'm new to using C.
Header files for libraries like stdlib do not contain the actual implementation code for the functions they provide access to. I understand that the actual source text for libraries like this isn't needed to compile, but how does this work specifically? Are the implementation details for these libraries contained within the compiler?
When you use a function like printf(), including the header file essentially pastes in the declaration of the function, but normally the implementation code would need to be available as well.
What form is it stored in (and where)? Is this compiler specific? Would it be possible to write custom code and reference it in this way without modifying the behavior of the compiler?
I've been searching around and found some info that is relevant but nothing specific. This could be related to not formulating the question well. Thanks.
When you link a program, the compiler will implicitly add some extra libraries to your program:
$ ls
main.c
$ cc -c main.c
$ cc main.o
$ ls
main.c main.o a.out
You can discover the extra libraries a program uses with ldd. Here, there are three libraries linked into the program, and I didn't ask for any of them:
$ ldd a.out
linux-vdso.so => (0x00...)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00...)
/lib64/ld-linux-x86-64.so.2 (0x00...)
So, what happens if we link without these libraries? That's easy enough, just use the linker (ld) directly, instead of calling it through cc. When you use ld, it doesn't give you these extra libraries, so you get an error:
$ ld main.o
Undefined symbols:
"_printf", referenced from:
_main in main.o
The implementation for printf() is stored in the standard C library, which is usually just another library on your system... the only difference is that it gets automatically included into your program when you compile C.
You can use nm to find out what symbols are in a library, so I can use it to find printf() in libc:
$ nm -D /lib/x86_64-linux-gnu/libc-2.13.so | grep printf
...
000000000004e4b0 T printf
...
So, now that we know that libc has printf(), we can use -lc to tell the linker to include libc, and that will get rid of the errors about printf() being missing:
$ ld main.o -lc
There might be some other bits missing, and that's why we use cc to link our programs instead of ld: cc gives us all the default libraries.
When you compile a file, you only need to promise the compiler that certain functions and symbols exist. A function call is compiled into a call [some_address].
The compiler compiles each C file into an object file that just has placeholders for calls to functions declared in the headers. That is, [some_address] does not need to be known at this point.
A number of object files can be collected into what is known as a library.
After that, it is the linker's job to look through all the object files and libraries it knows of, find the real value of every unknown [some_address], and translate the call into, e.g., call 0x1234 if the function you are calling starts at 0x1234 (or into a relative offset from the current program counter).
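The promise-then-resolve split can be sketched with three hypothetical files shown in one block. In a real project they would be separate files built with something like gcc -c main.c, gcc -c add.c, then gcc main.o add.o; here they compile as one unit so the example stays self-contained.

```c
/* add.h (hypothetical): a declaration only, i.e. the promise that add()
 * exists somewhere. This is all the compiler needs to emit the call. */
int add(int a, int b);

/* main.c (hypothetical): compiles against the declaration alone. In main.o
 * the call below is a placeholder plus a relocation entry naming "add". */
int twice_sum(int a, int b)
{
    return 2 * add(a, b);
}

/* add.c (hypothetical): the definition. When main.o and add.o are linked,
 * the linker patches the placeholder with add()'s address or offset. */
int add(int a, int b)
{
    return a + b;
}
```

printf() works the same way, except that its definition lives in libc rather than in an add.o you wrote yourself.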
Stdlib and other library functions are implemented in an object library. A library is a collection of code that is linked with your program. By default, C programs are linked against the standard library, which is usually provided by the operating system. Most modern operating systems use a dynamic linker. That is, your program is not linked against the library until it is executed. When it is being loaded, the linker-loader combines your code and the library code in your program's address space. Your code can then call the printf() code that is located in that library.
Usually a header file contains only a function prototype, while the implementation is either in a separate source file or, in the case of stdlib (and other libraries, whether shipped with a compiler or available separately), in a precompiled library. The precompiled library gets linked at the end of the compilation process. (There's also a distinction between static and dynamic libraries, but I won't go into detail about that.)
The implementations of standard libraries (which are shipped with a compiler) are usually compiler specific: there is a standard describing which functions have to be in the library, but the implementer can decide how exactly to implement them. It is (in theory) possible to replace these libraries with your own without modifying the behaviour of the compiler, though this is not recommended, since you would have to rewrite the entire library to ensure that all functions are present.
Background:
I am working on a project written in a mix of C and Fortran 77 and now need to link the LAPACK/BLAS libraries to the project (all in a Linux environment). The LAPACK in question is version 3.2.1 (including BLAS) from netlib.org. The libraries were compiled using the top level Makefile (make lapacklib and make blaslib).
Problem:
During linking, error messages claimed that certain (not all) BLAS routines called from LAPACK routines were undefined. This gave me some headaches, but the problem was eventually solved by changing the order in which the libraries to be linked appear in the Makefile.
Code:
In the following, (a) gives errors while (b) does not. The linking is performed by (c).
(a) LIBS = $(LAPACK)/blas_LINUX.a $(LAPACK)/lapack_LINUX.a
(b) LIBS = $(LAPACK)/lapack_LINUX.a $(LAPACK)/blas_LINUX.a
(c) gcc -Wall -O -o $@ project.o project.a $(LIBS)
Question:
What could be the reason for the undefined references of only some routines and what makes the order of appearance relevant?
The LAPACK library needs stuff from BLAS, and the linker searches from left to right. So, putting BLAS after LAPACK (option (b)), worked.
If you want it to always work, regardless of the order, you can use linker groups:
-Wl,--start-group $(LAPACK)/blas_LINUX.a $(LAPACK)/lapack_LINUX.a -Wl,--end-group
That tells the linker to loop through the libraries until all symbols get resolved (or until it notices that looping again won't help).
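Why only some routines were undefined falls out of how a traditional linker such as GNU ld uses static archives: it only extracts an archive member if that member defines a symbol that is undefined at that moment, and it does not go back to earlier archives. The toy model below sketches that rule; it is not ld's real code, and the routine names and dependencies are illustrative rather than LAPACK's actual call graph. In it, project.o is assumed to call dgesv (LAPACK) and dgemm (BLAS) directly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16

typedef struct { const char *names[MAX_SYMS]; size_t count; } SymSet;

static bool set_has(const SymSet *s, const char *name)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return true;
    return false;
}

static void set_add(SymSet *s, const char *name)
{
    if (!set_has(s, name) && s->count < MAX_SYMS)
        s->names[s->count++] = name;
}

static void set_remove(SymSet *s, const char *name)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0) {
            s->names[i] = s->names[--s->count];   /* order does not matter */
            return;
        }
}

/* One archive member (.o file) defining one routine and needing up to 3 others. */
typedef struct { const char *defines; const char *needs[3]; } Member;
typedef struct { const Member *members; size_t count; } Archive;

static const Member lapack_members[] = {
    { "dgesv",  { "dgetrf", "dgetrs", NULL } },
    { "dgetrf", { "dgemm",  "dtrsm",  NULL } },
    { "dgetrs", { "dtrsm",  NULL,     NULL } },
};
static const Member blas_members[] = {
    { "dgemm", { NULL } }, { "dtrsm", { NULL } }, { "daxpy", { NULL } },
};
static const Archive LAPACK = { lapack_members, 3 };
static const Archive BLAS   = { blas_members, 3 };

/* Extract every member of `a` that defines a currently undefined symbol,
 * rescanning this one archive until nothing changes. */
static bool scan_archive(const Archive *a, SymSet *defined, SymSet *undefined)
{
    bool any = false, progress = true;
    while (progress) {
        progress = false;
        for (size_t i = 0; i < a->count; i++) {
            const Member *m = &a->members[i];
            if (!set_has(undefined, m->defines))
                continue;                          /* nobody needs it (yet) */
            set_remove(undefined, m->defines);
            set_add(defined, m->defines);
            for (size_t k = 0; k < 3 && m->needs[k]; k++)
                if (!set_has(defined, m->needs[k]))
                    set_add(undefined, m->needs[k]);
            progress = any = true;
        }
    }
    return any;
}

/* Scan archives left to right. `grouped` mimics --start-group/--end-group by
 * repeating the whole list until a full pass extracts nothing.
 * Returns the symbols that remain undefined. */
static SymSet simulate_link(const Archive *order[], size_t n, bool grouped)
{
    SymSet defined = { { "main" }, 1 };
    SymSet undefined = { { "dgesv", "dgemm" }, 2 };   /* needed by project.o */
    bool progress;
    do {
        progress = false;
        for (size_t i = 0; i < n; i++)
            progress |= scan_archive(order[i], &defined, &undefined);
    } while (grouped && progress);
    return undefined;
}
```

With BLAS first, dgemm is extracted because project.o already needs it, so LAPACK's later use of dgemm is satisfied. dtrsm, which only LAPACK needs, is left undefined because BLAS is never revisited. With LAPACK first, or with the group option, nothing remains undefined.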
Typically one puts the "more fundamental/basic" library to the right of the "less fundamental/basic" one, i.e., the linker looks to the right of a file for the definitions of functions used in that file. This is supposedly no longer necessary with modern linkers, but it's always a good idea (as in your case). I'm not sure why it only mattered for some routines.
Is CLAPACK used as the LAPACK implementation? If not, you can try using it.
I want to compile my C code without (g)libc. How can I deactivate it, and which functions depend on it?
I tried -nostdlib but it doesn't help: The code is compilable and runs, but I can still find the name of the libc in the hexdump of my executable.
If you compile your code with -nostdlib, you won't be able to call any C library functions (of course), but you also don't get the regular C bootstrap code. In particular, the real entry point of a program on Linux is not main(), but rather a function called _start(). The standard libraries normally provide a version of this that runs some initialization code, then calls main().
Try compiling this with gcc -nostdlib -m32:
// Tell the compiler incoming stack alignment is not RSP%16==8 or ESP%16==12
__attribute__((force_align_arg_pointer))
void _start(void) {
    /* main body of program: call main(), etc. */

    /* exit system call (32-bit ABI: eax = 1 is exit, ebx = exit status 0) */
    asm("movl $1,%eax;"
        "xorl %ebx,%ebx;"
        "int $0x80"
    );
    __builtin_unreachable(); // the syscall never returns, so no code (e.g. a ret) is needed after it
}
The _start() function should always end with a call to exit (or another non-returning system call such as exec). The above example invokes the system call directly with inline assembly since the usual exit() is not available.
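The example above is for a 32-bit build (-m32) using the int $0x80 interface. On x86-64 Linux the syscall instruction is used instead, with different call numbers and registers. A sketch of libc-free wrappers, assuming x86-64 Linux; the wrapper names are hypothetical.

```c
/* Minimal x86-64 Linux system-call wrappers that need no libc.
 * Call numbers are from the x86-64 table: write = 1, getpid = 39, exit = 60. */
static inline long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    /* Number in rax, arguments in rdi, rsi, rdx; the syscall instruction
     * clobbers rcx and r11. "memory" makes pending stores visible to the kernel. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;   /* negative values are -errno */
}

static inline long raw_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(1, fd, (long)buf, (long)len);
}

static inline long raw_getpid(void)
{
    return raw_syscall3(39, 0, 0, 0);
}

static inline __attribute__((noreturn)) void raw_exit(int status)
{
    raw_syscall3(60, status, 0, 0);
    __builtin_unreachable();   /* exit never returns */
}
```

In a -nostdlib x86-64 program, a _start like the one above could end with raw_exit(0) instead of the 32-bit inline assembly.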
The simplest way is to compile the C code to object files (gcc -c to get some *.o files) and then link them directly with the linker (ld). You will have to link your object files with a few extra object files such as /usr/lib/crt1.o in order to get a working executable (between the entry point, as seen by the kernel, and the main() function, there is a bit of work to do). To know what to link with, try linking with glibc using gcc -v: this should show you what normally goes into the executable.
You will find that gcc generates code which may have some dependencies on a few hidden functions. Most of them are in libgcc.a. There may also be hidden calls to memcpy(), memmove(), memset() and memcmp(), which are in libc, so you may have to provide your own versions (which is not hard, at least as long as you are not too picky about performance).
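Simple byte-at-a-time versions of those four functions can look like the sketch below. The my_ prefix is a hypothetical naming choice so the sketch can be compiled next to a hosted libc. In a real -nostdlib build they would be named memcpy, memmove, memset and memcmp, and it is an assumption worth checking for your GCC version that you also need -fno-tree-loop-distribute-patterns so GCC does not turn these loops back into calls to the very functions being defined.

```c
#include <stddef.h>
#include <stdint.h>

void *my_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if ((uintptr_t)d < (uintptr_t)s) {
        while (n--)                 /* copy forwards */
            *d++ = *s++;
    } else if ((uintptr_t)d > (uintptr_t)s) {
        d += n;                     /* copy backwards so overlapping bytes */
        s += n;                     /* are read before they are overwritten */
        while (n--)
            *--d = *--s;
    }
    return dst;
}

void *my_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)c;
    return dst;
}

int my_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    for (; n; n--, p++, q++)
        if (*p != *q)
            return *p - *q;         /* compare as unsigned char, like memcmp */
    return 0;
}
```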
Things might get clearer at times if you look at the produced assembly (use the -S flag).