Readelf finding absolute address

Readelf finding absolute address - linker

I have a C program which has one global and one local variable. I have two questions about readelf:
1. When I dump the symbols using "readelf --symbols", the address shown for my global variable is the same as the address I print when I run the program. How can readelf know the absolute address before my program is loaded or running?
2. Why is there no information on the local variables' symbols? I can see only the global variables' symbols.

How can readelf know the absolute address before my program is loaded or running?
Because by the time the linker has done its work, the address computed for
your global variable is the address at which the program loader will
have to place that variable at runtime. The job of the linker is largely to
put information into an executable that tells the program loader where
symbols are to be mapped in memory.
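A minimal sketch for checking this yourself, assuming GCC on Linux. The helper names address_of_global and print_global_address are hypothetical. One caveat: toolchains that build position-independent executables by default add a load-time base address to the value readelf shows, so the two values match exactly only for a non-PIE link (for example with gcc -no-pie).

```c
#include <stdint.h>
#include <stdio.h>

int global_i = 4;   /* static storage: the linker assigns its address */

/* Hypothetical helper: returns the run-time address of global_i. */
uintptr_t address_of_global(void)
{
    return (uintptr_t)&global_i;
}

/* Prints the address zero-padded to 16 hex digits, like readelf -s does. */
void print_global_address(void)
{
    printf("%016jx global_i\n", (uintmax_t)address_of_global());
}
```

Call print_global_address() from main and compare its output with the Value column of readelf -s ./prog | grep global_i.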
Why is there no information on the local variables' symbols?
There may be three kinds of "local" variables in your program.
main.c
static int static_filescope_i = 1;
int f()
{
static int static_local_i = 2;
return static_local_i;
}
int g()
{
int automatic_i = 3;
return automatic_i;
}
int global_i = 4;
int main()
{
return global_i + f() + g() + static_filescope_i;
}
An automatic variable like automatic_i is created on the stack at runtime
each time the program enters the block in which it is defined, and ceases to exist
when it leaves that block. Such a variable occupies no storage in the executable
so it is not represented in the symbol table.
A variable like static_filescope_i would often be called a static global,
to distinguish it from one like static_local_i. static_local_i cannot
be seen outside the block in which it is defined. static_filescope_i can
be seen in any function defined in the same object file (main.o) but
not in any other object file: it is global within main.o but
local to that object file within the program as a whole.
Both static_filescope_i and static_local_i must have their
initial values when the program first uses them, and each must then keep
whatever value it has, or any new value assigned to it, until the next time
it is used - across function calls, until the program ends. This means
that such variables need storage in the executable, not on the stack, and they may or may not be
represented in the symbol table.
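The difference in lifetime can be sketched as follows. The function names next_ticket and fresh_each_time are hypothetical, chosen only for this illustration.

```c
/* Static local: one copy in static storage, keeps its value between calls. */
int next_ticket(void)
{
    static int counter = 0;
    return ++counter;
}

/* Automatic local: created afresh on the stack at every call. */
int fresh_each_time(void)
{
    int counter = 0;
    return ++counter;
}
```

Successive calls to next_ticket() return increasing values, while fresh_each_time() returns the same value every time, because its counter does not survive the call.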
global_i, of course, is global to the whole program: it can be seen in
main.o and any other object files we might link with main.o.
If we compile main.c with default options (no optimization):
$ gcc -c main.c
then we find:
$ readelf -s main.o | grep automatic_i
$
...no symbol for automatic_i.
$ readelf -s main.o | grep global_i
12: 0000000000000004 4 OBJECT GLOBAL DEFAULT 3 global_i
...a global symbol for global_i.
$ readelf -s main.o | grep static_filescope_i
5: 0000000000000000 4 OBJECT LOCAL DEFAULT 3 static_filescope_i
...a local symbol for static_filescope_i
$ readelf -s main.o | grep static_local_i
6: 0000000000000008 4 OBJECT LOCAL DEFAULT 3 static_local_i.1833
...and also a local symbol for static_local_i, but with a scope-distinguishing
suffix appended.
Here, GLOBAL means the symbol can be referenced from other object files in the
link, and LOCAL means it cannot: the linker will not use it to resolve references
from anywhere else.
So for the purpose of linking main.o with any other object files or libraries
to make an executable, static_filescope_i and static_local_i might as well
not exist.
That doesn't mean they are completely useless in the object file. They are useful
for debugging. They are useful for the purpose of investigating what the static
storage of the executable is made up of, as we are doing now.
But they're no use to the linker. And if you compile main.c with any optimization
level > 0, the compiler can see that these two static variables are never modified,
so it replaces them with their constant values and emits no storage, and hence no
local symbols, for them:
$ gcc -O1 -c main.c
$ readelf -s main.o | grep static_local_i
$ readelf -s main.o | grep static_filescope_i
$ readelf -s main.o | grep global_i
11: 0000000000000000 4 OBJECT GLOBAL DEFAULT 3 global_i
...only global_i remains.
That should explain why you're not seeing any of the "local" symbols. Your automatic
variables are never in the symbol table. Your static variables are in the
symbol table only if the compiler actually emits storage for them, which in this
example happens only with optimization disabled.

Related

Why does the order of the arguments passed to `gcc` influence the output of `readelf -d` for the built shared library?

Given these:
bar.cpp:
int foo();
int bar()
{
return foo();
}
foo.cpp:
int foo()
{
return 42;
}
libfoo.so is built from foo.cpp with gcc, i.e. gcc -shared -o libfoo.so -fPIC foo.c
As is well known, readelf -d can be used to show the dependencies of a shared library.
$ gcc -shared -o libbar2.so -fPIC bar.c -lfoo -L.
$ gcc -shared -o libbar.so -lfoo -L. -fPIC bar.c
$ readelf -d libbar2.so | grep -i needed
0x0000000000000001 (NEEDED) Shared library: [libfoo.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
$ readelf -d libbar.so | grep -i needed
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
Why does the order of the arguments passed to gcc influence the output of readelf -d for the built shared library?
All these tests are on Ubuntu16.04 with gcc 5.4.0.
Update:
$ ls -l libbar*
-rwxrwxr-x 1 joy joy 8000 Oct 4 23:16 libbar2.so
-rwxrwxr-x 1 joy joy 8000 Oct 4 23:16 libbar.so
$ sum -r libbar*
00265 8 libbar2.so
56181 8 libbar.so

The linking process is sequential and the order in which you specify the files is important. The files are processed in the order they are given. See this extract from the ld manual:
Some of the command-line options to ld may be specified at any point
in the command line. However, options which
refer to files, such as -l or -T, cause the file to be read at the point at which the option appears in the command
line, relative to the object files and other file options.
When you link a shared library into another one, the linker checks whether any undefined reference in the files considered UP TO THAT POINT requires something from the library (in your second example, no files precede libfoo). If there is none, the library is set aside and linking continues with the remaining files.
Here you also have a behaviour that may be surprising: by default it is possible to create shared libraries that still have undefined references (that is, they are not self-contained). That is what happens in your second example (libbar.so). To make sure you are not in this situation, you can add the -Wl,-no-undefined option (see https://stackoverflow.com/a/2356393/4871988).
If you add this option, the second case will raise an error at link time.
EDIT: I found this other extract in the ld manual that explains this behaviour:
The linker will search an archive only once, at the location where it
is specified on the command line. If the
archive defines a symbol which was undefined in some object which appeared before the archive on the command
line, the linker will include the appropriate file(s) from the archive. However, an undefined symbol in an
object appearing later on the command line will not cause the linker to search the archive again.
See the -( option for a way to force the linker to search archives multiple times.
You may list the same archive multiple times on the command line.
This also applies to shared libraries
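The left-to-right behaviour described above can be sketched as a toy model in C. This is not how ld is implemented and it ignores most real rules; it assumes each object references exactly one symbol and each library defines exactly one. The names struct input, link_in_order and MAX_UNDEFINED are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum input_kind { OBJECT, LIBRARY };

/* Toy input: an OBJECT references one symbol, a LIBRARY defines one. */
struct input {
    enum input_kind kind;
    const char *symbol;
};

#define MAX_UNDEFINED 16

/*
 * Processes the inputs from left to right. kept[i] becomes true for every
 * object, and for a library only if it defines a symbol that is still
 * undefined at the moment the library is reached.
 * Returns the number of symbols left undefined at the end.
 */
size_t link_in_order(const struct input *inputs, size_t n, bool *kept)
{
    const char *undefined[MAX_UNDEFINED];
    size_t n_undefined = 0;

    for (size_t i = 0; i < n; i++) {
        if (inputs[i].kind == OBJECT) {
            kept[i] = true;
            if (n_undefined < MAX_UNDEFINED)
                undefined[n_undefined++] = inputs[i].symbol;
            continue;
        }

        /* A library is examined once, against what is undefined so far. */
        kept[i] = false;
        size_t j = 0;
        while (j < n_undefined) {
            if (strcmp(undefined[j], inputs[i].symbol) == 0) {
                kept[i] = true;
                undefined[j] = undefined[--n_undefined]; /* resolved */
            } else {
                j++;
            }
        }
    }
    return n_undefined;
}
```

With the inputs ordered as object then library (like bar.c -lfoo), the library is kept and nothing stays undefined. With the library first (like -lfoo bar.c), the library is skipped and the reference to foo stays undefined, which a shared library is allowed to have by default.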

Does the dynamic linker modify the reference after the executable is copied to memory?

Let's say we have the following code:
main.c
extern void some_function();
int main()
{
some_function();
return 0;
}
mylib.c
void some_function()
{
...
}
and I have created a shared library mylib.so and linked it into the executable object file prog:
linux> gcc -shared -fpic -o mylib.so mylib.c
linux> gcc -o prog main.c ./mylib.so
and let's say the picture below shows the executable object file format of prog.
With dynamic linking, we know that none of the code or data sections from mylib.so are actually copied into the executable prog at this point. Instead, the linker copies some relocation and symbol table information that will allow references to code and data in mylib.so to be resolved at load time.
I just want to double-check that my understanding is correct:
when prog is loaded into memory by the loader, as the picture below shows,
the dynamic linker will then modify the .data section of prog in memory so that it can be linked/relocated to the instruction address of some_function in the .text section of mylib.so.
Is my understanding correct?

Fairly close. The dynamic linker will modify something in the data segment, not specifically the .data section - segments are a coarser-grained thing corresponding to how the file is mapped into memory rather than the original semantic breakdown. The actual section is usually called .got or .got.plt but may vary by platform. The modification is not "relocating it to the instruction address" but resolving a relocation reference to the function name to get the address it was loaded at, and filling that address in.
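The idea of "filling an address into a writable slot" can be sketched in plain C. This is only a toy model: a real dynamic linker works on relocation records and PLT stubs, and may fill the slot at load time rather than on first call. The names got_slot, resolve_slot and call_some_function are hypothetical.

```c
#include <stddef.h>

typedef void (*func_ptr)(void);

static int call_count;      /* how often some_function has run */
static int resolve_count;   /* how often the slot had to be filled in */

/* Stands in for the function whose code lives in mylib.so. */
static void some_function(void)
{
    call_count++;
}

/* Stands in for a writable slot in the data segment (e.g. in .got.plt). */
static func_ptr got_slot = NULL;

/* Stands in for the dynamic linker: looks up the address and stores it. */
static void resolve_slot(void)
{
    got_slot = some_function;
    resolve_count++;
}

/* The caller's code is never patched: it always calls through the slot. */
void call_some_function(void)
{
    if (got_slot == NULL)
        resolve_slot();
    got_slot();
}

int calls_made(void)       { return call_count; }
int resolutions_made(void) { return resolve_count; }
```

Calling call_some_function() twice runs some_function twice but resolves the slot only once: only the data slot is written, and the calling code stays unchanged.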

Function with same names in different files - C

I have two functions with the same name and want to use both in my application.
I referred to various answers like here and here but couldn't find a clear solution.
I have the following functions
// xxxx_input.h
int8_t input_system_init(InputParams params);
int8_t input_system_easy_load(uint32_t interval_ms);
// yyyy_input.h
int8_t input_system_init(InputParams params);
int8_t input_system_easy_load(uint32_t interval_ms);
The reason there are two files is that xxxx_input and yyyy_input work very differently internally.
Modifying the functions isn't easy since the code is provided by an external party and we have to keep the xxxx_input files.
What we can do is modify yyyy_input.h, but functions like input_system_easy_load are to be kept consistent as they are called from different places.
Is there a way we can achieve this?
I tried replacing xxxx_input with yyyy_input.h, but since the include directory already contains the same function it gives an error:
input_system_init multiply defined (by xxxx_input.o and yyyy_input.o).

If you have the source code for the functions declared in xxxx_input.h and yyyy_input.h, you could compile both modules with command line options redefining the function names via the preprocessor:
gcc -c -Dinput_system_init=xxxx_input_system_init -Dinput_system_easy_load=xxxx_input_system_easy_load xxxx_input.c
gcc -c -Dinput_system_init=yyyy_input_system_init -Dinput_system_easy_load=yyyy_input_system_easy_load yyyy_input.c
You would then compile your code with modified prototypes and you could link all 3 modules together.
If the modules are provided in object form only, you could define wrapper functions xxxx_input_system_init and xxxx_input_system_easy_load that you would link with the xxxx_input.o to produce a dynamic library, and the same for the yyyy alternatives. You would use modified prototypes in your module and would link it with the dynamic libraries.
Mike Kinghan showed a simpler approach for object files and libraries on systems where objcopy is available.
To get modified prototypes automatically, you can use this include file:
my_input_system.h:
#define input_system_init xxxx_input_system_init
#define input_system_easy_load xxxx_input_system_easy_load
#include "xxxx_input.h"
#undef input_system_init
#undef input_system_easy_load
#define input_system_init yyyy_input_system_init
#define input_system_easy_load yyyy_input_system_easy_load
#include "yyyy_input.h"
#undef input_system_init
#undef input_system_easy_load
/* prevent direct use of the redefined functions */
#define input_system_init do_not_use_input_system_init#
#define input_system_easy_load do_not_use_input_system_easy_load#
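The renaming trick can be tried out in a single file. The sketch below is an assumption-laden stand-in, not the supplier's code: the two function bodies are invented, the parameter is simplified to an int instead of InputParams, and init_both is a hypothetical caller.

```c
/* Stand-in for xxxx_input.c built with
 * -Dinput_system_init=xxxx_input_system_init */
#define input_system_init xxxx_input_system_init
int input_system_init(int value) { return value + 1; }
#undef input_system_init

/* Stand-in for yyyy_input.c built with
 * -Dinput_system_init=yyyy_input_system_init */
#define input_system_init yyyy_input_system_init
int input_system_init(int value) { return value * 2; }
#undef input_system_init

/* Client code refers to each implementation by its prefixed name. */
int init_both(int value)
{
    return xxxx_input_system_init(value) + yyyy_input_system_init(value);
}
```

After preprocessing, the file contains two distinct functions, so there is no multiple-definition error even though both were written under the same name.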

I'll walk through a solution you can use if your supplier has given you object files compiled
with GCC or Clang for GNU/Linux computers.
My supplier has given me a header file foo_a.h that declares function foo
$ cat foo_a.h
#pragma once
extern void foo(void);
and the matching object file foo_a.o:
$ nm foo_a.o
0000000000000000 T foo
U _GLOBAL_OFFSET_TABLE_
U puts
that defines foo.
Likewise they've given me a header foo_b.h that also declares foo
$ cat foo_b.h
#pragma once
extern void foo(void);
and the matching object file foo_b.o
$ nm foo_b.o
0000000000000000 T foo
U _GLOBAL_OFFSET_TABLE_
U puts
that also defines foo.
The functions foo_a.o:foo and foo_b.o:foo do different
things (or different variations of the same thing). I want to do both these things in the same program,
prog.c:
$ cat prog.c
extern void foo_a(void);
extern void foo_b(void);
int main(void)
{
foo_a(); // Calls `foo_a.o:foo`
foo_b(); // Calls `foo_b.o:foo`
return 0;
}
I can make such a program as follows:
$ objcopy --redefine-sym foo=foo_a foo_a.o prog_foo_a.o
$ objcopy --redefine-sym foo=foo_b foo_b.o prog_foo_b.o
Now I have made a copy prog_foo_a.o of foo_a.o in which the symbol foo is
renamed foo_a, and a copy prog_foo_b.o of foo_b.o in which
the symbol foo is renamed foo_b.
Then I compile and link like this:
$ gcc -c -Wall -Wextra prog.c
$ gcc -o prog prog.o prog_foo_a.o prog_foo_b.o
And prog runs like:
$ ./prog
foo_a
foo_b
Perhaps my supplier has given me foo_a.o within a static library liba.a
that also contains other object files that refer to foo_a.o:foo? And similarly
with foo_b.o.
That's OK. Instead of:
$ objcopy --redefine-sym foo=foo_a foo_a.o prog_foo_a.o
$ objcopy --redefine-sym foo=foo_b foo_b.o prog_foo_b.o
I will run:
$ objcopy --redefine-sym foo=foo_a liba.a libprog_a.a
$ objcopy --redefine-sym foo=foo_b libb.a libprog_b.a
and this will give me a new static library libprog_a.a
in which foo is renamed foo_a in all the object files in the
library. Similarly foo is renamed foo_b throughout libprog_b.a.
Then I'll link prog:
$ gcc -o prog prog.o -L. -lprog_a -lprog_b
Consider a potential drawback with this solution. Possibly my supplier has
given me foo_a.o and foo_b.o with debugging information in them, and I want to
use it for debugging my prog with gdb?
I have changed the original symbol names, foo_a.o:foo to foo_a and
foo_b.o:foo to foo_b, but I haven't changed the debugging info associated
with those symbols. Debugging with gdb will still work, but some of the
debugging output will be incorrect and possibly confusing. E.g. if I put
a breakpoint on foo_a, gdb will run to it and stop, but it will say
it has stopped at foo from file foo_a.c. And if I then breakpoint at foo_b, gdb will run to
it and again say it is at foo, but from file foo_b.c. If the person doing
the debugging doesn't know how the program was built, this would certainly be
confusing.
But giving you debugging info with binaries is not far from giving you the
source code, so as you haven't got source code you likely don't have
debugging info and are not concerned about it.

Remove dead code when linking static library into dynamic library

Suppose I have the following files:
libmy_static_lib.c:
#include <stdio.h>
void func1(void){
printf("func1() called from a static library\n");
}
void unused_func1(void){
printf("printing from the unused function1\n");
}
void unused_func2(void){
printf("printing from unused function2\n");
}
libmy_static_lib.h:
void func1(void);
void unused_func1(void);
void unused_func2(void);
my_prog.c:
#include "libmy_static_lib.h"
#include <stdio.h>
void func_in_my_prog()
{
printf("in my prog\n");
func1();
}
And here is how I link the library:
# build the static library libmy_static_lib.a
gcc -fPIC -c -fdata-sections -ffunction-sections libmy_static_lib.c -o libmy_static_lib.o
ar rcs libmy_static_lib.a libmy_static_lib.o
# build libmy_static_lib.a into a new shared library
gcc -fPIC -c ./my_prog.c -o ./my_prog.o
gcc -Wl,--gc-sections -shared -m64 -o libmy_shared_lib.so ./my_prog.o -L. -l:libmy_static_lib.a
There are 2 functions in libmy_static_lib.c that are not used, and from this post, I think
gcc -fdata-sections -ffunction-sections
should place each function in its own section, and
gcc -Wl,--gc-sections
should remove the unused sections when linking.
However, when I run
nm libmy_shared_lib.so
it shows that these 2 unused functions are also being linked into the shared library.
Any suggestions on how to have gcc remove the unused functions automatically?
Edit:
I am able to use the above options for gcc to remove the unused functions if I am linking a static library directly to executable. But it doesn't remove the unused functions if I link the static library to a shared library.

You can use a version script to mark the entry points in combination with -ffunction-sections and --gc-sections.
For example, consider this C file (example.c):
int
foo (void)
{
return 17;
}
int
bar (void)
{
return 251;
}
And this version script, called version.script:
{
global: foo;
local: *;
};
Compile and link the sources like this:
gcc -Wl,--gc-sections -shared -ffunction-sections -Wl,--version-script=version.script example.c
If you look at the output of objdump -d --reloc a.out, you will notice that only foo is included in the shared object, but not bar.
When removing functions in this way, the linker will take indirect dependencies into account. For example, if you turn foo into this:
void *
foo (void)
{
extern int bar (void);
return bar;
}
the linker will put both foo and bar into the shared object because both are needed, even though only foo is exported.
(Obviously, this will not work on all platforms, but ELF supports this.)
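A related technique, not the one used in this answer, is to mark visibility in the source instead of in a version script. The sketch below assumes GCC or Clang targeting ELF; the macro names EXPORTED and INTERNAL are hypothetical.

```c
#if defined(__GNUC__)
#define EXPORTED __attribute__((visibility("default")))
#define INTERNAL __attribute__((visibility("hidden")))
#else
#define EXPORTED
#define INTERNAL
#endif

/* Exported from the shared object, like "global: foo;" in the script. */
EXPORTED int foo(void)
{
    return 17;
}

/* Not exported, similar in effect to "local: *;" in the script. */
INTERNAL int bar(void)
{
    return 251;
}
```

Built with something like gcc -shared -fPIC -ffunction-sections -Wl,--gc-sections example.c, a hidden function that nothing references should then be eligible for removal; checking the result with nm or objdump -d is advisable, since the outcome depends on the toolchain.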

You're creating a library, and your symbols aren't static, so it's normal that the linker doesn't remove any global symbols.
The --gc-sections option is designed for executables. The linker starts from the entry point (main) and discovers the function calls. It marks the sections that are used, and discards the others.
A library doesn't have 1 entrypoint, it has as many entrypoints as global symbols, which explains that it cannot clean your symbols. What if someone uses your .h file in his program and calls the "unused" functions?
To find out which functions aren't "used", I'd suggest that you convert void func_in_my_prog() to int main() (or copy the source into a modified one containing a main()), then create an executable with the sources, and add -Wl,-Map=mapfile.txt option when linking to create a mapfile.
gcc -Wl,--gc-sections -Wl,--Map=mapfile.txt -fdata-sections -ffunction-sections libmy_static_lib.c my_prog.c
This mapfile contains the discarded symbols:
Discarded input sections
.drectve 0x00000000 0x54 c:/gnatpro/17.1/bin/../lib/gcc/i686-pc-mingw32/6.2.1/crt2.o
.drectve 0x00000000 0x1c c:/gnatpro/17.1/bin/../lib/gcc/i686-pc-
...
.text$unused_func1
0x00000000 0x14 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
.text$unused_func2
0x00000000 0x14 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
.rdata$zzz 0x00000000 0x38 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
...
Now we see that the unused functions have been removed. They don't appear in the final executable anymore.
There are existing tools that do that (using this technique but not requiring a main), for instance Callcatcher. One can also easily create a tool to disassemble the library and check for symbols defined but not called (I've written such tools in Python several times and it's so much easier to parse assembly than high-level code).
To clean up, you can delete the unused functions manually from your sources (one must be careful with object-oriented languages and dispatching calls when using existing/custom assembly analysis tools. On the other hand, the compiler isn't going to remove a section that could be used, so that is safe).
You can also remove the relevant sections from the library file, without changing the source code, for instance:
$ objcopy --remove-section '.text$unused_func1' --remove-section '.text$unused_func2' libmy_static_lib.a stripped.a
$ nm stripped.a
libmy_static_lib.o:
00000000 b .bss
00000000 d .data
00000000 r .rdata
00000000 r .rdata$zzz
00000000 t .text
00000000 t .text$func1
00000000 T _func1
U _puts

Why do common section variables only show up in object file not the executable?

I'm trying to understand more about the "common" section of an executable and I noticed that when running objdump on compiled code, I can see variables placed in the common section only in object files (*.o), not in executables.
Why is that?
//test.c
int i[1000];
int main(){return 0;}
build command:
> gcc -g0 -fcommon -c test.c
> gcc -g0 -fcommon test.c
objdump shows i in the common section in the symbol table:
> objdump -x test.o
...
SYMBOL TABLE:
...
00000fa0 O *COM* 00000020 i
Unless I run it on the executable:
> objdump -x a.out
...
SYMBOL TABLE:
...
0804a040 g O .bss 00000fa0 i
If I rebuild the object file with the -fno-common flag instead, it shows up in the .bss section just like it does in the executable. Does the final executable not have this "COMMON" section?

The common section is something that the linker knows about. It basically puts all the common content into one of the three or four actual sections that [a typical] executable has (code or text, data, bss - sometimes there is a rodata as well).
So, your variable ends up in .bss in this case, as it is not initialized.
From gcc manual on -fcommon/-fno-common
In C code, controls the placement of uninitialized global variables.
Unix C compilers have traditionally permitted multiple definitions of
such variables in different compilation units by placing the variables
in a common block. This is the behavior specified by -fcommon, and is
the default for GCC on most targets. On the other hand, this behavior
is not required by ISO C, and on some targets may carry a speed or
code size penalty on variable references. The -fno-common option
specifies that the compiler should place uninitialized global
variables in the data section of the object file, rather than
generating them as common blocks. This has the effect that if the same
variable is declared (without extern) in two different compilations,
you get a multiple-definition error when you link them. In this case,
you must compile with -fcommon instead. Compiling with -fno-common is
useful on targets for which it provides better performance, or if you
wish to verify that the program will work on other systems that always
treat uninitialized variable declarations this way.
So, -fno-common or -fcommon will only make a difference if there is more than one global variable called i [and they should be of the same size, or your program becomes invalid, which is one grade worse than undefined behaviour!]
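A pattern that links the same way under both -fcommon and -fno-common is one definition plus extern declarations everywhere else. The sketch below shows it in a single file; the names samples and sum_samples are hypothetical, and the comments mark which line would live in a header. Note that the quoted manual text describes older releases: since GCC 10 the default is -fno-common.

```c
#include <stddef.h>

/* Would live in a shared header: a declaration only, no storage. */
extern int samples[1000];

/* Exactly one .c file provides the definition. Without an initializer the
 * array is zero-initialized, so it needs no bytes in the file (.bss). */
int samples[1000];

/* Adds up all elements; returns 0 while nothing has been written. */
int sum_samples(void)
{
    int total = 0;
    for (size_t k = 0; k < sizeof samples / sizeof samples[0]; k++)
        total += samples[k];
    return total;
}
```

Other source files would include only the extern declaration, so no second definition exists for the linker to merge or reject.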
