I am reading this article on PLT (Procedure Linkage Table) and GOT (Global Offset Table). While the purpose of PLT is clear to me, I'm still confused about GOT. What I've understood from the article is that GOT is only necessary for variables declared as extern in a shared library. For global variables declared as static in shared library code, it is not required.
Is my understanding right, or am I completely missing the point?
Perhaps your confusion is with the meaning of extern. Since the default linkage is extern, any variable declared outside function scope without the static keyword is extern.
The GOT is necessary because the address of a variable accessed by the shared library code is not known at the time the shared library is generated. It depends either on the address the library gets loaded at (if the definition is in the library itself) or on the third-party code the variable is defined in (if the definition is elsewhere). So rather than putting the address inline in the code, the compiler generates code that loads the variable's address from the shared library's GOT at runtime.
If the variable is known to be defined within the same shared library (either because it's static or because the hidden or protected visibility attribute is used) then the address relative to the code in the library can be fixed at the time the shared library file is generated. In this case, rather than performing a lookup through the GOT, the compiler just generates code to access the variable with program-counter-relative addressing. This is less expensive both at runtime and at load time (because the whole symbol lookup and relocation process can be skipped at load time).
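To make the three cases concrete, here is a minimal sketch. The names (exported_counter, private_counter, hidden_counter, bump_counters) are made up for illustration, and the visibility attribute is a GCC/Clang extension, so treat this as an assumption about your toolchain rather than portable C.

```c
#include <assert.h>

/* Default visibility: another module may define (interpose) this symbol,
 * so position-independent code typically reaches it through the GOT. */
int exported_counter = 0;

/* Internal linkage: always bound inside this translation unit. */
static int private_counter = 0;

/* Hidden visibility (GCC/Clang extension): usable from other files of the
 * same library, but not exported from it. */
__attribute__((visibility("hidden"))) int hidden_counter = 0;

/* Hypothetical library function that touches all three variables. */
int bump_counters(void)
{
    exported_counter += 1;
    private_counter += 1;
    hidden_counter += 1;
    /* Nothing outside the library can change these two, so they stay equal. */
    assert(private_counter == hidden_counter);
    return exported_counter + private_counter + hidden_counter;
}
```

If you compile this with something like gcc -O2 -fPIC -S and read the assembly, on x86-64 you would typically see the access to exported_counter go through a GOT entry, while the other two use direct PC-relative addressing. The exact output depends on the compiler, target and flags.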
I understand that shared libraries are loaded into memory and used by various programs.
How can a program know where in memory the library is?
When a shared library is used, there are two parts to the linkage process. At link time, the linker program (ld on Linux) links against the shared library in order to learn which symbols are defined by it. However, none of the code or data initializers from the shared library are actually included in the final a.out file. Instead, ld just records which dynamic libraries were linked against, and that information is placed into an auxiliary section of the a.out file.
The second phase takes place at execution time, before main gets invoked. The kernel loads a small helper program, ld.so, into the address space and this gets executed. Therefore, the start address of the program is not main or even _start (if you have heard of it). Rather, it is actually the start address of the dynamic library loader.
In Linux, the kernel maps the ld.so loader code into a convenient place in the process address space and sets up the stack so that the list of required shared libraries (and other necessary info) is present. The dynamic loader finds each of the required libraries by searching a sequence of directories, which are often listed in the LD_LIBRARY_PATH environment variable. There is also a pre-defined list which is hard-coded into ld.so (and additional search places can be hard-coded into the a.out at link time). For each of the libraries, the dynamic loader reads its header and then uses mmap to create memory regions for the library.
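If you want to see where the loader actually placed the libraries, one option on Linux is to read /proc/self/maps from inside the process. The following is only a sketch: the function names are invented for this example, the ".so" substring test is a crude heuristic, and paths containing spaces are truncated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/self/maps. Returns 1 if the line describes a
 * file-backed mapping whose path contains ".so", otherwise 0. */
int parse_library_mapping(const char *line, unsigned long *start,
                          unsigned long *end, char path[256])
{
    char perms[5];
    assert(line != NULL && start != NULL && end != NULL && path != NULL);
    path[0] = '\0';
    /* Fields: start-end perms offset dev inode [path] */
    int fields = sscanf(line, "%lx-%lx %4s %*s %*s %*s %255s",
                        start, end, perms, path);
    return fields == 4 && strstr(path, ".so") != NULL;
}

/* Print every shared-object mapping of the calling process. Returns the
 * number of mappings found, or -1 if /proc/self/maps cannot be opened. */
int print_library_mappings(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[512], path[256];
    unsigned long start, end;
    int found = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (parse_library_mapping(line, &start, &end, path)) {
            printf("%lx-%lx %s\n", start, end, path);
            found++;
        }
    }
    fclose(maps);
    return found;
}
```

Calling print_library_mappings() from a dynamically linked program should list the regions the loader created with mmap, usually several per library (code, read-only data, writable data).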
Now for the fun part.
Since the actual libraries used at run-time to satisfy the requirements are not known at link-time, we need to figure out a way to access functions defined in the shared library and global variables that are exported by the shared library (this practice is deprecated since exporting global variables is not thread-safe, but it is still something we try to handle).
Global variables are assigned a static address at link time and are then accessed by absolute memory address.
For functions exported by the library, the user of the library is going to emit a series of call assembly instructions, which reference an absolute memory address. But, the exact absolute memory address of the referenced function is not known at link time. How do we deal with this?
Well, the linker creates what is known as a Procedure Linkage Table, which is a series of jmp (assembly jump) instructions. The target of the jump is filled in at run time.
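The idea of a jump whose target is filled in at run time can be modelled in plain C with a function pointer. This is a simplified model and not what the linker actually emits: the names are invented, and it assumes the lazy variant, where the target is resolved on the first call rather than at load time.

```c
#include <assert.h>

static int resolutions = 0;

/* The "real" function that would live in the shared library. */
static int real_add(int a, int b)
{
    return a + b;
}

static int resolve_and_call(int a, int b);

/* One table slot. It starts out pointing at the resolver stub. */
static int (*add_slot)(int, int) = resolve_and_call;

/* Stand-in for the dynamic loader's resolver: look up the target, patch
 * the slot, then forward the call that triggered the lookup. */
static int resolve_and_call(int a, int b)
{
    resolutions++;
    add_slot = real_add;
    return add_slot(a, b);
}

/* What the caller does: an indirect jump through the slot. */
int add_via_table(int a, int b)
{
    assert(add_slot != 0);
    return add_slot(a, b);
}

/* How many times the resolver has run so far. */
int resolution_count(void)
{
    return resolutions;
}
```

After the first call to add_via_table() the slot points straight at real_add, so later calls pay only for the indirect jump and resolution_count() stays at 1.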
Now, when dealing with the dynamic portions of the code (i.e. the .o files that have been compiled with -fpic), there are no absolute memory references whatsoever. In order to access global variables which are also visible to the static portion of the code, another table called the Global Offset Table is used. This table is an array of pointers. At link time, since the absolute memory addresses of the global variables are known, the linker populates this table. Then, at run time, dynamic code is able to access the global variables by first finding the Global Offset Table, then loading the address of the correct variable from the appropriate slot in the table, and finally dereferencing the pointer.
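The three steps (find the table, load the address from the slot, dereference) can be sketched in C as follows. This is only a model of the mechanism: real code finds the GOT via a register or PC-relative offset, and the variable names, the slot enum and fill_got() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Globals that are also visible to the statically linked code. */
int shared_counter = 7;
int shared_limit = 100;

/* The model GOT: an array of pointers, one slot per variable. */
enum { GOT_COUNTER, GOT_LIMIT, GOT_SLOTS };
static int *got[GOT_SLOTS];

/* Stand-in for the linker/loader populating the table. */
void fill_got(void)
{
    got[GOT_COUNTER] = &shared_counter;
    got[GOT_LIMIT] = &shared_limit;
}

/* Stand-in for position-independent code reading a variable. */
int load_via_got(int slot)
{
    assert(slot >= 0 && slot < GOT_SLOTS && got[slot] != NULL);
    int *address = got[slot];   /* load the address from the slot */
    return *address;            /* dereference the pointer */
}
```

Note that the code in load_via_got() contains no absolute address of its own, which is the whole point: only the table has to be fixed up.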
I am a little bit stuck on the meaning of kernel symbol types.
Simple static symbols have the same meaning as C static: a local static variable has local scope and static allocation, and a static function's scope is a file. But what about static exported symbols? How should EXPORT_SYMBOL(), EXPORT_PER_CPU_SYMBOL() and EXPORT_UNUSED_SYMBOL() be handled if the macro exports a static symbol? What is the difference between global and exported symbols? Is it the linker's responsibility to add additional info for exported symbols? Is a global static variable built into the kernel visible throughout the kernel and in loadable modules?
Kernel exported symbols can be accessed from a loadable module. Is it good style to touch such symbols inside the kernel?
When the kernel resolves symbols, does it look them up through the kernel symbol table?
Conceptually, using the static keyword with a function declaration means internal linkage, so such a function is only visible within a single translation unit (*.o file). The compiler may inline such a function (in which case no standalone copy would remain to be referenced), but since EXPORT_SYMBOL() takes the address of the static function, the compiler has to keep a standalone copy of it.
The implementation is a bit more complicated. These internal and external linkage rules apply only to the static linker ld, which runs when vmlinux or a kernel module is built. Normally a symbol with external linkage is added to the symtab ELF section, and when the dynamic linker ld.so loads a shared object it reads that section.
But when a module is loaded, the Linux kernel uses a separate symbol table, ksymtab. EXPORT_SYMBOL() adds the symbol to that table, but this process is completely transparent to the compiler-linker toolchain, so it is not related to internal and external linkage at all.
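To illustrate why a static function can still be exported, here is a user-space model of a name-to-address table. It is not kernel code and does not reproduce the real ksymtab layout; struct export_entry, export_table and find_export() are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*exported_fn)(int);

/* One entry of the model symbol table: a name and an address. */
struct export_entry {
    const char *name;
    exported_fn address;
};

/* Internal linkage: the linker never sees this name from other files. */
static int double_it(int x)
{
    return 2 * x;
}

/* "Exporting" just records the address under a name. Taking the address
 * also forces the compiler to keep a standalone copy of the function. */
static const struct export_entry export_table[] = {
    { "double_it", double_it },
};

/* Stand-in for the module loader resolving a symbol by name. */
exported_fn find_export(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof export_table / sizeof export_table[0]; i++) {
        if (strcmp(export_table[i].name, name) == 0)
            return export_table[i].address;
    }
    return NULL;
}
```

The lookup works purely through the table, so whether the function had internal or external linkage in C never comes into it.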
I modified the io-packet of QNX and am calculating a timestamp in the recieve.c file at the IP layer.
CODE:
uint64_t ipStart_time, IPLatency;
EXPORT_SYMBOL(IPLatency); //I am using this in Linux
void rtl_receive ()
{
ipStart_time = clock_cycles();
IPLatency = ipStart_time;
}
I want to read that timestamp in my user program:
So I did :
code:
extern uint64_t IPLatency;
But it is showing error: undefined reference to IPLatency
The extern keyword informs the compiler that it should expect a function or variable (symbol) to be defined in another of the linking objects. For a function this means that the compiler will not give an error if a function has not been implemented.
The error you get, undefined reference, indicates that the linker cannot find an exported symbol with that name in any of the object files included in the link.
I'm guessing a little bit here because there isn't a lot of information about where these respective files are. With that as a disclaimer here goes ....
You have edited a file that belongs to the operating system kernel and exported the symbol. That just makes the symbol visible to kernel modules that wish to link with the kernel, not to a userspace program. Your userspace program is not linked to the kernel, so when you attempt to link, the linker cannot find a definition of IPLatency and it exits.
The extern keyword tells the compiler that this symbol is external to this file, so it should just assume the symbol exists and let the linker worry about it. The purpose of the extern declaration is to tell the compiler the type of the variable, not to reserve memory for it. The linker needs to find all the symbols in order to turn a symbol into an actual reference to the correct memory location. So an extern variable needs to be defined somewhere in the program that you are trying to link.
What you are attempting to do is transfer information from kernel space to userspace. This is going to be much more difficult, involving either a new system call or by making the information available via some other mechanism (e.g. sysfs). You will probably need to research some more and then ask another question about that (and the answer to that question will probably be beyond me).
The answer might be short: EXPORT_SYMBOL is used for exporting symbols inside the kernel, e.g. to other kernel modules. Your user space program would not have access to it, thus your linker would report an undefined reference error.
Suggestion: send the data via things like copy_to_user or write the information in /proc and let the userspace program read from it.
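On the userspace side, reading a value published through /proc is just ordinary file I/O. Here is a rough sketch, assuming the kernel side exposes the timestamp as a decimal number in some file. The path /proc/ip_latency in the comment is hypothetical, and the kernel code that would create such a file is not shown.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Read one unsigned 64-bit decimal value from a text file.
 * Returns 0 and stores the value on success, -1 on any failure. */
int read_u64_from_file(const char *path, uint64_t *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int matched = fscanf(f, "%" SCNu64, value);
    fclose(f);
    return matched == 1 ? 0 : -1;
}

/* Hypothetical usage:
 *     uint64_t latency;
 *     if (read_u64_from_file("/proc/ip_latency", &latency) == 0)
 *         printf("%" PRIu64 "\n", latency);
 */
```

No extern declaration is involved here at all: the program never links against the kernel symbol, it only reads whatever the kernel chooses to write into the file.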
My question is about static variable (static void*) created inside shared library (let's call this library 'S'), but it's an internal variable not shown outside, but every call of API depends on it. Now let's think about a case, when a program (let's call it main program) links to another two shared libraries and every one of them is linked with library S. Now what happens to this static variable for our main program? Does it have one instance? Two?
Suma's answer is correct. There will only be a single instance of the static variable. This is also why having static globals in shared libraries can be a huge problem. One real-world example where this can happen:
An Apache webserver which loads the following modules:
mod_php, which is linked against libxml2
mod_perl, which loads libxml2
Now if some PHP code modifies a global setting like a parser option in libxml2, the Perl code will also see these changes. This can lead to bugs that are extremely hard to diagnose. So you should avoid global state in your shared libraries at all cost.
(With libxml2 you can set most options locally these days.)
Assuming your static variable is only defined in one translation unit, it will exist only once, as the shared library is loaded only once into the process.
This would get more difficult if a mixture of shared and static linking was used.
The compiler creates a different instance for every global static variable, even when you have several such variables with identical names.
In fact, each such variable has internal linkage: the compiler emits it as a local symbol in the object file of the source file that declares it, so the linker never merges it with a same-named variable from another source file.
You can prove this to yourself by declaring a global static variable in a header file and then including this header file in several different source files. Set it to a different value in each source file, and you'll see that the variable keeps its own value in each source file.