Why is .rela.plt necessary for resolving PIC function addresses? - linker

While exploring ELF structure, I see this (this is objdump -d and readelf -r of the binary linked with a PIC .so containing ml_func):
0000000000400480 <_Z7ml_funcii@plt>:
400480: ff 25 92 0b 20 00 jmpq *0x200b92(%rip) # 601018 <_Z7ml_funcii>
Relocation section '.rela.plt' at offset 0x438 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000601018 000100000007 R_X86_64_JUMP_SLO 0000000000000000 _Z7ml_funcii + 0
Isn't .rela.plt redundant? It seems to store the same offset 601018 which is already calculated at ml_func@plt.
Or is it useful for some more complex cases like different relocation types? Or is it just an optimization of some sort (like, I guess it might not be trivial to get the 601018 from outside the ml_func@plt...)?
I guess this question is similar to Why does the linker generate seemingly useless relocations in .rela.plt?, where they write that
.rela.plt is used to resolve function addresses, even during lazy linking.
I guess I wonder why the resolver couldn't do its work without the .rela.plt.

The symbol name attached to 601018 in the .plt disassembly is not stored in that section. It is merely a helpful annotation that objdump provides, and objdump itself discovers this information by looking at .rela.plt.
When your program starts up, the global offset table (GOT) needs to contain an address inside the procedure linkage table (PLT) in order to bootstrap the dynamic linking logic. However, when your program is compiled, the compiler doesn't yet know the absolute address of the PLT. That's why the .rela.plt section exists. The dynamic linker uses this information to patch the GOT when your program starts.

Related

Linker (ld) ELF Questions

I have an issue with an ELF file generated by the GNU linker ld.
The result is that the data section (.data) gets corrupted when the executable is loaded into memory. The corruption to the .data section occurs when the loader performs the relocation on the .eh_frame section using the relocation data (.rela.eh_frame).
What happens is that this relocation causes seven writes that are beyond the .eh_frame section and over-write the correct contents of the .data section which is adjacent to the top of the .eh_frame section.
After some investigation, I believe the loader is behaving correctly, but the ELF file it has been given contains an error.
But I could be wrong and wanted to check what I've found so far.
Using readelf on the ELF file, it can be seen that seven of the entries in the .rela.eh_frame section contain offsets that are outside (above) the range given by readelf for the .eh_frame section. ie The seven offsets in .rela.eh_frame are greater than the length given for .eh_frame. When these seven offsets are applied in the relocation, they corrupt the .data section.
So my questions are:
(1) Is my deduction correct that relocation offsets should not be greater than the length of the section to which they apply, and that the generated ELF file is therefore in error?
(2) What are people's opinions on the best way of proceeding to diagnose the cause of the incorrect ELF file? Are there any options to ld that will help, or any options that will remove/fix the .eh_frame and its relocation counterpart .rela.eh_frame?
(3) How would I discover what linker script is being used when the ELF file is generated?
(4) Is there a specific forum where I might find a whole pile of linker experts who would be able to help? I appreciate this is a highly technical question and that many people may not have a clue what I'm talking about!
Thanks for any help!
The .eh_frame section is not supposed to have any run-time relocations. All offsets are fixed when the link editor is run (because the object layout is completely known at this point) and the ET_EXEC or ET_DYN object is created. Only ET_REL objects have relocations in that section, and those are never seen by the dynamic linker. So something odd must be going on.
You can ask such questions on the binutils list or the libc-help list (if you use the GNU toolchain).
EDIT: It seems that you are using a toolchain configured for ZCX exceptions with a target which expects SJLJ exceptions. AdaCore has some documentation about this:
GNAT User's Guide Supplement for Cross Platforms 19.0w documentation » VxWorks Topics
Zero Cost Exceptions on PowerPC Targets
It doesn't quite say how to switch to the SJLJ-based VxWorks 5 toolchain. It is definitely not a matter of using the correct linker script. The choice of exception handling style affects code generation, too.

GCC linker: move a symbol in a specified section

Is it possible to move some of the functions in the code into a specific section
of the executable? If so, how?
For an application compiled with gcc, we have several source files, including
X.c. Each object is compiled from the associated source (X.o is obtained from X.c) and the linker produces a big executable.
I need two functions from X.c to be in a specific section in the
executable, say .magic_section. The reason I want this is
that the section will be loaded in another area of memory than the rest of the sections.
My problem is that I can not change the source X.c, otherwise I would have used
a specific flag, such as __attribute__ ((section ("magic_section"))) for
the functions.
I read something in the documentation for the linker and wrote a custom script for the linker, but I failed to specify in which section a particular symbol must be placed. I only managed to move a whole section.
One way you could probably do it (not great, but it should work in theory) is to compile with -ffunction-sections and -fdata-sections, assuming your GCC version / architecture supports those options, and then manually call out all the functions and variables that need to go in a given section with a linker script.
This creates sections with names like .text.function_name or .data.variable_name. If you're familiar with assigning sections via gcc attributes, I'll assume you know what to do in the linker.
As an advantage, that would let you cherry-pick functions if you don't actually want the entire file to go in a magic section.
Unfortunately, without modifying your binary objects, dynamic linker or dynamic loader you will not be able to accomplish this, and anyhow, this is a very difficult task.
Option 1 - ELF manipulation
Each ELF executable is made from sections, which contain the actual code/data/symbol strings/... and segments which help the loader decide things like where to load your code in memory, which symbols this ELF exposes, which symbols it requires from other locations, where to load specific code/data, etc.
You can observe the segments in your binary by typing
readelf -l [your binary]
The output will be similar to the following (I chose ls as the binary):
[ishaypeled@ishay-dev bin]$ readelf -l --wide ./ls
Elf file type is EXEC (Executable file)
Entry point 0x4048bf
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R E 0x8
INTERP 0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x01b694 0x01b694 R E 0x200000
LOAD 0x01bdf0 0x000000000061bdf0 0x000000000061bdf0 0x000864 0x0016d0 RW 0x200000
DYNAMIC 0x01be08 0x000000000061be08 0x000000000061be08 0x0001f0 0x0001f0 RW 0x8
NOTE 0x000254 0x0000000000400254 0x0000000000400254 0x000044 0x000044 R 0x4
GNU_EH_FRAME 0x01895c 0x000000000041895c 0x000000000041895c 0x00071c 0x00071c R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x01bdf0 0x000000000061bdf0 0x000000000061bdf0 0x000210 0x000210 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .init_array .fini_array .jcr .dynamic .got
Now let's examine this output:
In the first table (Program Headers):
[Type] - Segment type, i.e. the purpose of this segment
[Offset] - Offset in the file where this segment begins
[VirtAddr] - Where we want to load this segment in the process address space (if this segment should be loaded at all; not all of them are loaded)
[PhysAddr] - Same as VirtAddr for all modern OS's I encountered
[FileSiz] - How big this segment is in the file. This is the link to your sections: the current segment consists of all sections in the range Offset to Offset+FileSiz
[MemSiz] - How big this segment is in virtual memory (this does NOT have to be the same as the size in the file! If it extends beyond the size in the file, the excess is set to 0)
[Flg] - Permission flags: R-read, E-execute, W-write.
[Align] - Required alignment in memory.
Your focus is on segments of type LOAD (PT_LOAD). These segments group data from sections, instruct the loader where to put them in the process address space, and specify their permissions.
You can see a convenient section to segment mapping in the Section to Segment mapping table.
Let's observe the two LOAD segments 2 and 3:
We can see that segment 2 has read and execute permissions, and that it spans (among others) the .text and .rodata sections.
So, to achieve your purpose using ELF manipulation:
Locate the binary data that makes your functions in the file (readelf utility is your friend)
By modifying the ELF header (I don't know any tool that does this automatically, you'd probably have to write your own) split the segment containing .text section into two sequential LOAD segments, leaving out your function code
By modifying the ELF header create a new LOAD segment containing only your two functions
Update all references (if any) to the old function location to the new one
If you read up to here and understood everything, you should know this is a tremendously tedious, nearly impossible task for real life cases.
Option 2 - Dynamic linker manipulation
Note the INTERP segment type in the above example. This is an ASCII string that specifies which dynamic linker you should use.
The dynamic linker role is to parse the segments and perform all dynamic operations such as resolving symbols at runtime, loading segments from .so file, etc.
A possible manipulation here would be to modify the dynamic linker code (NOTE: this is a system-wide change!) to load the functions' binary data into a specific memory address in the process address space. Note that this approach has a couple of setbacks:
It requires modification to the dynamic linker
You need to update all references to your functions within the ELF file still
Option 3 - Dynamic loader manipulation
Much like option 2, but modify the ld library facilities instead of the dynamic linker.
Conclusion
Exactly what you wish to do is very hard, and indeed a tedious task. I am working on a tool that allows injection of arbitrary functions into existing shared object files at the moment and I guarantee this to be at least a few good weeks of work.
Are you sure there isn't another way to achieve what you want? Why do you need these two functions in a separate address? Perhaps there is an easier solution...

What are the differences comparing PIE, PIC code and executable on 64-bit x86 platform?

The test is on Ubuntu 12.04 64-bit. x86 architecture.
I am confused about the concept Position Independent Executable (PIE) and Position Independent code (PIC), and I guess they are not orthogonal.
Here is my quick experiment.
gcc -fPIC -pie quickSort.c -o a_pie.out
gcc -fPIC quickSort.c -o a_pic.out
gcc quickSort.c -o a.out
objdump -Dr -j .text a.out > a1.temp
objdump -Dr -j .text a_pic.out > a2.temp
objdump -Dr -j .text a_pie.out > a3.temp
And I have the following findings.
A. a.out contains some PIC code, but it resides only in the libc prologue and epilogue functions, as shown below:
4004d0: 48 83 3d 70 09 20 00 cmpq $0x0,0x200970(%rip) # 600e48 <__JCR_END__>
In the assembly instructions of my simple quicksort program, I didn't find any PIC instructions.
B. a_pic.out contains PIC code, and I didn't find any non-PIC instructions... In the instructions of my quicksort program, all the global data are accessed by PIC instructions like this:
40053b: 48 8d 05 ea 02 00 00 lea 0x2ea(%rip),%rax # 40082c <_IO_stdin_used+0x4>
C. a_pie.out contains syntax-identical instructions comparing with a_pic.out. However, the memory addresses of a_pie.out's .text section range from 0x630 to 0xa57, while the same section of a_pic.out ranges from 0x400410 to 0x400817.
Could anyone give me some explanations of these phenomena? Especially finding C. Again, I am really confused about PIE vs. PIC, and have no idea how to explain finding C.
I am confused about the concept Position Independent Executable (PIE) and Position Independent code (PIC), and I guess they are not orthogonal.
The only real difference between PIE and PIC is that you are allowed to interpose symbols in PIC, but not in PIE. Except for that, they are pretty much equivalent.
You can read about symbol interposition here.
C. a_pie.out contains syntax-identical instructions comparing with a_pic.out. However, the memory addresses of a_pie.out's .text section range from 0x630 to 0xa57, while the same section of a_pic.out ranges from 0x400410 to 0x400817.
It's hard to understand what you find surprising about this.
The PIE binary is linked just as a shared library, and so its default load address (the .p_vaddr of the first LOAD segment) is zero. The expectation is that something will relocate this binary away from zero page, and load it at some random address.
On the other hand, a non-PIE executable is always loaded at its linked-at address. On Linux, the default address for x86_64 binaries is 0x400000, and so the .text ends up not far from there.

What does this GCC error "... relocation truncated to fit..." mean?

I am programming the host side of a host-accelerator system. The host runs on the PC under Ubuntu Linux and communicates with the embedded hardware via a USB connection. The communication is performed by copying memory chunks to and from the embedded hardware's memory.
On the board's memory there is a memory region which I use as a mailbox where I write and read the data. The mailbox is defined as a structure and I use the same definition to allocate a mirror mailbox in my host space.
I used this technique successfully in the past so now I copied the host Eclipse project to my current project's workspace, and made the appropriate name changes. The strange thing is that when building the host project I now get the following message:
Building target: fft2d_host
Invoking: GCC C Linker
gcc -L/opt/adapteva/esdk/tools/host/x86_64/lib -o "fft2d_host" ./src/fft2d_host.o -le_host -lrt
./src/fft2d_host.o: In function `main':
fft2d_host.c:(.text+0x280): relocation truncated to fit: R_X86_64_PC32 against symbol `Mailbox' defined in COMMON section in ./src/fft2d_host.o
What does this error mean, and why won't it build in the current project, while it is OK with the older project?
You are attempting to link your project in such a way that the target of a relative addressing scheme is further away than can be supported with the 32-bit displacement of the chosen relative addressing mode. This could be because the current project is larger, because it is linking object files in a different order, or because there's an unnecessarily expansive mapping scheme in play.
This question is a perfect example of why it's often productive to do a web search on the generic portion of an error message - you find things like this:
http://www.technovelty.org/code/c/relocation-truncated.html
Which offers some curative suggestions.
Minimal example that generates the error
main.S moves an address into %eax (32-bit).
main.S
_start:
mov $_start, %eax
linker.ld
SECTIONS
{
/* This says where `.text` will go in the executable. */
. = 0x100000000;
.text :
{
*(*)
}
}
Compile on x86-64:
as -o main.o main.S
ld -o main.out -T linker.ld main.o
Outcome of ld:
(.text+0x1): relocation truncated to fit: R_X86_64_32 against `.text'
Keep in mind that:
as puts everything in .text if no other section is specified
ld uses the start of .text as the default entry point if ENTRY is not specified. Thus _start is the very first byte of .text.
How to fix it: use this linker.ld instead, and subtract 1 from the start:
SECTIONS
{
. = 0xFFFFFFFF;
.text :
{
*(*)
}
}
Notes:
we cannot make _start global in this example with .global _start, otherwise it still fails. I think this happens because global symbols have alignment constraints (0xFFFFFFF0 works). TODO where is that documented in the ELF standard?
the .text segment also has an alignment constraint of p_align == 2M. But our linker is smart enough to place the segment at 0xFFE00000, fill with zeros until 0xFFFFFFFF and set e_entry == 0xFFFFFFFF. This works, but generates an oversized executable.
Tested on Ubuntu 14.04 AMD64, Binutils 2.24.
Explanation
First you must understand what relocation is with a minimal example: https://stackoverflow.com/a/30507725/895245
Next, take a look at objdump -Sr main.o:
0000000000000000 <_start>:
0: b8 00 00 00 00 mov $0x0,%eax
1: R_X86_64_32 .text
If we look into how instructions are encoded in the Intel manual, we see that:
b8 says that this is a mov to %eax
0 is an immediate value to be moved to %eax. Relocation will then modify it to contain the address of _start.
When moving to 32-bit registers, the immediate must also be 32-bit.
But here, the relocation has to modify those 32-bit to put the address of _start into them after linking happens.
0x100000000 does not fit into 32-bit, but 0xFFFFFFFF does. Thus the error.
This error can only happen on relocations that generate truncation, e.g. R_X86_64_32 (8 bytes to 4 bytes), but never on R_X86_64_64.
And there are some types of relocation that require sign extension instead of zero extension as shown here, e.g. R_X86_64_32S. See also: https://stackoverflow.com/a/33289761/895245
R_AARCH64_PREL32
Asked at: How to prevent "main.o:(.eh_frame+0x1c): relocation truncated to fit: R_AARCH64_PREL32 against `.text'" when creating an aarch64 baremetal program?
I ran into this problem while building a program that requires a huge amount of stack space (over 2 GiB). The solution was to add the flag -mcmodel=medium, which is supported by both GCC and Intel compilers.
On Cygwin -mcmodel=medium is already the default and doesn't help. For me, adding -Wl,--image-base -Wl,0x10000000 to the GCC link command fixed the error.
Often, this error means your program is too large, and often it's too large because it contains one or more very large data objects. For example,
char large_array[1ul << 31];
int other_global;
int main(void) { return other_global; }
will produce a "relocation truncated to fit" error on x86-64/Linux, if compiled in the default mode and without optimization. (If you turn on optimization, it could, at least theoretically, figure out that large_array is unused and/or that other_global is never written, and thus generate code that doesn't trigger the problem.)
What's going on is that, by default, GCC uses its "small code model" on this architecture, in which all of the program's code and statically allocated data must fit into the lowest 2GB of the address space. (The precise upper limit is something like 2GB - 2MB, because the very lowest 2MB of any program's address space is permanently unusable. If you are compiling a shared library or position-independent executable, all of the code and data must still fit into two gigabytes, but they're not nailed to the bottom of the address space anymore.) large_array consumes all of that space by itself, so other_global is assigned an address above the limit, and the code generated for main cannot reach it. You get a cryptic error from the linker, rather than a helpful "large_array is too large" error from the compiler, because in more complex cases the compiler can't know that other_global will be out of reach, so it doesn't even try for the simple cases.
Most of the time, the correct response to getting this error is to refactor your program so that it doesn't need gigantic static arrays and/or gigabytes of machine code. However, if you really have to have them for some reason, you can use the "medium" or "large" code models to lift the limits, at the price of somewhat less efficient code generation. These code models are x86-64-specific; something similar exists for most other architectures, but the exact set of "models" and the associated limits will vary. (On a 32-bit architecture, for instance, you might have a "small" model in which the total amount of code and data was limited to something like 2^24 bytes.)
Remember to tackle error messages in order. In my case, the error above this one was "undefined reference", and I visually skipped over it to the more interesting "relocation truncated" error. In fact, my problem was an old library that was causing the "undefined reference" message. Once I fixed that, the "relocation truncated" went away also.
I may be wrong, but in my experience there's another possible reason for the error, the root cause being a compiler (or platform) limitation which is easy to reproduce and work around. Here is the simplest example:
Define an array of 1GB with:
char a[1024 * 1024 * 1024];
Result: it works, no warnings. Naturally, 1073741824 can be used instead of the triple product.
Double the previous array:
char a[2 * 1024 * 1024 * 1024];
Result in GCC: "error: size of array 'a' is negative" => That's a hint that the array argument accepted/expected is of type signed int.
Based on the previous, cast the argument:
char a[(unsigned)2 * 1024 * 1024 * 1024];
Result: the "relocation truncated to fit" error appears, along with this warning: "integer overflow in expression of type 'int'"
Workaround: use dynamic memory. Function malloc() takes an argument of type size_t, which is an unsigned type wide enough for this size (unsigned long long int on 64-bit Windows), thus avoiding the limitation.
This has been my experience using GCC on Windows. Just my 2 cents.
I encountered the "relocation truncated" error on a MIPS machine. The -mcmodel=medium flag is not available on MIPS; instead, -mxgot may help there.
I ran into the exact same issue. After compiling without the -fexceptions build flag, the file compiled with no issue.
I ran into this error on 64 bit Windows when linking a c++ program which called a nasm function. I used nasm for assembly and g++ to compile the c++ and for linking.
In my case this error meant I needed DEFAULT REL at the top of my nasm assembler code.
It's written up in the NASM documentation:
Chapter 11: Writing 64-bit Code (Unix, Win64)
Obvious in retrospect, but it took me days to arrive there, so I decided to post this.
This is a minimal version of the C++ program:
extern "C" { void matmul(void); }
int main(void) {
matmul();
return 0;
}
This is a minimal version of the nasm program:
; "DEFAULT REL" means we can access data in .bss, .data etc
; because we generate position-independent code in 64-bit "flat" memory model.
; see NASM docs
; Chapter 11: Writing 64-bit Code (Unix, Win64)
;DEFAULT REL
global matmul
section .bss
align 32 ; because we want to move 256 bit packed aligned floats to and from it
saveregs resb 32
section .text
matmul:
push rbp ; prologue
mov rbp,rsp ; aligns the stack pointer
; preserve ymm6 in local variable 'saveregs'
vmovaps [saveregs], ymm6
; restore ymm6 from local variable 'saveregs'
vmovaps ymm6, [saveregs]
mov rsp,rbp ; epilogue
pop rbp ; re-aligns the stack pointer
ret
With DEFAULT REL commented out, I got the error message above:
g++ -std=c++11 -c SO.cpp -o SOcpp.o
\bin\nasm -f win64 SO.asm -o SOnasm.obj
g++ SOcpp.o SOnasm.obj -o SO.exe
SOnasm.obj:SO.asm:(.text+0x9): relocation truncated to fit: IMAGE_REL_AMD64_ADDR32 against `.bss'
SOnasm.obj:SO.asm:(.text+0x12): relocation truncated to fit: IMAGE_REL_AMD64_ADDR32 against `.bss'
collect2.exe: error: ld returned 1 exit status
With GCC, there's a -Wl,--default-image-base-low option that sometimes helps to deal with such errors, e.g. in some MSYS2 / MinGW configurations.

How to get pointers and sizes of variables from the compiler - from outside the compiled code?

I'd like the compiler to output a file containing the pointers to all global variables in the source code it is compiling, and also the sizes of them.
Is this possible? Is there a way to do it in any c compiler?
Something like a map file? That will show where the globals and statics are allocated, but not what they point at. Most compilers (linkers) will output one automatically or with a simple statement. Just search for map file in your documentation.
This information is available in the symbol table of the binary, though it might not mean what you expect it to.
The compiler takes one or more source files, compiles the code to object code, and generates an object file (.o on Unix, .obj on Windows). All variables and functions referenced in the source file are mentioned in the symbol table. Variables and functions that are defined in the source file have specific addresses and sizes, while symbols not defined in the source file are marked as undefined and must be linked later. All symbols are listed relative to a particular section. Common sections are ".text" for executable code, ".bss" for variables that are initialized to zero when the program starts, and ".data" for variables initialized with non-zero values.
The linker takes one or more object files, combines the sections (putting all of code and data from each object file into one big section for code and data), and writes an output file. This output file may be an executable, or it may be a shared library. An executable on disk still doesn't have a pointer for each variable; it still stores the offset from the beginning of the section to the variable.
When an executable is run, the operating system's dynamic loader reads the executable, finds each section, and allocates memory for that section. (It may also set up different permissions on each section -- the ".text" segment is often marked as read-only, and (on processors that support it) data segments are sometimes marked as non-executable.) Only then does a variable get a pointer -- when the code needs to access a particular variable, it adds the address of the beginning of the section to the offset from the beginning of the section to get the pointer.
You can use various tools to investigate each binary's symbol table. The GNU toolchain's objdump (used on Linux) is one such tool.
For a simple C hello-world program:
#include <stdio.h>
const char message[] = "Hello world!\n";
int main(int argc, char ** argv) {
printf(message);
return 0;
}
I compile (but don't link) it on my Linux box:
$ gcc -c hello.c -o hello.o
Now I can look at the symbol table:
$ objdump -t hello.o
hello.o: file format elf32-i386
SYMBOL TABLE:
00000000 l df *ABS* 00000000 hello.c
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 l d .rodata 00000000 .rodata
00000000 l d .note.GNU-stack 00000000 .note.GNU-stack
00000000 l d .comment 00000000 .comment
00000000 g O .rodata 0000000e message
00000000 g F .text 0000002b main
00000000 *UND* 00000000 puts
The first column is the address of each symbol, relative to the beginning of the section. Each symbol has various flags, and some of the symbols are used as hints to the rest of the toolchain and the debugger. (If I built with debugging symbols, I'd see many entries devoted to them as well.) My simple program has only one variable:
00000000 g O .rodata 0000000e message
The fifth column tells me the symbol message is size 0xe -- 14 bytes.
While no compiler is required to output this data, most linkers can dump out this information. For example, Microsoft's linker mapfile contains all the public symbols in an executable/dll as well as their address relative to the section (read only, read write, code, zero initialized, etc.) they are put in. Sizes can be derived from that, although it's mainly an approximation.
You can also probably figure out a way to inspect the debugging symbols generated for the executable, as that's exactly what a debugger has to do anyway.
Normally you'd get this from the linker, not the compiler -- the linker is what assigns addresses to things. Most linkers can produce a map file that will contain the addresses of global variables and functions (as well as any other symbols in the executable it creates). It'll be up to you to sort out which are which. All of them I've seen include something to tell you, but the exact format varies with the linker involved.
