What does the PC have to do with the load or link address? - linker

Link address is the address where execution of a program takes place, while load address is the address in memory where the program is actually placed.
Now I'm confused: what is the value in the program counter? Is it the load address or the link address?

Link address is the address where execution of a program takes place
No, it's not.
while load address is the address in memory where the program is actually placed.
Kind of. The program usually consists of more than one instruction, so it can't be placed at a single "load address".
When people talk about load address, they usually talk about relocatable code that can be relocated (at runtime) to an arbitrary load address.
For example, let's take a program that is linked at address 0x20020, and consists of 100 4-byte instructions, which all execute sequentially (e.g. it's a sequence of ADDs followed by a single SYSCALL to exit the program).
If such a program is loaded at address 0x20020, then at runtime the program counter will have value 0x20020, then it will advance to the next instruction at 0x20024, then to 0x20028, etc. until it reaches the last instruction of the program at 0x201ac.
But if that program is loaded at address 0x80020020 (i.e. if the program is relocated by 0x80000000 from its linked-at address), then the program counter will start at 0x80020020, and the last instruction will be at 0x800201ac.
Note that on many OSes executables are not relocatable and thus have to always be loaded at the same address they were linked at (i.e. with relocation 0; in this case "link address" really is the address where execution starts), while shared libraries are almost always relocatable and are often linked at address 0 and have non-zero relocation.

They are different concepts, used in different contexts. The linker/loader is mainly responsible for code relocation and modification; the PC is a hardware register that indicates the processor's position in the program sequence (it is not an address assigned by the linker or loader).
Linking & Loading:
The heart of a linker or loader's actions is relocation and code
modification. When a compiler or assembler generates an object file,
it generates the code using the unrelocated addresses of code and data
defined within the file, and usually zeros for code and data defined
elsewhere. As part of the linking process, the linker modifies the
object code to reflect the actual addresses assigned. For example,
consider this snippet of x86 code that moves the contents of variable
a to variable b using the eax register.
mov a,%eax
mov %eax,b
If a is defined in the same file at location 1234 hex and b is
imported from somewhere else, the generated object code will be:
A1 34 12 00 00 mov a,%eax
A3 00 00 00 00 mov %eax,b
Each instruction contains a one-byte operation code followed by a
four-byte address. The first instruction has a reference to 1234 (byte
reversed, since the x86 stores addresses in little-endian byte order) and the
second a reference to zero since the location of b is unknown.
Now assume that the linker links this code so that the section in
which a is located is relocated by hex 10000 bytes, and b turns out to
be at hex 9A12. The linker modifies the code to be:
A1 34 12 01 00 mov a,%eax
A3 12 9A 00 00 mov %eax,b
That is, it adds 10000 to the address in the first instruction so now
it refers to a's relocated address which is 11234, and it patches in
the address for b. These adjustments affect instructions, but any
pointers in the data part of an object file have to be adjusted as
well.
Program Counter:
The program counter (PC) is a processor register that indicates where
a computer is in its program sequence.
In a typical central processing unit (CPU), the PC is a digital
counter (which is the origin of the term "program counter") that may
be one of many registers in the CPU hardware. The instruction cycle
begins with a fetch, in which the CPU places the value of the PC on
the address bus to send it to the memory.
The memory responds by
sending the contents of that memory location on the data bus. (This is
the stored-program computer model, in which executable instructions
are stored alongside ordinary data in memory, and handled identically
by it).
Following the fetch, the CPU proceeds to execution, taking
some action based on the memory contents that it obtained. At some
point in this cycle, the PC will be modified so that the next
instruction executed is a different one (typically, incremented so
that the next instruction is the one starting at the memory address
immediately following the last memory location of the current
instruction).

I would put the term "load address" out of your thinking. It does not really exist in a modern operating system. In ye olde days of multiple programs loaded into the same address space (and each program loaded into a contiguous region of memory), the load address had significance. Now it does not. Here's why.
An executable file is typically going to define a number of different program segments. These may not be loaded contiguously in memory. For example, the linker often directs the creation of stack areas remote from other areas of the program.
The executable will indicate the location that should be the initial value of the PC. This might not be at the start of a program segment, let alone be in the first program segment.

Related

Jump to specific address in loaded shared object file (loaded by the LD_PRELOAD)

Suppose we have two programs. The program test is in binary form (an ELF file), and the second program is a shared object file, say f.so.
I want to hook some instruction in the program test and move execution to particular instructions located in the f.so file. I do not want to make a function call. For example, I want to hook a (binary) instruction in the test program, remove its 4 bytes (ARM 32-bit arch), and write a new branch instruction that points to an instruction located in the f.so shared object.
is that possible using LD_PRELOAD?
Edit 1
Based on the helpful information provided (thanks in advance), I performed some experiments, which I will explain in some detail (sorry for providing pictures instead of text)...
Before instrumenting the target binary file test (this task may not be easy, particularly when we want to overwrite some bytes with a new branch instruction), I performed some experiments to hook the control flow of the test program and see whether it is possible to move execution to some place in the f.so file. To this end, and for simplicity, I used gdb, as it makes it easy to modify the program counter register to point somewhere else.
In the following, I performed my test on binary files compiled for the x86 architecture rather than ARM; however, the same approach can be ported to ARM binaries.
The following image shows the disassembly dump of the shared object file f.so (on the left) and the target program test (on the right).
My first question is why is there a difference in the memory addresses where the program test and f.so were loaded?
Then I moved into the GDB environment to patch a particular instruction in the test binary and move control (execution) to an arbitrary location in f.so. I used the following commands, and ran the program to ensure it runs correctly with LD_PRELOAD under GDB:
(gdb) set environment LD_PRELOAD=./f.so
(gdb) file ./test
Reading symbols from ./test...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/null/Desktop/Edit_elf/test
hello
19
[Inferior 1 (process 5827) exited normally]
(gdb)
Ok, now I set a breakpoint at the main function in the test binary to check whether our f.so is really loaded. After the breakpoint I disassembled the function func1 in f.so; it looks like below:
As we can see, f.so is loaded at some place in memory (I really don't know where)... these addresses are the same as those presented in image 1, with the prefix b7fd2 (for example, in image 1 the address of the first instruction in f.so is 520, and the address of the same instruction in image 2 is b7fd2520). I don't know precisely the reason for that, but I think it is due to virtual memory mapping and related things.
I listed the values of the registers for our test program (stopped at the predefined breakpoint) and changed the value of the program counter (eip on x86) to move execution after the breakpoint to some place in f.so.
Then I let the test program continue its execution. It is now expected that the CPU will move execution to the place where eip points, i.e., 0xb7fd2526.
Wow ... as we can see from the above image, we are able to move execution from test to some place in a shared library loaded by LD_PRELOAD. However...
Why is there a difference between the address maps of test and f.so? For my final goal, I want to patch the test binary: remove some bytes and overwrite them with new machine code that injects a branch instruction pointing into the shared object file. Therefore, I need to calculate the target address of the branch instruction correctly in order to embed it in the written bytes, and that target address, AFAIK, is calculated relative to the address of the current instruction. So I feel it is hard to create the machine code for a branch instruction that moves control from one part of the memory space (like 804843b) to another (b7fd2526)...
is that possible using LD_PRELOAD?
Sure: the LD_PRELOADed library will have its initializers run at load time.
One of these initializers can binary-patch instructions inside the main executable (by doing mprotect(..., PROT_WRITE) + overwrite instruction bytes + mprotect(..., PROT_EXEC)).
If the main binary is non-PIE, you could hard-code the address to patch directly into the library.
If it is a PIE binary, you'll have to find where in memory it was loaded and adjust the offset accordingly.

How does the linker generate final virtual memory addresses?

Assume this simple code:
int main(){return 0;}
Using objdump we can see the memory addresses:
0000000100003fa0 _main:
100003fa0: 55 pushq %rbp
100003fa1: 48 89 e5 movq %rsp, %rbp
100003fa4: 31 c0 xorl %eax, %eax
100003fa6: c7 45 fc 00 00 00 00 movl $0, -4(%rbp)
100003fad: 5d popq %rbp
100003fae: c3 retq
I know that 0x100003fa0 (as an example) is a virtual memory address.
The OS will map it to the physical memory when my program is loaded.
2 Questions:
1- Can the initial address of main be random? as they are virtual I'm guessing it can be
any value as the virtual memory will take care of the rest? i.e I can start literally from 0x1 (not 0x0 as it's reserved for null)?
2- How does the linker come up with the initial address? (again is the starting address random?)
Can the initial address of main be random? as they are virtual I'm guessing it can be any value as the virtual memory will take care of the rest? i.e I can start literally from 0x1 (not 0x0 as it's reserved for null)?
The memory being virtual doesn’t mean that all of the virtual address space is yours to do with as you please. On most OSes, the executable modules (programs and libraries) need to use a subset of the address space or the loader will refuse to load them. That is highly platform-dependent of course.
So the address can be whatever you want as long as it is within the platform-specific range. I doubt that any platform would allow 0x1, not only because some platforms need the code to be aligned to something larger than a byte.
Furthermore, on many platforms the addresses are merely hints: if they can be used as-is, the loader doesn't have to relocate a given section in the binary. Otherwise, it'll move it to a block of the address space that is available. This is fairly common, e.g. on Windows, the 32-bit binaries (e.g. DLLs) have base addresses: if available, the loader can load the binary faster. So, in the hypothetical case of the "initial address" being 0x1, assuming that alignment wasn't a problem, the address will just end up being moved elsewhere in the address space.
It's also worth noting that the "initial address" is a bit of an unspecific term. The binary modules that are loaded when an executable starts, consist of something akin to sections. Each of the sections has its own base address, and possibly also internal (relative) addresses or address references that are tabulated. In addition, one or more of the executable sections will also have an "entry" address. Those addresses will be used by the loader to execute initialization code (e.g. DllMain concept on Windows) - that code always returns quickly. Eventually, one of the sections, that nothing else depends on, will have a suitably named entry point and will be the "actual" program you wrote - that one will keep running and return only when the program has been exited. At that point the control may return to the loader, which will note that nothing else is to be executed, and the process will be torn down. The details of all this are highly platform dependent - I'm only giving a high-level overview, it's not literally done that way on any particular platform.
How does the linker come up with the initial address? (again is the starting address random?)
The linker has no idea what to do by itself. When you link your program, the linker gets fed several more files that come with the platform itself. Those files are linker scripts and various static libraries needed to make the code able to start up. The linker scripts give the linker the constraints within which it can assign addresses. So it's all highly platform-specific again. The linker can either assign the addresses in a completely deterministic fashion, i.e. the same inputs always produce identical output, or it can be told to assign certain kinds of addresses at random (in a non-overlapping fashion, of course). That's known as ASLR (address space layout randomization).
Not sure about Visual C but gcc (or rather ld) uses a linker script to determine final addresses. This can be specified using the -T option. Full details of gcc linker scripts can be found at: https://sourceware.org/binutils/docs/ld/Scripts.html#Scripts.
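For a feel of what such a script looks like, here is a minimal illustrative GNU ld script (passed with `ld -T script.ld`); the addresses are made up for the example:

```
SECTIONS
{
  . = 0x10000;          /* code starts at 0x10000       */
  .text : { *(.text) }
  . = 0x8000000;        /* data placed far away from it */
  .data : { *(.data) }
  .bss  : { *(.bss)  }
}
```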
Normally you don't need to play with this since your toolchain will be built either for the host machine or when cross compiling with the correct settings for a target.
For ASLR and .so files you will need to compile with -fPIC or -fPIE (position-independent code or position-independent executable). Your compiled code will then only contain offsets against some base address in memory. The base address gets set by the operating system loader before running your application.
Those addresses are base addresses and offsets. The ELF file contains special information on how to calculate the actual addresses when the program is loaded. It is a rather advanced topic, but you can read about how an ELF file is loaded and executed here: How do I load and execute an ELF binary executable manually? or https://linux-audit.com/elf-binaries-on-linux-understanding-and-analysis/

How can I deduce whether the address at hand is part of an array, from a trace?

This question concerns low-level programming, so the usual C/C++ solutions do not apply here.
I know that if you go through the ELF file you can see global arrays as chunky symbols. How does the processor know that an address is part of an array? All it sees are addresses with no specific metadata. Or is there some way this information is passed down to the lower levels?
Eg:
55: 0000000000201078 0 NOTYPE GLOBAL DEFAULT 24 _end
56: 0000000000000540 43 FUNC GLOBAL DEFAULT 14 _start
57: 0000000000201070 0 NOTYPE GLOBAL DEFAULT 24 __bss_start
58: 000000000000064a 49 FUNC GLOBAL DEFAULT 14 main
59: 0000000000201020 80 OBJECT GLOBAL DEFAULT 23 test_Var
60: 0000000000201070 0 OBJECT GLOBAL HIDDEN 23 __TMC_END__
The above section is readelf -s output. Since I wrote the code, I know that line 59 refers to the global array test_Var. But how will I know that some data in an instruction I encounter while simulating this code (say on gem5, or while running a Pin tool) is part of this array? Not to mention that I cannot even see local arrays at this stage. So the question is: given an instruction trace, or even a data trace, of this program, how can I know this particular array is involved at a given instant?
All the metadata the processor needs is encoded within the machine code instructions that it executes. And if the processor needs the same information at different times for different parts of the program, the compiler will repeat any necessary metadata in the instructions of all such parts of the program.
The kind of metadata the processor needs is: how big is the item (e.g. byte, half, word, quad), is the item signed or unsigned, how far to skip ahead for each index position of an array, etc. And generally speaking, the processor requires instruction sequences to get any high-level language code done, so some metadata is effectively encoded within individual instructions as well as by the sequences of instructions themselves.
An example here is an array of a particular data type that is, of course, read and written (e.g. indexed) in different parts of the program. The C program encodes type information (metadata) in the array declaration, and this type holds no matter which function accesses the array. However, the processor does not read data declarations, only machine code instructions! So the translation repeats the size and access-pattern information in machine code instructions and instruction sequences as needed, and thus the compiler ensures consistent access by the processor.
It doesn't; this is the point of undefined behaviour in C/C++: the compiler assumes the programmer is right and just does what the source code says. If that leads to accessing a different object that happened to be nearby in memory, that's the programmer's fault, not the compiler's or the CPU's.
We gain efficiency by not even trying to detect this at runtime.

Big empty space in memory?

I'm very new to embedded programming (started yesterday, actually) and I've noticed something I think is strange. I have a very simple program doing nothing but return 0.
int main() {
    return 0;
}
When I run this in IAR Embedded Workbench I have a memory view showing me the program's memory. I've noticed that there is some content in memory, then a big block of empty space, and then content again (I'm bad at explaining :P so here is an image of the memory).
Please help me understand this a little more than I do now. I don't really know what to search for because I'm so new to this.
The first two lines are the 8 interrupt vectors, expressed as 32-bit instructions with the highest byte last. That is, read them in groups of 4 bytes, with the highest byte last, and then convert to an instruction via the usual method. The first few vectors, including the reset at memory location 0, turn out to be LDR instructions, which load an immediate address into the PC register. This causes the processor to jump to that address. (The reset vector is also the first instruction to run when the device is switched on.)
You can see the structure of an LDR instruction here, or at many other places via an internet search. If we write the reset vector 18 f0 95 e5 as e5 95 f0 18, then we see that the PC register is loaded with the address located at an offset of 0x20.
So the next two lines are memory locations referred to by instructions in the first two lines. The reset vector sends the PC to 0x00000080, which is where the C runtime of your program starts. (The other vectors send the PC to 0x00000170 near the end of your program. What this instruction is is left to the reader.)
Typically, the C runtime is code added to the front of your program that loads the global variables into RAM from flash, and sets the uninitialized RAM to 0. Your program starts after that.
Your original question was: why have such a big gap of unused flash? The answer is that flash memory is not really at a premium, so we can waste a little, and that having extra space there allows for forward-compatibility. If we need to increase the vector table size, then we don't need to move the code around. In fact, this interrupt model has been changed in the new ARM Cortex processors anyway.
Physical (not virtual) memory addresses map to physical circuits. The lowest addresses often map to registers, not RAM arrays. In the interest of consistency, a given address usually maps to the same functionality on different processors of the same family, and missing functionality appears as a small hole in the address mapping.
Furthermore, RAM is assigned to a contiguous address range, after all the I/O registers and housekeeping functions. This produces a big hole between all the registers and the RAM.
Alternately, as @Martin suggests, it may represent uninitialized and read-only flash memory as -- bytes. Unlike truly unassigned addresses, access to this is unlikely to produce an exception, and you might even be able to make them "reappear" using appropriate flash controller commands.
On a modern desktop-class machine, virtual memory hides all this from you, and even parts of the physical address map may be configurable. Many embedded-class processors allow configuration to the extent of specifying the location of the interrupt vector table.
UncleO is right but here is some additional information.
The project's linker command file (*.icf for IAR EW) determines where sections are located in memory. (Look under Project->Options->Linker->Config to identify your linker configuration file.) If you view the linker command file with a text editor you may be able to identify where it locates a section named .intvec (or similar) at address 0x00000000. And then it may locate another section (maybe .text) at address 0x00000080.
You can also see these memory sections identified in the .map file, along with their locations. (Ensure "Generate linker map file" is checked under Project->Options->Linker->List.) The map file is an output from the build, however, and it's the linker command file that determines the locations.
So that space in memory is there because the linker command file instructed it to be that way. I'm not sure whether that space is necessary but it's certainly not a problem. You might be able to experiment with the linker command file and move that second section around. But the exception table (a.k.a. interrupt vector table) must be located at 0x00000000. And you'll want to ensure that the reset vector points to the new location of the startup code if you move it.

Where does ARM read program instructions from after Register 14?

My basic understanding of the ARM architecture is as follows:
There are 15 main registers with the 15th (r15) being the Program Counter (PC).
If the program counter points to a specific register, then how can you have a program which runs more than ~14 lines?
Obviously this is not true, but I don't understand how you can fit a big program into just 15 registers. What am I missing?
The program counter points to memory, not another register.
Registers don't store the program code. Program code is in main memory, and the Program Counter points to the location in memory of the next instruction.
The other registers are high-speed locations for storing temporary, or frequently accessed, values during the processing of the application.
In the simplest form, you have Program (Instruction memory), Data memory, Stack Memory, and Registers.
ARM instructions are stored in instruction memory; they are a sequence of commands that tell the processor what to do. They are never stored in the processor's registers. The program counter only points to an instruction, which is simply a command that, in its basic form, has an opcode (operation code) and operands/literals.
So what happens is that the instruction is read (fetched) from the memory location pointed to by the program counter. It is not loaded into the registers, but into the control unit, where it is decoded: the processor determines what operation to perform (i.e. add, sub, mov, etc.) and where to read/store its inputs and outputs.
So where are the inputs/outputs to operate on and store? The ARM architecture is a load/store architecture: it operates on data loaded into its registers (R1, R2 ... R7, etc.), where the registers can be thought of as temporary variables in which all inputs and outputs are kept. Registers are used because they are so fast; they operate at the processor's clock speed, unlike memory, which is slower.
Now the question is, how to populate these registers with values in the first place?
Those values may be stored in data memory or stack memory, so there are instructions to copy them into registers, followed by instructions to operate on them and store the result in a register, followed by other instructions to copy the result back to memory. Some instructions can also load a register with a constant.
Instruction 1 // Copy Variable X into R1 from memory
Instruction 2 // Copy Variable Y into R2 from memory
ADD, R3, R1, R2 // add them together
Instruction 3 // Copy back the result into Memory
I tried to make it as simple as possible, there are so many details to cover. Needs books :)
