How to load a program in memory at a different address than it is intended for? - c

Generally, user program binaries are loaded at a low address (usually around 0x400000) in the program's address space, as specified in the ELF binary (in the case of Linux).
Can we force a user binary to load at a high address, possibly within the 2GB range of addresses where libc or other such libraries are loaded?
I have tried finding a solution on the net but could not find any concrete solution for this.
(I am working on Ubuntu 12.10 64bit OS)
Thanks

Unless the binary is position-independent (PIE), this is not possible. Normal (non-PIE) binaries are hard-coded for a particular load address at link time, and the information needed to relocate them to a different address is discarded during linking.
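To find out which kind of binary you have, you can look at the e_type field of its ELF header: ET_DYN for PIE executables (and shared objects), ET_EXEC for fixed-address ones. Here is a minimal sketch, assuming a Linux system with <elf.h> and a file in the host's byte order; the helper name elf_type is made up for illustration.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns e_type (ET_EXEC, ET_DYN, ...) or -1 on error.
 * e_type sits right after the 16-byte e_ident in both ELF32 and ELF64. */
int elf_type(const char *path)
{
    unsigned char buf[EI_NIDENT + sizeof(uint16_t)];
    uint16_t type;
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    /* Reject short files and anything without the "\177ELF" magic. */
    if (n != sizeof buf || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;

    memcpy(&type, buf + EI_NIDENT, sizeof type); /* assumes host byte order */
    return type;
}
```

If elf_type("/proc/self/exe") returns ET_DYN, the running program is position-independent; ET_EXEC means it is tied to its link-time address.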
Edit: The above is assuming you're working with an existing binary. If you are producing the binary yourself, you can control the load address that's hard-coded into it with the following link options:
-Wl,-Ttext-segment,0x80000000
replacing 0x80000000 with your desired address. Certain addresses will not work (such as those reserved for kernel use, which typically begin at 0xc0000000 on 32-bit x86 Linux), and the address must be page-aligned (with 4 KiB pages, the last 3 hex digits must be 0).
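The page-alignment rule is easy to check before handing an address to the linker. A small sketch; is_page_aligned is a made-up helper name, and it asks the system for the page size rather than assuming 4 KiB.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if addr is a multiple of the system page size. */
int is_page_aligned(uintptr_t addr)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE); /* 4096 on typical x86 systems */

    /* Page sizes are powers of two, so the low bits must all be zero. */
    return (addr & (page - 1)) == 0;
}
```

For example, 0x80000000 passes the check while 0x80000123 does not.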

Related

What does SEGMENT_START("text-segment", 0x400000) represent?

I'm learning about the layout of executable binaries. My end goal is to analyze a specific executable for things that could be refactored (in its source) to reduce the compiled output size.
I've been using https://www.embeddedrelated.com/showarticle/900.php and https://www.geeksforgeeks.org/memory-layout-of-c-program/ as references for this initial learning.
From what I've learned, a linker script specifies the addresses where sections of compiled binaries are placed. E.g.
> ld --verbose | grep text
PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
*(.rela.text .rela.text.* .rela.gnu.linkonce.t.*)
I think this means that the text segments of compiled binaries starts at memory address 0x400000 - true?
What does that value, 0x400000, represent? I'm probably not understanding something properly, but surely that 0x400000 does not represent a physical memory location, does it? E.g. if I were to run two instances of my compiled a.out executable in parallel, they couldn't both simultaneously occupy the space at 0x400000, right?
0x400000 is not a physical address as your memory chips see it. It is a virtual address, as seen from the CPU's point of view.
The loader of your program maps a few pages of physical memory at VA 0x400000 and copies the contents of the text segment into them. And yes, another instance of your program can use the same virtual address, and even the same physical pages, for the text segment, because text (code) is readable and executable but not writable. Other segments (data, bss, stack, heap) may have identical VAs, but each is mapped to its own private, protected block of physical memory.
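You can see the "same virtual address, different physical memory" effect without starting two copies of a program. The sketch below uses fork() as a stand-in for a second instance (an assumption of this example: the child starts with the same virtual layout as the parent). The child overwrites a global variable, and the parent's copy at the same virtual address stays unchanged. The function and variable names are made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives at the same virtual address in the parent and in the child. */
static int instance_value = 1;

/* Hypothetical demo: returns the parent's value after the child has
 * overwritten its own copy, or -1 on error. */
int value_after_child_write(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: same virtual address, but a private page behind it. */
        instance_value = 2;
        _exit(instance_value == 2 ? 0 : 1);
    }

    /* Parent: wait for the child, then read our own copy. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return instance_value;
}
```

The parent still reads 1 after the child wrote 2, because the write landed in the child's private physical page.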
What is 0x400000
I think this means that the text segments of compiled binaries starts at memory address 0x400000 - true?
No, this is well explained in the official documentation at: https://sourceware.org/binutils/docs/ld/Builtin-Functions.html
SEGMENT_START(segment, default)
Return the base address of the named segment. If an explicit value has already been given for this segment (with a command-line ‘-T’ option) then that value will be returned otherwise the value will be default. At present, the ‘-T’ command-line option can only be used to set the base address for the “text”, “data”, and “bss” sections, but you can use SEGMENT_START with any segment name.
Therefore, SEGMENT_START is not setting the address, but rather returning it, and 0x400000 in your case is just the default used when the value was not explicitly set by one of the CLI mechanisms mentioned in the documentation (e.g. -Ttext=0x200 as mentioned in man ld).
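Since the default script in the question PROVIDEs __executable_start from that SEGMENT_START value, a program can read the result back at run time. A minimal sketch, assuming GNU ld's default linker script (or a linker that defines the same symbol) on Linux; executable_start is a made-up wrapper name.

```c
#include <stdint.h>

/* Defined by the linker script, not by any C source file. */
extern char __executable_start[];

/* Hypothetical wrapper: the address the linker script assigned,
 * plus the load bias if the program was built as a PIE. */
uintptr_t executable_start(void)
{
    return (uintptr_t)__executable_start;
}
```

In a non-PIE build with the default script I would expect this to return 0x400000 on x86-64 Linux, and a different value once the text segment base is overridden on the linker command line. In a PIE build the result is shifted by wherever the kernel placed the program.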
Physical vs virtual
As you've said, working with physical addresses is very uncommon in userland, and would at the very least always require sudo, as it would break process separation. Here is an example of userland dealing with physical addresses: How to access physical addresses from user space in Linux?
Therefore, when the kernel loads an ELF binary with the exec syscalls, all addresses are interpreted as virtual addresses.
Note however that this is just a matter of convention. For example, when I give my Linux kernel ELF binary for QEMU to load into memory to start simulation, or when a bootloader does that in a real system, the ELF addresses would then be treated as physical addresses since there is no page table available at that point.

Does the starting address of a section in a linker script apply only to virtual memory?

I have read the linker script, and I am confused about how memory is allocated.
When we define a section, we give the starting address where we want it to be loaded.
1) Do the memory locations we specify, such as ( . = 0x10000 ), refer to virtual memory?
In your linker script (and the resulting binary), addresses are just addresses.
Whether they are treated as virtual or physical depends solely on your loader (which might be a tiny bootloader at early system init that doesn't know about virtual addresses, or a full-blown OS that provides a sophisticated virtual environment).
So it's the program that brings your binary into memory that decides whether addresses are interpreted virtually or physically, not the linker script.
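For ELF output this is visible in the file itself: every program header carries both a p_vaddr and a p_paddr field, and it is up to the loader which one it honors. The sketch below lists both for the PT_LOAD segments of a file. It assumes a 64-bit ELF in the host's byte order and a Linux <elf.h>; print_load_segments is a made-up name.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints p_vaddr and p_paddr of every PT_LOAD segment.
 * Returns the number of PT_LOAD segments, or -1 on error. */
int print_load_segments(const char *path)
{
    Elf64_Ehdr eh;
    int count = 0;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;

    /* Only 64-bit ELF files are handled in this sketch. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fclose(f);
        return -1;
    }

    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long pos = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

        if (fseek(f, pos, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            fclose(f);
            return -1;
        }
        if (ph.p_type != PT_LOAD)
            continue;

        printf("PT_LOAD vaddr=0x%llx paddr=0x%llx memsz=0x%llx\n",
               (unsigned long long)ph.p_vaddr,
               (unsigned long long)ph.p_paddr,
               (unsigned long long)ph.p_memsz);
        count++;
    }

    fclose(f);
    return count;
}
```

Calling print_load_segments("/proc/self/exe") on Linux shows the segments of the running program. The Linux kernel's exec path uses p_vaddr, while a bootloader running without paging would typically care about p_paddr.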
Unless you tell us about your specific environment, we can't tell you more.

Loading a non-relocatable, static ELF binary in userspace

I'm trying to write a basic userspace ELF loader that should be able to load statically linked (not dynamically linked) non-relocatable binaries (i.e. not built with -pie, -fPIE and so on). It should work on x86 CPUs for now.
I've followed the code on loading ELF file in C in user space and it works well when the executable is relocatable, but as expected completely fails if it isn't since the program is loaded in the wrong virtual memory range and instantly crashes.
I tried modifying it to load the program at the virtual address it expects (using phdr.p_vaddr), but I ran into a complication: my loader is already using that virtual memory range! I can't mmap it, much less write anything into it. How do I proceed so that I can load my non-relocatable binary into my loader's address space without overwriting the loader's own code before it's finished? Do I need to get my loader to run from a completely different virtual memory range, perhaps by getting the linker to link it way above the usual virtual memory range for a non-relocatable binary (which happens to start at 0x400000 in my case), or is there some trick to it?
I've read the ELF documentation (I am working with ELF64 here by the way, but I think ELF32 and ELF64 are very similar) and a lot of documents on the web and I still don't get it.
Can someone explain how an ELF loader deals with this particular complication? Thanks!
Archimedes cried "heureka" when he realized that only one object can occupy a given location. If your ELF binary must sit at a particular location because you can't rebuild it for another one, you have to relocate the loader itself.
A non-relocatable ELF doesn't include enough information to move it to a different address. You could probably write a decompiler that detects all address references in the code, but it's not worth the effort, and you would still have problems analyzing data references such as pointers stored in pre-initialized variables.
Rewrite the loader if you can't get the source code of your ELF binary or a relocatable version of it.
BTW: Archimedes' heureka was deadly for the goldsmith who cheated. I hope it's not so expensive in your case.

Does a DLL always have the same base address?

I'm studying Windows and DLLs, and I have a question. :)
I made a simple program that loads my own DLL. This DLL has just simple functions: plus and minus.
This is the question: if I load some DLL (for example, text.dll), does this DLL always have the same base address, or does it change when I restart the program? And can I pin the DLL's base address?
When I test it, it always has the same base address, but I think that if I rely on this, I will have to handle the case where the DLL's base address is different.
The operating system will load your DLL in whatever base address it pleases. You can specify a "preferred" base address, but if that does not happen to be available, (for whatever reason, which may well be completely out of your control,) your DLL will be relocated by the operating system to whatever address the operating system sees fit.
If I load some DLL (for example, text.dll), does this DLL always have the same base address?
No. It is a preferred base address. If something is already loaded at that address, the loader will rebase the DLL and fix up all of the addresses.
Other things, like Address Space Layout Randomization, could cause it to be different every time the process starts.
That's a common problem with DLLs that we encountered when trying to decode stacktraces issued by GNAT runtime (Ada).
When presented with a list of addresses (traceback) when our executables crash, we are able to perform addr2line on the given addresses and rebuild the call tree without issues.
On DLLs, this isn't the case (that's why I highly doubt that this issue is ASLR-related, else the executables would have the same random shift); vcsjones' answer explains the "why".
To work around this issue, you can write the address of a given symbol (for example, the main program) to disk. When analysing a crash, just compute the difference between the address of the symbol in the map file and the address written to disk. Apply this difference to your addresses, and you'll be able to compute the theoretical addresses, and thus the call stack.
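The arithmetic behind this workaround is short. A sketch, with a made-up function name and made-up example addresses:

```c
#include <stdint.h>

/* Hypothetical helper: translate an address from a crash traceback back to
 * the address space used by the map file.
 *   runtime_addr   - address taken from the traceback
 *   symbol_runtime - address of the reference symbol that was written to disk
 *   symbol_in_map  - address of the same symbol according to the map file */
uintptr_t to_mapfile_address(uintptr_t runtime_addr,
                             uintptr_t symbol_runtime,
                             uintptr_t symbol_in_map)
{
    /* How far the DLL was moved. Unsigned arithmetic wraps around, so this
     * also works when the DLL was moved to a lower address. */
    uintptr_t shift = symbol_runtime - symbol_in_map;

    return runtime_addr - shift;
}
```

For instance, if the reference symbol is at 0x10001000 in the map file but was at 0x6f001000 at run time, a traceback entry of 0x6f001234 corresponds to 0x10001234, which is the address to look up in the map file or pass to addr2line.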

load time relocation and virtual memory

I am wondering what load-time relocation actually means on a system with virtual memory support. I was thinking that in a system with virtual memory every executable will have addresses starting from zero, and at run time the addresses will be translated into physical addresses using page tables. Therefore the executable can be loaded anywhere in memory without the need for any relocation. However, this article on shared libraries mentions that the linker specifies an address in the executable where the executable is to be loaded (Entry-point address).
http://eli.thegreenplace.net/2011/08/25/load-time-relocation-of-shared-libraries/
Also there are many articles on dynamic linking which talk about absolute addresses.
Is my understanding wrong?
Load-time relocation and virtual memory support are two different concepts. Almost all CPUs and OSes these days have virtual memory support. The only really important point to understand about virtual memory is this: forget physical addresses. That is now a hardware and OS responsibility and, unless you are writing a paging system, you can forget about physical addresses. All addresses that a program uses are virtual addresses. This is a huge advantage and immensely simplifies the programming model. On 32-bit systems, this simply means that each process gets its own 4 GiB memory space, ranging from 0x00000000 to 0xffffffff.
An .exe represents a process. A linker produces an .exe from .obj files. While both are binary files, .obj files are not executable because they do not contain the addresses of all the variables and functions. It is the job of the linker to provide these addresses, which it determines by placing these .obj files end-to-end and then computing the exact addresses of all the symbols (functions and variables). Thus, the .exe that is created has every address of functions and variables "hard-coded" into it. But there is still one critical piece of information needed before the .exe can be created. The linker has to have insider knowledge about where in memory the .exe will be loaded. Will it be at address 0x00000000, or at 0xffff0000, or somewhere else? For example, on 32-bit Windows the default base address for .exes is 0x00400000 (ASLR aside). This is called the base address. When the linker generates the final addresses of symbols (functions and variables), it computes those from this address onward.
Now, .exes rarely need to be loaded at any other address. But the same is not true for .dlls. .dlls are the same as .exes (both are formatted in the portable executable (PE) file format, which describes the memory layout, for example, where text goes, where data goes, and how to find which one). .dlls have a preferred address, too. This simply means that the linker uses this value when it computes the addresses for symbols inside the .dll. If the .dll is loaded at this address, then we are all set.
But if the .dll cannot be loaded at this address (say it was 0x10000000) because some other .dll had already been loaded at this address, then the loader will find some other space in memory and load the .dll there. However, the global addresses of functions and symbols in the .dll are now incorrect. Thus, the loader has to do a relocation (also called "fixup"), in which it adjusts the addresses of all global symbols and functions to reflect their actual addresses.
In order to do this adjustment, the loader needs to be able to find all such symbols in the .dll. The PE file has a .reloc section that contains the internal offset of all such symbols.
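The fixup step itself boils down to adding the same delta at every recorded location. The sketch below is a simplified model, not the real PE .reloc layout (which groups entries into per-page blocks with a type for each entry): it assumes a plain list of byte offsets, each pointing at a 32-bit absolute address inside the loaded image. All names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of load-time relocation: offsets[] lists the byte offsets
 * of 32-bit absolute addresses stored inside image. */
void apply_fixups(unsigned char *image, const size_t *offsets, size_t count,
                  uint32_t preferred_base, uint32_t actual_base)
{
    /* Distance between where the linker assumed the image would be loaded
     * and where it actually ended up. */
    uint32_t delta = actual_base - preferred_base;

    for (size_t i = 0; i < count; i++) {
        uint32_t value;

        /* memcpy keeps the access valid even if the slot is unaligned. */
        memcpy(&value, image + offsets[i], sizeof value);
        value += delta;
        memcpy(image + offsets[i], &value, sizeof value);
    }
}
```

With a preferred base of 0x10000000 and an actual base of 0x20000000, a stored pointer 0x10000004 becomes 0x20000004. If the image loads at its preferred base, delta is zero and nothing changes.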
Of course, there are other details, for example, regarding how indirection can be used when the compiler generated the code so that, instead of making direct calls, the calls are indirect and variables are accessed via known memory locations in the header of the .exe.
Finally, the gist is this: You need relocation (of some sort) to adjust addresses in the call and jump as well as variable access instructions when the code does not load at the position (within the 4 GiB address space) it was expected to load. When the OS loads a .exe, it has to pick a suitable place in this 4 GiB address space where it will copy the code and data chunks from this .exe on disk.
