I am adapting some linker scripts from the rather new AURIX TriCore MCUs.
There is a command I do not understand at all and the documentation [0] is not really helpful.
Can somebody tell me in principle, what is going on there? What is meant with "global address" and what is meant with "core local address"?
[0] INFINEON TECHNOLOGIES AG: TriCore Development Platform, 2015. - Manual
In AURIX, you have multiple cores.Each core has it's own scratchpad data and program RAM called DSPR and PSPR respectively.
Each of these can be accessed using either of two addresses:
Global Address - This address range would refer to the same memory irrespective of the core on which the code is executed.
Local Address - This address would refer to core specific RAMs and would change depending on the core on which the code is executed.Local address will access the core's local scratchpad RAMs.
For example:
CPU0 DSPR starts at 0x70000000 and has a size of 112kB
CPU1 DSPR starts at 0x60000000 and has a size of 120kB
In the code if you use 0x70000000, it would refer to CPU0 DSPR irrespective of whether the access is from CPU0 or CPU1.
This is called Global Address.
Instead if you use 0xD0000000 in your code, it'll access 0x70000000 if the code is executed from CPU0 and if it's executed from CPU1, it'll access 0x60000000.
This is called Local Address.
Such a facility is provided to make the code portable with respect to CPUs.
For DSPRs, the local address starts at 0xD0000000
For PSPRs, the local address starts at 0xC0000000
Pardon my linguistic skills.I am not a native English speaker.Please comment if further clarifications are needed or something is ambiguous.
Related
I was trying my hands on addr2line to convert a "pc" register value from a kernel oops (example) to a line in the kernel code. I believe that the value of the program counter represents a virtual address.
Now this post on Stack Overflow says that we generally provide an offset to addr2line and not a virtual address. VA can only be used when the address space randomization is turned off. Does this hold true for a kernel as well? I believe it should.
This Embedded Linux Conference talk on slide 14 also makes use of the program counter value to jump to the line in code, but I believe this would work only work when the address space randomization is off. Otherwise, once the virtual memory is initialized, it's possible that the kernel gets relocated randomly. In this case, any virtual address picked from an oops should not make any sense to addr2line. This is all theory. I have 2 questions now:
Is my understanding correct? If not, please correct me.
How do we turn off the address space randomization for a kernel so that the location of a symbol can be predicted?
Yes, your understanding is correct.
You have multiple options:
Completely remove KASLR support by building the kernel with CONFIG_RANDOMIZE_BASE=n Drastic solution, wouldn't recommend if not for developing purposes.
Boot the kernel with the command line argument nokaslr. See here for more info.
Manually compute the offset of the address from the start of the kernel's .text segment. Not that easy, would require knowing the base address beforehand or extrapolating it from the panic info. Definitely doable with some grep + objdump + some more ELF tools probably, but pretty annoying and time consuming.
NB: of course points 1 and 2 require that the kernel is compiled with debugging symbols for addr2line to do its job.
See also: this Linux kernel doc page.
I am currently trying to understand how AT91 and a bare metal application can work together. I'll try to describe what I have:
IAR as development environment
A simple application which I can download via debugger to SRAM and which toggles some LEDs (working!)
Using SAM-BA I can write this application to SRAM and it will start correctly (LEDs are toggling)
My hardware platform is the ATSAMA5D3x-EK
Now I would like this application to first run the AT91 bootstrap to initialize all the low level hardware (like DDR-RAM), then jump to my application and run it. I have not been able to do that yet successfully. I am able to start the pre-built uboot binary though so I assume it's not the copying or jump that are failing but my application is setup incorrectly.
As far as I understand, if I jump to an application (I assume this is some sort of "LDR pc, appstart_address") the operation at address appstart_address gets executed.
Now, in ARM the first 7 bytes or so are reserved for abort/interrupt vectors, whereas the first instruction is usually some sort of "LDR pc, =main". Are these required if my application is copied to RAM and executed from there? I somehow have the feeling that after copying my application to RAM, the address pointers do not match anymore (although they should be relative - is that correct at all?)
So my questions basically boil down to:
What happens after AT91 has initialized the hardware and jumps to my application
Do I need to setup my application in some specific way? Do I need to tell the linker or any other component that it will be relocated to some other memory location (at91 bootstrap copies it to 0x2600 0000 whereas 0x2000 0000 is the start address of DDR).
Does anyone know of a good tutorial which explains exactely this step (the jump from at91 bootstrap to my application)?
One more question which I probably can answer myself:
Is it safe to assume that I will not need to execute the instructions in board_startup.s at the beginning of my application which enable The floating point unit, setup the sys stack pointer and so on. I would say that the hardware itself has already been setup by AT91 Bootstrap and therefore there is no need for such setup.
After thinking about a few things it comes down to this:
Does it make sense to tell the linker that it should link main to address 0x0 (because this is where bootstrap will jump to) - how would I do that?
Now, in ARM the first 7 bytes or so are reserved for abort/interrupt
vectors, whereas the first instruction is usually some sort of "LDR
pc, =main". Are these required if my application is copied to RAM and
executed from there? I somehow have the feeling that after copying my
application to RAM, the address pointers do not match anymore
(although they should be relative - is that correct at all?)
The first 8 WORDS are exception entry points yes. Of which one is undefined so 7 real ones...
The reset vector does not want to go straight to main implying C code, you have not setup the stack or anything that you need to do to call C code. Also the reset vector is often close enough to use a branch b instead of a ldr pc, but since you only have one word/instruction to get out of the exception table then it either needs to be a branch or a ldr pc,something.
if your binary is position dependent then you build it for that position, you can then place it in non-volatile storage, copy and run if you like there is no problem with that. if you build it for its non-volatile address but you run it in a different address space and it is not position independent then you are right it simply wont work.
What happens after AT91 has initialized the hardware and jumps to my
application
your application runs
Do I need to setup my application in some specific way? Do I need to
tell the linker or any other component that it will be relocated to
some other memory location (at91 bootstrap copies it to 0x2600 0000
whereas 0x2000 0000 is the start address of DDR).
either build it position independent or link it for the address where it will run.
Does anyone know of a good tutorial which explains exactely this step
(the jump from at91 bootstrap to my application)?
I assume when you say at91 bootstrap (need to use a more correct term) you mean some part specific (at91 is a long lived family of devices) you really mean either some atmel part specific code or IAR part specific code. And the answer to your question is in their examples or documentation. You need to demonstrate what you found, examples, etc before posting a question like that.
Is it safe to assume that I will not need to execute the instructions
in board_startup.s at the beginning of my application which enable The
floating point unit, setup the sys stack pointer and so on. I would
say that the hardware itself has already been setup by AT91 Bootstrap
and therefore there is no need for such setup.
if you are relying on someone elses code to for example setup ddr, then it is probably a safe bet that they setup the stack. fpu, thats another story. But if that file name is specific to their project and is something they call/use then well, they called it or used it. Again this is specific to this magic AT91 Bootstrap thing which you have not demonstrated that you looked at or through or read about. Please, do some more research on the topic, show what you tried, etc. For example it should be quite trivial after this bootstrap code to read the registers that enable the fpu and or just use it and see what you see. that is an easy way to tell if it had been run. alternatively insert an infinite loop in that code and re-build if the code hangs at the infinite loop. they they are running it. (careful not to brick your board with such a move, in theory SAM-BA will let you re-load).
Does it make sense to tell the linker that it should link main to
address 0x0 (because this is where bootstrap will jump to) - how would
I do that?
The exception table for this processor is at a well known location (possibly one of two depending on strapping). the exception handlers need to be in the right place for the processor to boot properly. Generally it is the linker that does the final arranging of code and it is linker specific as to how you tell the linker where to put things so the answer is in the documentation for the linker and also either somewhere in the project it specifies this information (linker script, makefile, etc) or a default is used either global default or some variable or command line option tells one of the tools where to look for this information. so how you do it is read the docs and do what the docs say.
For example in my program I called a function foo(). The compiler and assembler would eventually write jmp someaddr in the binary. I know the concept of virtual memory. The program would think that it has the whole memory at disposal, and the start position is 0x000. In this way the assembler can calculate the position of foo().
But in fact this is not decided until runtime right? I have to run the program to know where I loaded the program into, hence the address of the jmp. But when the program actually runs, how does the OS come in and change the address of the jmp? These are direct CPU instructions right?
This question can't be answered in general because it's totally hardware and OS dependent. However a typical answer is that the initially loaded program can be compiled as you say: Because the VM hardware gives each program its own address space, all addresses can be fixed when the program is linked. No recalculation of addresses at load time is needed.
Things get much more interesting with dynamically loaded libraries because two used by the same initially loaded program might be compiled with the same base address, so their address spaces overlap.
One approach to this problem is to require Position Independent Code in DLLs. In such code all addresses are relative to the code itself. Jumps are usually relative to the PC (though a code segment register can also be used). Data are also relative to some data segment or base register. To choose the runtime location, the PIC code itself needs no change. Only the segment or base register(s) need(s) be set whenever in the prelude of every DLL routine.
PIC tends to be a bit slower than position dependent code because there's additional address arithmetic and the PC and/or base registers can bottleneck the processor's instruction pipeline.
So the other approach is for the loader to rebase the DLL code when necessary to eliminate address space overlaps. For this the DLL must include a table of all the absolute addresses in the code. The loader computes an offset between the assumed code and data base addresses and actual, then traverses the table, adding the offset to each absolute address as the program is copied into VM.
DLLs also have a table of entry points so that the calling program knows where the library procedures start. These must be adjusted as well.
Rebasing is not great for performance either. It slows down loading. Moreover, it defeats sharing of DLL code. You need at least one copy per rebase offset.
For these reasons, DLLs that are part of Windows are deliberately compiled with non-overlapping VM address spaces. This speeds loading and allows sharing. If you ever notice that a 3rd party DLL crunches the disk and loads slowly, while MS DLLs like the C runtime library load quickly, you are seeing the effects of rebasing in Windows.
You can infer more about this topic by reading about object file formats. Here is one example.
Position-independent code is code that you can run from any address. If you have a jmp instruction in position-independent code, it will often be a relative jump, which jumps to an offset from the current location. When you copy the code, it won't change the offsets between parts of the code so it will still work.
Relocatable code is code that you can run from any address, but you might have to modify the code first (maybe you can't just copy it). The code will contain a relocation table which tells how it needs to be modified.
Non-relocatable code is code that must be loaded at a certain address or it will not work.
Each program is different, it depends on how the program was written, or the compiler settings, or other various factors.
Shared libraries are usually compiled as position-independent code, which allows the same library to be loaded at different locations in different processes, without having to load multiple copies into memory. The same copy can be shared between processes, even though it is at a different address in each process.
Executables are often non-relocatable, but they can be position-independent. Virtual memory allows each program to have the entire address space (minus some overhead) to itself, so each executable can choose the address at which it's loaded without worrying about collisions with other executables. Some executables are position-independent, which can be used to increase security (ASLR).
Object files and static libraries are usually relocatable code. The linker will relocate them when combining them to create a shared library, executable, or other image.
Boot loaders and operating system kernels are almost always non-relocatable.
Yes, it is at runtime. The operating system, the part managing starting and switching tasks is ideally at a different protection level, it has more power. It knows what memory is in use and allocates some for the new task. It configures the mmu so that the new task has a virtual address space starting at zero or whatever the rule is for that operating system and processor. How you get into user mode at that starting address, is very processor specific.
One method for example is the hardware might save some state not just address but mode or virtual id or something when an interrupt occurs, lets say on the stack. And the return from interrupt instruction as defined by that processor takes the address, and state/mode, off of the stack and switches there (causing lets assume the mmu to react to its next fetch based on the new mode not the old). For a processor that works like that then you may be able to fake an interrupt return by placing the right items on the stack such that when you kick the interrupt return instruction it basically does a jump with additional features of mode switching, etc.
The ARM family for example (not cortex-m) has a processor state register for what you are running now (in the case of an interrupt or service call) and a second state register for where you came from, the state that was interrupted, when you do the proper return you give it the address and it switches back to that mode using the other register. You can directly access that register from the non-users modes so you can manipulate the state of the return. There is no return instruction in arm, just flavors of jump (modifications to the program counter), so it is a special kind of jump.
The short answer is that it is very specific to the processor as to what your choices are for jumping to the first time or returning to after a task switch to a running task in an application mode in a virtual address space. Either directly or indirectly the processor documentation will describe these modes and how you change them. If not explicitly described then you have to figure out on your own from the instructions and the mmu protections and such how to switch tasks.
How do i find out what memory addresses are suitable for use ?
More specifically, the example of How to use a specific address is here: Pointer to a specific fixed address, but not information on Why this is a valid address for reading/writing.
I would like a way of finding out that addresses x to y are useable.
This is so i can do something similar to memory mapped IO without a specific simulator. (My linked Question relevant so i can use one set of addresses for testing on Ubuntu, and another for the actual software-on-chip)
Ubuntu specific answers please.
You can use whatever memory address malloc() returns. Moreover, you can specify how much memory you need. And with realloc() you even can change your mind afterwards.
You're mixing two independent topics here. The Question that you're linking to, is regarding a micro controller's memory mapped IO. It's referring to the ATM128, a Microcontroller from the Atmel. The OP of that question is trying to write to one of the registers of it, these registers are given specific addresses.
If you're trying to write to the address of a register, you need to understand how memory mapped IO works, you need to read the spec for the chipset/IC your working on. Asking this talking about "Ubuntu specific answers" is meaningless.
Your program running on the Ubuntu OS is running it it's own virtual address space. So asking if addresses x to y are available for use is pretty pointless... unless you're accessing hardware, there's no point in looking for a specific address, just use what the OS gives you and you'll know you're good.
Based on your edit, the fact that you're trying to do a simulation of memory mapped IO, you could do something like:
#ifdef SIMULATION
unsigned int TX_BUF_REG; // The "simulated" 32-bit register
#else
#define TX_BUF_REG 0x123456 // The actual address of the reg you're simulating
#endif
Then use accessor macro's to read or write specific bits via a mask (as is typically done):
#define WRITE_REG_BITS(reg, bits) {reg |= bits;}
...
WRITE_REG_BITS(TX_BUF_REG, SOME_MASK);
Static variables can be used in simulations this way so you don't have to worry about what addresses are "safe" to write to.
For the referenced ATMega128 microcontroller you look in the Datasheet to see which addresses are mapped to registers. On a PC with OS installed you won't have a chance to access hardware registers directly this way. At least not from userspace. Normally only device drivers (ring 0) are allowed to access hardware.
As already mentioned by others you have to use e.g. malloc() to tell the OS that you need a pointer to memory chuck that you are allowed to write to. This is because the OS manages the memory for the whole system.
I am working in obtaining all the data of a program using its ELF and DWARF info and by hooking a pin tool to a process that is currently running -- It is kind of a debugger using a Pin tool.
For getting the local variables from the stack I am working with the registers EIP, EBP and ESP which I have access to from Pin.
What stroke me as weird is that I was expecting EIP to be pointing to the current function that was running when the pin tool was attached to the process, but instead EIP is pointing to the section .PLT. In other words, if the pin tool was hooked into the process when Foo() was running, then I was expecting EIP to be pointing to some address inside the Foo function. However it is pointing to the beginning of the .PLT section.
What I need to know is which function the process is currently in -- Is there any way to get the address of the function using the .PLT section? Is there any other ways to get the address of the function from the stack or using Pin? I hope I was clear enough, let me know if there are any questions though.
I might not be understanding exactly what is going on here...is the instruction pointer really in the .plt section or are you just getting a garbage value from Pin ?
You name the instruction pointer you are reading EIP, which might be a problem if you are running on a 64bit system, is that the case ?
You see the instruction pointer register is a 32bit value on a 32bit system, and a 64bit value on a 64bit system. So Pin actually provides 3 REG_* names for the instruction pointer: EIP, RIP and GBP. EIP is always the lower 32bit half of the register, RIP the 64bit value, and GBP one of the two depending on your architecture. Asking for EIP on a 64bit system gives you garbage, same for asking RIP on a 32bit one.
Otherwise, a quick look on Google gives me this. Quoting a bit:
By default the .plt entries are all initialized by the linker not to point to the correct target functions, but instead to point to the dynamic loader itself. Thus, the first time you call any given function, the dynamic loader looks up the function and fixes the target of the .plt so that the next time this .plt slot is used we call the correct function.
And more importantly:
It is possible to instruct the dynamic loader to bind addresses to all of the .plt slots before transferring control to the application—this is done by setting the environment variable LD_BIND_NOW=1 before running the program. This turns out to be useful in some cases when you are debugging a program, for example.
Hope that helps.