I have a need in pure C: after making the code page writable, I want to overwrite the start of a function with a jump instruction followed by another function's address, so that I can use another function instead of the current one at run time. This is how I implement mocking.
It works fine on x86, but on ARM I ran into some issues and do not know how to solve them. Could you help me?
What is the jump instruction on ARM, and how do I write it over the current function's address using memcpy?
I think the key element may be the hex encoding of an ARM jump instruction.
From the blog post titled "Caches and Self-Modifying Code" on ARM's community page:
Cached ARM architectures have a separate cache for data and
instruction accesses; these are called the D-cache and the I-cache,
respectively. ... with two interfaces to the CPU,
the core can load an instruction and some data at the same time.
... because the D-cache and I-cache are not coherent, the
newly-written instructions might be masked by the existing contents of
the I-cache, causing the processor to execute old (or possibly
invalid) instructions.
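To make the question concrete, here is a minimal sketch of the byte sequences involved. It assumes a little-endian target and that the code page has already been made writable (for example with mprotect on Linux); the helper names are hypothetical. A32 (ARM state) has no jump with a full 32-bit immediate, so a common trick is `ldr pc, [pc, #-4]` followed by the absolute target address; on AArch64 a common choice is `ldr x16, #8; br x16` followed by a 64-bit address. If the function being patched is Thumb code (common on Cortex-A builds with -mthumb, and always the case on Cortex-M), neither applies: the Thumb-2 equivalent is `ldr.w pc, [pc, #0]` (halfwords 0xF8DF, 0xF000) at a 4-byte-aligned address, followed by the target address with bit 0 set. The `__builtin___clear_cache` call (GCC/Clang) is what deals with the incoherent I-cache described in the quote; x86 does not need it because it keeps instruction fetch coherent with data writes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A32 (ARM state): "ldr pc, [pc, #-4]" followed by the absolute target.
   In ARM state PC reads as this instruction's address + 8, so [pc, #-4]
   is the word right after it. 8 bytes in total. */
size_t make_a32_trampoline(uint8_t *out, uint32_t target)
{
    const uint32_t ldr_pc = 0xE51FF004u;   /* ldr pc, [pc, #-4] */
    memcpy(out, &ldr_pc, 4);               /* host byte order: assumes little-endian */
    memcpy(out + 4, &target, 4);           /* literal that gets loaded into pc */
    return 8;
}

/* AArch64: "ldr x16, #8; br x16" followed by the 64-bit target.
   x16 (IP0) is a scratch register the procedure call standard lets
   veneers clobber, so using it at function entry is safe. 16 bytes. */
size_t make_a64_trampoline(uint8_t *out, uint64_t target)
{
    const uint32_t insns[2] = {
        0x58000050u,   /* ldr x16, #8 (literal 8 bytes ahead) */
        0xD61F0200u    /* br x16 */
    };
    memcpy(out, insns, sizeof insns);
    memcpy(out + 8, &target, 8);
    return 16;
}

/* Copy the stub over the start of a function, then make the new bytes
   visible to instruction fetch (D-cache clean + I-cache invalidate on ARM).
   Assumes dst is already writable, e.g. after mprotect(). */
void patch_code(void *dst, const uint8_t *stub, size_t len)
{
    memcpy(dst, stub, len);
    __builtin___clear_cache((char *)dst, (char *)dst + len);
}
```

For example, on 32-bit ARM in ARM state: `uint8_t buf[16]; patch_code((void *)original_fn, buf, make_a32_trampoline(buf, (uint32_t)(uintptr_t)mock_fn));`, where `original_fn` and `mock_fn` are your own functions. The patch overwrites the first instructions of the original, so the original can no longer be called afterwards.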
I believe the rest of the article will help you dig deeper. However, I wonder why you are not using function pointers; they would be much easier to build on.
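A sketch of the function-pointer alternative, assuming you can change the callers to go through a pointer; all names here are hypothetical. No code bytes are rewritten, so there are no cache or page-protection issues on any architecture.

```c
/* The real implementation and a mock with the same signature. */
static int real_read_sensor(void) { return 42; }
static int mock_read_sensor(void) { return 7; }

/* Callers use this pointer instead of calling real_read_sensor directly,
   so a test can swap the implementation at run time. */
int (*read_sensor)(void) = real_read_sensor;

/* Example caller: unaware of which implementation it gets. */
int average_of_two_reads(void)
{
    return (read_sensor() + read_sensor()) / 2;
}
```

In a test you would assign `read_sensor = mock_read_sensor;` before exercising the code and restore `real_read_sensor` afterwards.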
I am using an EK-LM4F120XL board, which contains a Cortex-M4 processor. I also use GCC-ARM-none-eabi as my toolchain.
I am working on a little hobby project that is slowly becoming an operating system. An important part of this is that I need to switch out registers to switch processes. This happens inside an interrupt, and this processor makes sure that the caller-saved registers (r0-r3, r12, lr), along with the return address and xPSR, are pushed to the process stack. So, to continue, I need to write the contents of r4-r11 and the SP to a place in memory, load r4-r11 of the new process, load its stack pointer and return. Additionally, the lr value contains some information about the process that was interrupted, so I need information from that register too.
All of this works because I wrote it in assembly. I linked the assembly function directly to the interrupt, so I have full control over what happens to the registers. The combination of C and inline assembly did not work because the compiler usually pushes some registers to the stack, and that is fatal. But the OS is growing, and the context switch is growing along with it: there are now also some global variables that need changing, etc. All of this is doable in assembly, but it's becoming a pain: assembly is hard to read and to debug. So I want a C/assembly combination. Basically, I am looking for something like this:
void contextSwitch(void){
//Figure out what the next process will be
//Change every variable that needs changing
// Restore register state to the moment of interrupt. The following function will not return in the sense that it will end the interrupt.
swapRegisters(oldProc, newProc);
}
And then write only swapRegisters in assembly. Is there a way to achieve this? Is my solution even the best solution?
There is no portable method of directly accessing CPU registers in C; you will need assembler, in-line assembler, compiler intrinsics or a kernel library (that uses assembler code).
The details of how that is done for Cortex-M are well covered elsewhere and probably too complex to repeat here. The specifics of doing this on the Cortex-M4(F) are described at the ARM Info Center site here. The approach is broadly similar for the Cortex-M3, apart from the FPU considerations; an M3-specific description of context switching is provided in this Embedded.com article.
Since you can never have enough explanations (different authors make some things clearer than others, or give better or more directly applicable examples), here's another one: also M3-based, but it will work on an M4 if you are not using the FPU, or on M4s without an FPU. And yet another example.
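A minimal sketch of the split the question asks for, in the spirit of those references: the assembly only saves and restores r4-r11 and the PSP, and everything else lives in an ordinary C function that the stub calls. Assumptions: GCC for Cortex-M3/M4 with the FPU unused, a handler name that matches your vector table (`PendSV_Handler` and `switch_context` are hypothetical here), PendSV at the lowest priority and pended from your tick interrupt, and task stacks pre-filled with an initial exception frame plus room for r4-r11. The assembly is guarded so the C part also compiles on a host.

```c
#include <stdint.h>

#define MAX_TASKS 2

/* Hypothetical task control block: only the saved process stack pointer. */
struct task {
    uint32_t *sp;
};

struct task tasks[MAX_TASKS];
int current_task = 0;

/* Plain C part. The stub calls it with the process stack pointer after
   r4-r11 have been pushed onto that stack; it records it, picks the next
   task (simple round robin) and returns that task's saved stack pointer.
   Scheduler variables and other globals can be updated here. */
uint32_t *switch_context(uint32_t *saved_sp)
{
    tasks[current_task].sp = saved_sp;
    current_task = (current_task + 1) % MAX_TASKS;
    return tasks[current_task].sp;
}

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
/* Assembly part, Cortex-M3/M4 with the FPU unused. On entry the hardware
   has already pushed r0-r3, r12, lr, pc and xPSR onto the process stack.
   "naked" stops GCC from emitting a prologue that would touch registers. */
__attribute__((naked)) void PendSV_Handler(void)
{
    __asm volatile(
        "mrs   r0, psp          \n" /* r0 = interrupted task's stack      */
        "stmdb r0!, {r4-r11}    \n" /* push its callee-saved registers    */
        "push  {r3, lr}         \n" /* save EXC_RETURN; r3 keeps 8-byte alignment */
        "bl    switch_context   \n" /* r0 = next task's stack pointer     */
        "pop   {r3, lr}         \n"
        "ldmia r0!, {r4-r11}    \n" /* pop the next task's r4-r11         */
        "msr   psp, r0          \n"
        "bx    lr               \n" /* exception return into the next task */
    );
}
#endif
```

Because `switch_context` is a normal AAPCS function reached with `bl`, whatever registers the compiler chooses to save inside it no longer matter: the stub has already stored r4-r11 on the task's stack and keeps EXC_RETURN on the main stack.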
I am currently trying to understand how AT91 and a bare metal application can work together. I'll try to describe what I have:
IAR as development environment
A simple application which I can download via debugger to SRAM and which toggles some LEDs (working!)
Using SAM-BA I can write this application to SRAM and it will start correctly (LEDs are toggling)
My hardware platform is the ATSAMA5D3x-EK
Now I would like the AT91 bootstrap to run first and initialize all the low-level hardware (like the DDR RAM), then jump to my application and run it. I have not managed to do that successfully yet. I am able to start the pre-built U-Boot binary, though, so I assume it's not the copying or the jump that is failing, but that my application is set up incorrectly.
As far as I understand, if I jump to an application (I assume this is some sort of "LDR pc, appstart_address") the operation at address appstart_address gets executed.
Now, in ARM the first 7 bytes or so are reserved for abort/interrupt vectors, whereas the first instruction is usually some sort of "LDR pc, =main". Are these required if my application is copied to RAM and executed from there? I somehow have the feeling that after copying my application to RAM, the address pointers do not match anymore (although they should be relative - is that correct at all?)
So my questions basically boil down to:
What happens after AT91 has initialized the hardware and jumps to my application
Do I need to setup my application in some specific way? Do I need to tell the linker or any other component that it will be relocated to some other memory location (at91 bootstrap copies it to 0x2600 0000 whereas 0x2000 0000 is the start address of DDR).
Does anyone know of a good tutorial which explains exactly this step (the jump from at91 bootstrap to my application)?
One more question which I probably can answer myself:
Is it safe to assume that I will not need to execute the instructions in board_startup.s at the beginning of my application, which enable the floating-point unit, set up the sys stack pointer and so on? I would say that the hardware itself has already been set up by AT91 Bootstrap, so there is no need for such setup.
After thinking about a few things it comes down to this:
Does it make sense to tell the linker that it should link main to address 0x0 (because this is where bootstrap will jump to) - how would I do that?
Now, in ARM the first 7 bytes or so are reserved for abort/interrupt
vectors, whereas the first instruction is usually some sort of "LDR
pc, =main". Are these required if my application is copied to RAM and
executed from there? I somehow have the feeling that after copying my
application to RAM, the address pointers do not match anymore
(although they should be relative - is that correct at all?)
Yes, the first 8 words (not bytes) are exception entry points. One of them (offset 0x14) is reserved, so there are 7 real ones.
The reset vector should not go straight to main, i.e. into C code: at that point you have not set up the stack or anything else that C code needs. Also, the reset handler is often close enough to use a branch (b) instead of an ldr pc, but since you only have one word (one instruction) to get out of the exception table, it has to be either a branch or an ldr pc, something.
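To illustrate what "anything that you need to do to call C code" covers beyond the stack: a typical reset path, once assembly has set the stack pointer, copies initialized data and clears .bss before calling main. A minimal sketch with hypothetical parameter names; in a real image these addresses come from linker-script symbols whose names depend on your toolchain and startup files.

```c
#include <stdint.h>

/* Hypothetical startup helper, called from the reset handler only after
   assembly code has set up the stack pointer. */
void init_c_runtime(uint32_t *data_dst, const uint32_t *data_src,
                    const uint32_t *data_end,
                    uint32_t *bss_start, const uint32_t *bss_end)
{
    /* Copy initialized globals (.data) from their load address to RAM. */
    while (data_dst < data_end)
        *data_dst++ = *data_src++;

    /* Zero the uninitialized globals (.bss), as C requires. */
    while (bss_start < bss_end)
        *bss_start++ = 0;
}
```

Only after this has run is it safe for the reset handler to call `main()`.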
If your binary is position-dependent, then you build it for the address where it will run; you can then place it in non-volatile storage, copy it to that address and run it there, no problem. If you build it for its non-volatile address but run it in a different address space, and it is not position-independent, then you are right: it simply won't work.
What happens after AT91 has initialized the hardware and jumps to my
application
your application runs
Do I need to setup my application in some specific way? Do I need to
tell the linker or any other component that it will be relocated to
some other memory location (at91 bootstrap copies it to 0x2600 0000
whereas 0x2000 0000 is the start address of DDR).
Either build it position-independent or link it for the address where it will run.
Does anyone know of a good tutorial which explains exactly this step
(the jump from at91 bootstrap to my application)?
I assume that when you say at91 bootstrap (you need to use a more precise term; AT91 is a long-lived family of devices) you really mean some part-specific code, either from Atmel or from IAR. The answer to your question is in their examples or documentation. You need to show what you found (examples, etc.) before posting a question like that.
Is it safe to assume that I will not need to execute the instructions
in board_startup.s at the beginning of my application, which enable the
floating-point unit, set up the sys stack pointer and so on? I would
say that the hardware itself has already been set up by AT91 Bootstrap,
so there is no need for such setup.
If you are relying on someone else's code to, for example, set up DDR, then it is probably a safe bet that they set up the stack. The FPU is another story. But if that file name is specific to their project and is something they call or use, then, well, they called it or used it. Again, this is specific to this magic AT91 Bootstrap thing, which you have not shown that you looked at, looked through or read about. Please do some more research on the topic and show what you tried. For example, it should be quite trivial, after this bootstrap code has run, to read the registers that enable the FPU, or just use the FPU and see what happens; that is an easy way to tell whether that code has been run. Alternatively, insert an infinite loop in that code and rebuild: if the code hangs at the infinite loop, then they are running it. (Be careful not to brick your board with such a move; in theory SAM-BA will let you reload.)
Does it make sense to tell the linker that it should link main to
address 0x0 (because this is where bootstrap will jump to) - how would
I do that?
The exception table for this processor is at a well-known location (possibly one of two, depending on strapping). The exception handlers need to be in the right place for the processor to boot properly. Generally it is the linker that does the final arrangement of code, and how you tell the linker where to put things is linker-specific, so the answer is in the documentation for your linker. Somewhere in the project this information is also specified (linker script, makefile, etc.), or a default is used: either a global default, or some variable or command-line option tells one of the tools where to look for it. So the way to do it is to read the docs and do what they say.
I am working on code where I need to get the instructions executed by a program, given the instruction pointers. Assume for now that I have a mechanism that provides me the addresses of the instructions: would it be possible to get the opcode from this (on the IA-32 instruction set)?
You need an in-memory disassembler, such as BeaEngine or diStorm; these can be passed a memory address to read from, just make sure the address is readable. If you know the length in bytes of the function, it's a little better to use the run-length disassemblers also provided on those sites.
If you are looking for hardware-supported help, that's not how it works. This needs to be done in software: your code needs a table of opcodes and instructions and just has to perform a lookup.
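A toy version of that lookup, only to show the idea: read the byte at a given address and look it up in a table. The table covers a handful of single-byte IA-32 opcodes and the names are hypothetical; a real disassembler such as those mentioned above also has to decode prefixes, ModRM/SIB bytes, immediates and multi-byte opcodes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct opcode_entry {
    uint8_t opcode;
    const char *mnemonic;
};

/* A few single-byte IA-32 instructions (no operands to decode). */
static const struct opcode_entry opcode_table[] = {
    { 0x90, "nop" },
    { 0xC3, "ret" },
    { 0xCC, "int3" },
    { 0x55, "push ebp" },
    { 0x5D, "pop ebp" },
};

/* Look up the first byte at addr. The caller must make sure addr is
   readable. Returns NULL for bytes the table does not know. */
const char *lookup_opcode(const void *addr)
{
    uint8_t byte;
    memcpy(&byte, addr, 1);
    for (size_t i = 0; i < sizeof opcode_table / sizeof opcode_table[0]; i++)
        if (opcode_table[i].opcode == byte)
            return opcode_table[i].mnemonic;
    return NULL;
}
```

With GCC on x86 you can pass the address of a function cast to `const void *` to inspect its first byte; that cast is implementation-defined in ISO C but works with common compilers.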
What you describe is known as disassembly. There are many open source disassemblers and if you could use one of those it would make your task very simple. Look here: http://en.wikibooks.org/wiki/X86_Disassembly/Disassemblers_and_Decompilers
I have a character buffer that I want to place in the cache. How can I make sure the compiler places it in the D-cache in the memory map?
Compiler is RVCT 3.1
This is called "cache lockdown", and is supported by most (if not all) ARM9 processors, depending on the cache architecture. Here is a useful page from the ARM920T reference manual, including some example code. You should be able to find information for your specific processor in the table of contents on that page.
I have never heard of such a feature on ARM. There is the PLD instruction, which is a hint to preload some data.
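A sketch of using that hint from C, assuming GCC or Clang: `__builtin_prefetch` is typically emitted as PLD on ARM cores that have it (ARMv5TE and later) and compiles to nothing where no prefetch instruction exists. RVCT/armcc has its own intrinsic for this, so check its compiler reference. The 32-byte stride is an assumed cache-line size. Unlike cache lockdown, PLD is only a hint and does not keep the data in the cache.

```c
#include <stddef.h>

/* Sum a buffer while hinting the next (assumed 32-byte) cache line. */
unsigned sum_with_prefetch(const unsigned char *buf, size_t len)
{
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++) {
        /* Once per line, ask for the following line if it is in bounds.
           The hint never changes the result of the loop. */
        if (i % 32 == 0 && i + 32 < len)
            __builtin_prefetch(buf + i + 32);
        sum += buf[i];
    }
    return sum;
}
```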
When compiling shared libraries in gcc the -fPIC option compiles the code as position independent. Is there any reason (performance or otherwise) why you would not compile all code position independent?
It adds an indirection. With position independent code you have to load the address of your function and then jump to it. Normally the address of the function is already present in the instruction stream.
Yes there are performance reasons. Some accesses are effectively under another layer of indirection to get the absolute position in memory.
There is also the GOT (Global Offset Table), which stores the addresses of global variables. To me, this just looks like an IAT fixup table, which Wikipedia and a few other sources classify as position-dependent.
http://en.wikipedia.org/wiki/Position_independent_code
In addition to the accepted answer: one thing that hurts PIC performance a lot is the lack of "IP-relative addressing" on 32-bit x86. With IP-relative addressing you could ask for data that is X bytes from the current instruction pointer, which would make PIC code a lot simpler.
Jumps and calls are usually EIP-relative, so those don't really pose a problem. However, accessing data requires a little extra trickery. Sometimes a register is temporarily reserved as a "base pointer" to the data the code requires. For example, a common technique is to abuse the way calls work on x86:
call label_1
dd 0xdeadbeef
dd 0xfeedf00d
dd 0x11223344
label_1:
pop ebp                 ; now ebp holds the address of the first data word
                        ; this works because call pushes the address of the
                        ; *next* instruction (here, the first dd)
; real code follows
mov eax, [ebp + 4]      ; e.g. load 0xfeedf00d in a position-independent way
This and other techniques add a layer of indirection to the data accesses. For example, the GOT (Global offset table) used by gcc compilers.
x86-64 added a "RIP relative" mode which makes things a lot simpler.
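The extra indirection is easy to see for yourself. A small sketch, assuming GCC on x86 (the file and function names are hypothetical): compile it with and without -fPIC and compare the generated assembly. The comments describe what GCC typically emits; exact output varies by version and options.

```c
/* Hypothetical pic_demo.c. Compare, for example:
 *   gcc -O2 -fno-pic -S pic_demo.c -o nopic.s
 *   gcc -O2 -fPIC    -S pic_demo.c -o pic.s
 * On x86-64, the -fPIC read_shared typically loads the variable's address
 * from the GOT (shared_counter@GOTPCREL(%rip)) and then dereferences it,
 * while read_private stays a single RIP-relative load. With -m32 -fPIC,
 * both first compute the GOT address through a get-PC thunk, which is the
 * call/pop trick above in compiler form. */

int shared_counter = 1;          /* exported: may be interposed, so PIC goes via the GOT */
static int private_counter = 2;  /* file-local: addressed relative to the code */

int read_shared(void)  { return shared_counter; }
int read_private(void) { return private_counter; }

/* Writing private_counter keeps the compiler from folding it to a constant. */
void bump_private(void) { private_counter++; }
```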
Because implementing completely position independent code adds a constraint to the code generator which can prevent the use of faster operations, or add extra steps to preserve that constraint.
This might be an acceptable trade-off to get multiprocessing without a virtual memory system, where you trust processes to not invade each other's memory and might need to load a particular application at any base address.
In many modern systems the performance trade-offs are different: a relocating loader is often less expensive (it costs time only when code is first loaded) than the best an optimizer can do if it has free rein. Also, the availability of virtual address spaces hides most of the motivation for position independence in the first place.
Position-independent code has a performance overhead on most architectures because it requires an extra register (on 32-bit x86, for example, one register has to hold the GOT address).
So the reason is performance.
Also, virtual memory hardware in most modern processors (used by most modern OSes) means that lots of code (all user space apps, barring quirky use of mmap or the like) doesn't need to be position independent. Every program gets its own address space which it thinks starts at zero.
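Nowadays many operating systems and compiler configurations make code position-independent by default (for example, many Linux distributions configure GCC to build position-independent executables). Try building a shared library without the -fPIC flag: depending on the target, you get a warning about text relocations or a link error asking you to recompile with -fPIC. OSes like Windows take a different route: DLLs are mapped at a preferred base address and relocated by the loader if that address is already taken.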