Inline assembly in lldb expression evaluation?

I'm trying to use inline assembly in an LLDB expression, and it does not seem to work. As a toy example, I run this on ARM:
(lldb) expr __asm__ __volatile__("mov r0, 4");
(lldb) register read
General Purpose Registers:
r0 = 0x00000003
In reality, I need to access the special CP15 c13 Software Thread ID registers and could not find any other way of doing it within LLDB, so any ideas would be appreciated. I thought of using
expr __asm__ __volatile__("MRC p15, 0, r0, c13, c0, 3");
It runs, but has no effect on the content of r0.
Thanks a lot!

An expression that writes to a register won't actually change register state. Before running an expression, lldb saves the register state, then runs your expression, and restores the register state when it is done. In almost all cases, you don't want the register changes made while running some complex expression to be inherited by the program when it continues; otherwise it would most likely just crash.
If you need to change register state explicitly, use register write instead (for example, register write r0 4).
I know that wasn't the point of your exercise; this is more to explain why that approach didn't work.
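The save/run/restore behavior described above can be modeled in a few lines of plain C. This is only a sketch of the pattern, not lldb's actual implementation; reg_file, run_expression and mov_r0_4 are hypothetical names, and the "expression" is an ordinary C function standing in for the inline asm.

```c
#include <stdint.h>

/* Hypothetical, simplified register file for one thread (r0..r15). */
typedef struct {
    uint32_t r[16];
} reg_file;

/* An expression body: it is free to change any register. */
typedef void (*expr_fn)(reg_file *regs);

/* Models the debugger's pattern: snapshot the registers, run the
   expression, then write the snapshot back so the program never
   sees the register changes the expression made. */
static void run_expression(reg_file *thread_regs, expr_fn body)
{
    reg_file saved = *thread_regs; /* save */
    body(thread_regs);             /* run: may modify r0 and others */
    *thread_regs = saved;          /* restore */
}

/* Stand-in for: expr __asm__ __volatile__("mov r0, 4") */
static void mov_r0_4(reg_file *regs)
{
    regs->r[0] = 4;
}
```

Starting from r0 = 3, calling run_expression(&regs, mov_r0_4) leaves r0 at 3, which matches the register read output in the question.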

Related

understanding GCC inline asm function

I will write my assumptions (based on my research) in the question below; I assume there are mistakes in my assumptions outside the question itself:
I'm looking into some code written for ARM:
This function (taken from the FreeRTOS port code):
portFORCE_INLINE static uint32_t ulPortRaiseBASEPRI(void)
{
    uint32_t ulOriginalBASEPRI, ulNewBASEPRI;

    __asm volatile(" mrs %0, basepri \n"
                   " mov %1, %2 \n"
                   " msr basepri, %1 \n"
                   " isb \n"
                   " dsb \n"
                   : "=r"(ulOriginalBASEPRI), "=r"(ulNewBASEPRI)
                   : "i"(configMAX_SYSCALL_INTERRUPT_PRIORITY));

    /* This return will not be reached but is necessary to prevent compiler
       warnings. */
    return ulOriginalBASEPRI;
}
I understand that in GCC, "=r" is an output operand, so we save values from the asm into C variables.
Now, in my understanding, the code is equivalent to:
ulOriginalBASEPRI = basepri
ulNewBASEPRI = configMAX_SYSCALL_INTERRUPT_PRIORITY
basepri = ulNewBASEPRI
I understand we are returning the original value of BASEPRI, so that's the first line. However, I don't understand why we assign the variable ulNewBASEPRI and then use it in the MSR instruction.
So I looked in the ARMv7 instruction set and saw this:
I assume there is no MSR (immediate) in the Thumb instruction set, and "Encoding A1" means it's only available in ARM instruction mode.
So we have to use an =r output operand to let the assembler automatically select a register for our variable. Am I correct?
EDIT: ignore this section because I miscounted colons.
: "i"(configMAX_SYSCALL_INTERRUPT_PRIORITY));
From my understanding, the assembly template is:
asm ( assembler template
: output operands /* optional */
: input operands /* optional */
: list of clobbered registers /* optional */
);
Doesn't "i" just mean an immediate (constant) in the assembly?
Does this mean the third colon is not only for the clobber list?
If so, wouldn't it be more appropriate to find the constraint "i" in the input operands?
EDIT: ignore this section because I miscounted colons.
I understand isb and dsb are memory-barrier instructions, but I really don't understand their descriptions. What do they actually do?
What happens if we remove the dsb or isb instruction, for example?
So we have to use an =r output operand to let the assembler automatically select a register for our variable. Am I correct?
Yes, but it's the compiler that does register allocation. It just fills in the %[operand] in the asm template string as a text substitution and feeds that to the assembler.
Alternatively, you could hard-code a specific register in the asm template string, and use a register-asm local variable to make sure an "=r" constraint picked it. Or use an "=m" memory output operand and str a result into it, and declare a clobber on any registers you used. But those alternatives are obviously terrible compared to just telling the compiler about how your block of asm can produce an output.
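To make the division of labor concrete, here is a minimal sketch assuming GCC or Clang (the function name is hypothetical). The template is empty, so no instruction is emitted on any target, yet the "=r" output and the matching "0" input still make the compiler choose one register, load in into it, and treat its contents afterwards as out. The assembler never picks a register; it only sees the text the compiler substituted.

```c
#include <stdint.h>

/* "=r": the compiler picks any general-purpose register for out.
   "0": the input must be placed in the same register as operand 0.
   The template is empty, so the value simply passes through the
   register the compiler allocated, with no instruction emitted. */
static uint32_t through_compiler_chosen_register(uint32_t in)
{
    uint32_t out;
    __asm__("" : "=r"(out) : "0"(in));
    return out;
}
```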
I don't understand why the comment says the return statement doesn't run:
/* This return will not be reached but is necessary to prevent compiler
warnings. */
return ulOriginalBASEPRI;
Raising the basepri (ARM docs) to a higher number might allow an interrupt handler to run right away, before later instructions, but if that exception ever returns, execution will eventually reach the C outside the asm statement. That's the whole point of saving the old basepri into a register and having an output operand for it, I assume.
(I had been assuming that "raise" meant higher number = more interrupts allowed. But Ross comments that it will never allow more interrupts; they're "raising the bar" = lower number = fewer interrupts allowed.)
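The save/raise/restore pattern can be modeled in portable C with an ordinary variable standing in for BASEPRI. This is a sketch only: fake_basepri, raise_basepri and restore_basepri are hypothetical names, and on a real Cortex-M the read and write are the mrs and msr in the asm, where the write also changes which interrupts can be taken.

```c
#include <stdint.h>

/* Stand-in for the BASEPRI special register (hypothetical).
   On Cortex-M, 0 means BASEPRI masking is disabled. */
static uint32_t fake_basepri = 0;

/* Mirrors ulPortRaiseBASEPRI: read the old value (mrs), write the
   new one (mov + msr), and hand the old value back to the caller. */
static uint32_t raise_basepri(uint32_t new_value)
{
    uint32_t original = fake_basepri;
    fake_basepri = new_value;
    return original;
}

/* Counterpart used when leaving the critical section. */
static void restore_basepri(uint32_t saved)
{
    fake_basepri = saved;
}
```

Returning the original value is what lets critical sections nest: an inner raise returns the level set by the outer one, so restoring it does not unmask interrupts the outer section still needs masked.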
If execution really never comes out the end of your asm, you should tell the compiler about it. There is asm goto, but that needs a list of possible branch targets. The GCC manual says:
GCC assumes that asm execution falls through to the next statement (if this is not the case, consider using the __builtin_unreachable() intrinsic after the asm statement).
Failing to do this might lead to the compiler planning to do something after the asm, and then it never happening even though in the source it's before the asm.
It might be a good idea to use a "memory" clobber to make sure the compiler has memory contents in sync with the C abstract machine (at least for variables other than locals, which an interrupt handler might access). This is usually desirable around asm barrier instructions like dsb, but it seems here we maybe don't care about being an SMP memory barrier, just about consistent execution after changing basepri? I don't understand why that's necessary, but if you do, it's worth deciding whether compile-time reordering of memory accesses around the asm statement is a problem.
You'd add it as a third colon-separated section in the asm statement, after the inputs, containing "memory".
Without that, compilers might decide to do an assignment after this asm instead of before, leaving a value just in registers.
// actual C source
global_var = 1;
uint32_t oldpri = ulPortRaiseBASEPRI();
global_var = 2;
could optimize (via dead-store elimination) into asm that worked like this
// possible asm
global_var = 2;
uint32_t oldpri = ulPortRaiseBASEPRI();
// or global_var = 2; here *instead* of before the asm
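The effect of the clobber can be seen without any ARM-specific instruction, assuming GCC or Clang: an asm statement with an empty template and a "memory" clobber emits no code but acts as a compile-time barrier. In the sketch below (global_var and store_around_barrier are hypothetical names), the compiler must assume the asm may read or write global_var, so the store of 1 can neither be eliminated nor moved past it. It has no effect at the CPU level; a hardware barrier such as dsb still has to be in the template.

```c
int global_var;

static void store_around_barrier(void)
{
    global_var = 1;
    /* No instruction is emitted, but the "memory" clobber tells the
       compiler that any memory may be read or written here, so the
       store above must actually happen before this point. */
    __asm__ __volatile__("" ::: "memory");
    global_var = 2;
}
```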
Concerning ARM/Thumb instruction set differences on msr: you should be able to answer this yourself from the documentation. ;-) It is just 2 pages later. Edit: Chapter A8.1.3 of the linked manual clearly states how encodings are documented on instructions.
dsb (data synchronization barrier) makes sure that all memory accesses are finished before the next instruction is executed. This is a very short summary; for the full details you need to read the documentation. If you have further specific questions about this operation, please post another question.
isb (instruction synchronization barrier) flushes the instruction pipeline. This pipeline buffers instructions that have already been fetched from memory but not yet executed. After the isb, the following instructions are fetched again with any changes made by earlier instructions (such as changed memory access settings) in effect, which is what a programmer expects. The note above applies here, too.

IAR assembler BKPT immediate as input operand

I am writing a flashloader for a Cortex M4 device and I'd like to "return" a value for the driving PC application using the breakpoint instruction's immediate value.
While hardcoding an immediate works fine:
__asm("bkpt 0x70");
__asm("bkpt %0" : : "i" (0x70));
as soon as I want to "return" something run-time dependent like
uint8_t status = Flash_EraseAll();
__asm("bkpt %0" : : "i" (status));
The compilation fails with
Error[Ta090]: Immediate operand is not constant
I tried using preprocessor macros with different concatenate setups, but to no avail.
Has anybody got an idea of how I could input the run-time dependent status flags to the __asm() block in IAR as an immediate? Based on what I read here, this is not exactly possible, but there might be a clever hacky way to do it.
P.S.: Yes, as a workaround I could use a switch statement where I list and hardcode every possible state, but that's just ugly and long.
I would push the value on the stack, and then use the bkpt instruction with a defined number, so the debugger can look at the stack for this state.
Something like this (pseudocode):
__asm("push {%0}" : : "r" (status));
__asm("bkpt %0" : : "i" (0x70));
Of course, you shouldn't forget to clean up the stack afterwards.
Since bkpt is encoded with an immediate only, you can obviously not change that at runtime, as you would have to modify the code.
Based on @Devolus's idea, I ended up with the following:
uint32_t status = Flash_EraseAll();
__asm volatile ("str %0, [sp, #-4]!\n\t" // Push to stack
"bkpt 0x0\n\t" // Halt CPU
"add sp, sp, #4\n\t" // Restore SP
: : "r"(status)); // status as input to __asm()
The asm statement tells the compiler to put the status variable in a convenient register ("r"). The code then stores that register's content at the stack pointer's pre-decremented address and halts the CPU with an immediate of 0.
The driving application polls the target to see whether it has halted (bkpt hit). If it has, it reads the 16-bit halfword at the current PC (bkpt 0x00 encodes as 0xbe00, so #imm = 0xbe00 & 0x00ff = 0) to make sure execution stopped at the right place. It then reads the 32-bit word at the final SP address to fetch the status of the embedded code's execution.
This way, instead of a static 8-bit code from the bkpt's immediate, one can "report" more stuff to the outside world dynamically (32-bit in this case).
As @PeterCordes highlights, the push and bkpt instructions must be in the same inline asm statement; otherwise the compiler might decide to insert code between them. Also, SP must be restored to its value from before the __asm(), as the compiler assumes sole control over SP.
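On the PC side, the checks described above come down to a few bit operations. This is a sketch with hypothetical function names; actually reading target memory through the debug probe is left out, and the byte array stands in for the 4 bytes read at SP, assuming the usual little-endian Cortex-M configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Thumb BKPT is a 16-bit instruction encoded as 0xBE00 | imm8. */
#define THUMB_BKPT_MASK   0xFF00u
#define THUMB_BKPT_OPCODE 0xBE00u

/* True if the halfword read at the halted PC is a BKPT. */
static bool is_thumb_bkpt(uint16_t halfword)
{
    return (halfword & THUMB_BKPT_MASK) == THUMB_BKPT_OPCODE;
}

/* The 8-bit immediate encoded in the BKPT. */
static uint8_t thumb_bkpt_imm(uint16_t halfword)
{
    return (uint8_t)(halfword & 0x00FFu);
}

/* Rebuild the 32-bit status word from the bytes read at SP,
   least significant byte first (little-endian target). */
static uint32_t status_from_stack_bytes(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

For example, 0xbe00 is recognized as bkpt 0x0, while 0x4770 (bx lr) is not.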

Very Basic Arm Assembly Questions (add, compare)

TLDR: What exactly does bx lr do?
I have trouble understanding these two following examples:
Add Example:
I understand that "add r0, r0, r1" adds r1 to r0 and stores the result in r0. What I do not understand is how "bx lr" knows to return r0 without explicitly naming r0.
Compare Example:
Same here: I understand that, after the compare, "BGT r0_gt" jumps to r0_gt if r0 > r1. However, how does bx lr know to return the correct value?
It is defined by the ABI in use; for ARM, this is the AAPCS (the procedure call standard that is part of the ARM EABI), which states in "5.4 Result Return":
A Fundamental Data Type that is smaller than 4 bytes is zero- or sign-extended to a word and returned in r0.
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf
bx lr doesn't return any register at all, it just passes control over back to the caller (in the address in the lr register), without modifying any other registers than pc.
The caller then knows, based on the calling convention, that on return, the return value will be in the r0 register (depending on the exact type of the return value and the platform's calling convention).
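Seen from the C side, both examples are ordinary functions whose result the compiler places in r0 because the AAPCS says so. The names below are hypothetical, and the instructions mentioned in the comments are what an optimizing ARM compiler typically produces; exact output depends on the compiler and flags.

```c
/* Arguments arrive in r0 and r1; the result must end up in r0.
   Typically compiles to "add r0, r0, r1" then "bx lr": the add
   already leaves the result where the caller will look for it,
   so bx lr only has to jump back. */
int add_ints(int a, int b)
{
    return a + b;
}

/* The compare example: whichever path runs, it must leave the
   chosen value in r0 before reaching bx lr. */
int max_ints(int a, int b)
{
    return (a > b) ? a : b;
}
```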
BX means branch and exchange: it branches and can switch between ARM and Thumb state if the architecture supports it. LR is simply another name for register 14, so bx lr just branches to the address in r14.
If you look at the bl instruction, you'll see that it sets r14 to the address of the instruction after the bl, which is the return address of the function call.
The pair bl something, then later bx lr (mov pc, lr also works if you don't need to change modes and are in ARM mode), is how you make function calls on ARM.
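A toy model of the two instructions makes the mechanism explicit. This is not an emulator: the cpu struct and the bl/bx_lr functions are hypothetical, it assumes ARM state (4-byte instructions), and it ignores the ARM/Thumb switching that bx performs based on bit 0 of the target address.

```c
#include <stdint.h>

/* Only the two registers involved in a call and return. */
typedef struct {
    uint32_t pc; /* address of the current instruction */
    uint32_t lr; /* r14, the link register */
} cpu;

/* bl target: record the address of the next instruction in lr,
   then jump. In ARM state each instruction is 4 bytes. */
static void bl(cpu *c, uint32_t target)
{
    c->lr = c->pc + 4;
    c->pc = target;
}

/* bx lr: jump to whatever lr holds; no other register changes. */
static void bx_lr(cpu *c)
{
    c->pc = c->lr;
}
```

Because bl overwrites lr, a function that itself calls another function has to save lr (typically by pushing it on the stack) before its own bl, or its final bx lr would return to the wrong place.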
The processor has very little concept of context (in an abstract sense). It does not know where it came from, what the registers are for, or if it is in a function call/subroutine. The higher level languages and compiler do know this, and use some common standards to make things easier.
A very small number of operations do have a special, well-defined purpose. A BL instruction updates not only the 'next instruction to execute' (otherwise known as PC or R15) but also R14 (the link register).
Exceptions (in V7-A) change a few of the banked core registers around, including the register which is usually used to access the stack, and the link register. This means that exceptions can happen without losing track of everything else that was going on. Cortex-M does things differently, and actually uses the stack to help with the banking (setting R14 to a 'magic value' to indicate whether the most recent call was an exception or not).
Unless an instruction interacts with specific registers (the CPSR in particular), it probably doesn't care about the context. Some security-related operations are restricted so they can only happen in privileged states; this is ultimately used to protect the operating system from user applications, and these operations usually involve accessing very specific control registers.

Accessing special CP15 c13 Software Thread ID registers in lldb

I need to access the special CP15 c13 Software Thread ID ARM registers and could not find any other way of doing it within LLDB. I thought of using
expr __asm__ __volatile__("MRC p15, 0, r0, c13, c0, 3");
It runs, but has no effect on the content of r0. Thanks a lot!
If you're on iOS, you can use _pthread_self() to get this value.

Inline Assembly Stack Behavior

I'm trying to integrate my assembly code into C programs to make it easier to access.
I'm trying to run the following code (I'm on x86-64):
void push(long address) {
__asm__ __volatile__("movq %0, %%rax;"
"push %%rax"::"r"(address));
}
The value of $rsp doesn't seem to change (neither does esp for that matter). Am I missing something obvious about how constraints work? rax is getting correctly allocated with address, but address never seems to get pushed onto the stack?
You can't do that.
Inline asm must document to the compiler the inputs it takes, the outputs it produces, and any other state it clobbers as part of its execution. Yours fails to do so, but perhaps more to the point, there is no way you could possibly be allowed to clobber the stack pointer like you're doing, since the surrounding code, when it regains control after the asm block, would have no way to find any of its data - even if it had saved it on the stack knowing it would be clobbered, it would have no way to get it back.
I'm not sure what you're trying to do, but whatever it is, this is not the way to do it.
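For contrast, here is a minimal sketch of an asm statement that follows those rules, assuming x86-64 with GCC or Clang (other targets fall back to plain C). Every input, output and clobbered register is declared, and rsp is never touched. copy_through_rax is a hypothetical name; moving a value through rax is pointless by itself and only shows the bookkeeping.

```c
#include <stdint.h>

static uint64_t copy_through_rax(uint64_t value)
{
    uint64_t result;
#if defined(__x86_64__) && defined(__GNUC__)
    /* %1 is an input, %0 an output, and rax is declared clobbered,
       so the compiler will not keep anything it needs in rax and
       will not assign rax to either operand. */
    __asm__("movq %1, %%rax\n\t"
            "movq %%rax, %0"
            : "=r"(result)
            : "r"(value)
            : "rax");
#else
    result = value; /* portable fallback on other targets */
#endif
    return result;
}
```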
