I am writing a flashloader for a Cortex-M4 device and I'd like to "return" a value to the driving PC application using the breakpoint instruction's immediate value.
While hardcoding an immediate works fine:
__asm("bkpt 0x70");
__asm("bkpt %0" : : "i" (0x70));
as soon as I want to "return" something run-time dependent like
uint8_t status = Flash_EraseAll();
__asm("bkpt %0" : : "i" (status));
The compilation fails with
Error[Ta090]: Immediate operand is not constant
I tried using preprocessor macros with different concatenate setups, but to no avail.
Has anybody got an idea of how I could input the run-time dependent status flags to the __asm() block in IAR as an immediate? Based on what I read here, this is not exactly possible, but there might be a clever hacky way to do it.
P.S.: Yes, as a workaround I could use a switch statement where I list and hardcode every possible state, but that's just ugly and long.
I would push the value on the stack, and then use the bkpt instruction with a defined number, so the debugger can look at the stack for this state.
Something like this (pseudocode):
__asm("push {%0}" : : "r" (status)); // status is a run-time value, so pass it in a register
__asm("bkpt %0" : : "i" (0x70));
Of course, you shouldn't forget to clean up the stack afterwards.
Since bkpt encodes its operand as an immediate inside the instruction itself, you can't change it at runtime without modifying the code.
Based on @Devolus's idea, I ended up with the following:
uint32_t status = Flash_EraseAll();
__asm volatile ("str %0, [sp, #-4]!\n\t" // Push to stack
"bkpt 0x0\n\t" // Halt CPU
"add sp, sp, #4\n\t" // Restore SP
: : "r"(status)); // status as input to __asm()
The assembly instructions tell the compiler to place the status variable in a convenient register ("r"), store that register's content at the pre-decremented stack pointer address, and then halt the CPU with a bkpt whose immediate is 0.
The driving application polls the target until it halts (bkpt hit). Once it has halted, the application reads the 16-bit value at the current PC (__asm("bkpt 0x00") -> 0xbe00 -> #imm = 0xbe00 & 0x00ff = 0) to make sure that execution stopped at the right place. It then reads the 32-bit value at the SP address as of the halt to fetch the status of the embedded code's execution.
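The host-side check on the halted PC could be sketched in C roughly like this. It is only a sketch, not part of the original answer: the helper names is_thumb_bkpt and thumb_bkpt_imm are hypothetical, and it assumes the debugger has already read the 16-bit halfword at the halted PC through whatever probe interface is in use. It relies only on the Thumb BKPT encoding, 0xBE00 | imm8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Thumb BKPT is encoded as 0xBE00 | imm8. */
#define THUMB_BKPT_OPCODE 0xBE00u
#define THUMB_BKPT_MASK   0xFF00u

/* Hypothetical helper: true if the halfword read at the halted PC is a BKPT. */
static bool is_thumb_bkpt(uint16_t halfword)
{
    return (halfword & THUMB_BKPT_MASK) == THUMB_BKPT_OPCODE;
}

/* Hypothetical helper: extract the 8-bit immediate from a BKPT halfword. */
static uint8_t thumb_bkpt_imm(uint16_t halfword)
{
    assert(is_thumb_bkpt(halfword)); /* caller must check the opcode first */
    return (uint8_t)(halfword & 0x00FFu);
}
```

With the code above, the host would accept the halt only if is_thumb_bkpt() holds and thumb_bkpt_imm() returns the expected 0, and only then read the 32-bit status word at SP.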
This way, instead of a static 8-bit code from the bkpt's immediate, one can "report" more stuff to the outside world dynamically (32-bit in this case).
As @PeterCordes points out, the push and bkpt instructions must be in the same inline assembly statement; otherwise the compiler might insert code between them. Also, SP must be restored to the value it had before the __asm(), because the compiler assumes sole control over SP.
I will write my assumptions (based on my research) in the question below. I assume there are mistakes in my assumptions beyond the question itself.
I'm looking into some code written for ARM,
specifically this function (taken from the FreeRTOS port code):
portFORCE_INLINE static uint32_t ulPortRaiseBASEPRI(void)
{
    uint32_t ulOriginalBASEPRI, ulNewBASEPRI;

    __asm volatile(" mrs %0, basepri \n"
                   " mov %1, %2 \n"
                   " msr basepri, %1 \n"
                   " isb \n"
                   " dsb \n"
                   : "=r"(ulOriginalBASEPRI), "=r"(ulNewBASEPRI)
                   : "i"(configMAX_SYSCALL_INTERRUPT_PRIORITY));

    /* This return will not be reached but is necessary to prevent compiler
       warnings. */
    return ulOriginalBASEPRI;
}
I understand that in GCC "=r" is an output operand, so we save values from the asm into C variables.
Now, in my understanding, the code is equivalent to:
ulOriginalBASEPRI = basepri
ulNewBASEPRI = configMAX_SYSCALL_INTERRUPT_PRIORITY
basepri = ulNewBASEPRI
I understand we are returning the original value of BASEPRI, so that's the first line. However, I don't understand why we assign the variable ulNewBASEPRI and then use it in the MSR instruction.
So I looked in the ARMv7 instruction set and saw this:
I assume there is no MSR (immediate) in the Thumb instruction set, and "Encoding A1" means it's only available in ARM instruction mode.
So we have to use an =r output operand to let the assembler auto-select a register for our variable. Am I correct?
EDIT: ignore this section because I miscounted colons
: "i"(configMAX_SYSCALL_INTERRUPT_PRIORITY));
From my understanding of the assembly template:
asm ( assembler template
: output operands /* optional */
: input operands /* optional */
: list of clobbered registers /* optional */
);
Doesn't "i" just mean an immediate, or constant, in the assembly?
Does this mean the third colon is not only for the clobber list?
If so, wouldn't it be more appropriate to find the constraint "i" in the input operands?
EDIT: ignore this section because I miscounted colons
I understand isb and dsb are memory barrier instructions, but I really don't understand their descriptions. What do they really do?
What happens if we remove the dsb or isb instruction, for example?
so we have to use =r output operand to let assembler to auto select a register for our variable am i correct?
Yes, but it's the compiler that does register allocation. It just fills in the %[operand] in the asm template string as a text substitution and feeds that to the assembler.
Alternatively, you could hard-code a specific register in the asm template string, and use a register-asm local variable to make sure an "=r" constraint picked it. Or use an "=m" memory output operand and str a result into it, and declare a clobber on any registers you used. But those alternatives are obviously terrible compared to just telling the compiler about how your block of asm can produce an output.
I don't understand why the comment says the return statement doesn't run:
/* This return will not be reached but is necessary to prevent compiler
warnings. */
return ulOriginalBASEPRI;
Raising the basepri (ARM docs) to a higher number might allow an interrupt handler to run right away, before later instructions, but if that exception ever returns, execution will eventually reach the C outside the asm statement. That's the whole point of saving the old basepri into a register and having an output operand for it, I assume.
(I had been assuming that "raise" meant higher number = more interrupts allowed. But Ross comments that it will never allow more interrupts; they're "raising the bar" = lower number = fewer interrupts allowed.)
If execution really never comes out the end of your asm, you should tell the compiler about it. There is asm goto, but that needs a list of possible branch targets. The GCC manual says:
GCC assumes that asm execution falls through to the next statement (if this is not the case, consider using the __builtin_unreachable() intrinsic after the asm statement).
Failing to do this might lead to the compiler planning to do something after the asm, and then it never happening even though in the source it's before the asm.
It might be a good idea to use a "memory" clobber to make sure the compiler has memory contents in sync with the C abstract machine (at least for variables other than locals, which an interrupt handler might access). This is usually desirable around asm barrier instructions like dsb, but here it seems we maybe don't care about being an SMP memory barrier, just about consistent execution after changing basepri? I don't understand why that's necessary, but if you do, it's worth considering whether compile-time reordering of memory accesses around the asm statement is or isn't a problem.
You'd use a third colon-separated section in the asm statement (after the inputs) : "memory"
Without that, compilers might decide to do an assignment after this asm instead of before, leaving a value just in registers.
// actual C source
global_var = 1;
uint32_t oldpri = ulPortRaiseBASEPRI();
global_var = 2;
could optimize (via dead-store elimination) into asm that worked like this
// possible asm
global_var = 2;
uint32_t oldpri = ulPortRaiseBASEPRI();
// or global_var = 2; here *instead* of before the asm
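A minimal sketch of the clobber syntax, using an empty asm template so that it builds for any GCC or Clang target. The names global_var, COMPILER_BARRIER and store_around_barrier are made up for illustration; in the FreeRTOS function the "memory" clobber would go in a third colon-separated section after the "i" input in the same way.

```c
#include <assert.h>

int global_var;

/* An empty asm statement with a "memory" clobber: it emits no instructions,
   but the compiler must assume it may read or write any memory. */
#define COMPILER_BARRIER() __asm__ volatile("" : : : "memory")

/* Hypothetical example function. */
int store_around_barrier(void)
{
    global_var = 1;        /* must actually be stored before the barrier */
    COMPILER_BARRIER();
    int seen = global_var; /* must be reloaded from memory after the barrier */
    global_var = 2;
    assert(seen == 1);
    return seen;
}
```

Without the clobber, the compiler would be free to drop the first store as dead, which is exactly the reordering described above. Note that this only constrains the compiler; it is not a hardware barrier.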
Concerning ARM/Thumb instruction set differences on msr: you should be able to answer this yourself from the documentation. ;-) It is just 2 pages later. Edit: Chapter A8.1.3 of the linked manual clearly states how encodings are documented on instructions.
dsb (data synchronization barrier) makes sure that all memory accesses are finished before the next instruction is executed. This is only a brief summary; for the full details you need to read the documentation. If you have further specific questions about this operation, please post another question.
isb (instruction synchronization barrier) flushes the instruction pipeline. This pipeline buffers instructions which have already been fetched from memory but not yet executed. So the instructions after the isb are fetched again, with any changed memory access settings in effect, and this is what a programmer expects. The note above applies here, too.
I'm trying to integrate my assembly code into c programs to make it easier to access.
I'm trying to run the following code (I'm on an x86-64 architecture):
void push(long address) {
__asm__ __volatile__("movq %0, %%rax;"
"push %%rax"::"r"(address));
}
The value of $rsp doesn't seem to change (neither does esp for that matter). Am I missing something obvious about how constraints work? rax is getting correctly allocated with address, but address never seems to get pushed onto the stack?
You can't do that.
Inline asm must document to the compiler the inputs it takes, the outputs it produces, and any other state it clobbers as part of its execution. Yours fails to do so, but perhaps more to the point, there is no way you could possibly be allowed to clobber the stack pointer like you're doing, since the surrounding code, when it regains control after the asm block, would have no way to find any of its data - even if it had saved it on the stack knowing it would be clobbered, it would have no way to get it back.
I'm not sure what you're trying to do, but whatever it is, this is not the way to do it.
I'm going through Michael Abrash's Graphics Programming Black Book (which, by the way, I am really enjoying and strongly recommend), so the example code I'm working with is quite old. Nonetheless, I don't see what the problem is:
__asm__(
//Some setup code here
"movl %%esi, %%edi;"
"movw %%ds, %%es;"
//A whole bunch more assembly code in between
"divloop:"
"lodsw;"
"divl %%ebx;"
"stosw;"
"loop divloop;"
//And a little more code here
: "=r" (ret)
: "0" (ret) /*Have to do this for some reason*/, "b" (div), "c" (l), "S" (p)
: "%edi", "%es"
);
The l variable is an unsigned int, and the p variable is a char*. l is a byte count for the length of the string pointed at by p. div is the divisor and is an unsigned int. ret is the return value (an unsigned int) of the function and is set inside the assembly block to be the remainder of the division.
The error message I am getting is "error: unknown register name '%es' in 'asm'" (this is the only error message). My best guess is that it goes by another name in GAS syntax. I know I'm working with old code, but as far as I know, on my fairly new Intel i3 there is still an ES register that gets used by stos*.
Secondly, there's a question that's been bugging me. I've basically had no choice but to just assume that DS was already set to the right memory location for use with lods*. Since I am reading from, modifying, and writing to the same memory location (using stos* and lods*) I'm setting ES equal to DS. However, it's really scaring me that my DS could be anything and I don't know what else to set it to. What's more is that ESI and EDI are already 32 bit registers and should be enough on their own to access memory.
In my experience, two strange problems at once are usually related and caused by a more fundamental problem (and usually a PEBKAC). However, I'm stumped at this point. Does anyone know what's going on?
Thanks a bunch
P.S. I'm trying to recreate the code from Chapter 9 (Hints My Readers Gave Me, Listing 9.5, page 182) that divides a large number stored in contiguous memory by EBX. There is no other reason for doing this than my own personal growth and amusement.
If you're running in a flat 32-bit protected mode environment (like a Linux or Windows user-mode process), there's no need to set es.
The segment registers are set for you by the OS, and es and ds both allow you to access a flat 32-bit address space.
GCC won't generate code to save/restore segment registers, so it's not surprising that it won't allow you to add them to the clobber list.
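Since nothing needs to be done with the segment registers in a flat address space, the same computation can also be written in plain C and left to the compiler. The following is only a sketch of the idea (dividing a multi-word number stored in contiguous memory by a 32-bit divisor), not a transcription of Listing 9.5: the function name div_bignum, the 16-bit limb size, and the least-significant-limb-first layout are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Divide the number stored in limbs[0..count-1] (16-bit limbs, least
   significant first; an assumed layout) by divisor, in place.
   Returns the remainder. */
static uint32_t div_bignum(uint16_t *limbs, size_t count, uint32_t divisor)
{
    assert(divisor != 0);
    uint64_t rem = 0;

    /* Schoolbook long division: start at the most significant limb and
       carry the remainder down into the next one. */
    for (size_t i = count; i-- > 0; ) {
        uint64_t cur = (rem << 16) | limbs[i];
        limbs[i] = (uint16_t)(cur / divisor); /* always fits: rem < divisor */
        rem = cur % divisor;
    }
    return (uint32_t)rem;
}
```

For example, dividing the two-limb value 0x00010000 (65536) by 10 leaves 6553 in the buffer and returns a remainder of 6.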
I want to write a small assembly routine which uses a temporary register. When I say temporary register I mean it's not an input or output register in the sense of constraints for an asm block. I could just pick any register and then include it in the clobber list, but I thought it would be nicer for the compiler to be able to choose. What is the correct way to handle this? The only suggestion I've found online is to list it as an output register and then not actually use the output.
You can't use the r (general register) constraint in the clobber list. And an input-only register is assumed to be unmodified by the asm statement. The best solution is to specify the temp as an output register, which gives the compiler the option of discarding the 'result', as well as being able to retire the register.
unsigned long tmp; /* register 'word' type. */
__asm__ ("..." : "=r" (tmp), ... : <inputs> : <clobbered>);
You can now refer to the temp register as %0, in this example. Provided that the tmp variable is never used, the compiler can discard the result and continue to (re)use the register.
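As a concrete sketch of this pattern (the function add_twice and its self-check are hypothetical, written for GCC or Clang): the scratch register is declared as an output that is never read afterwards. One detail worth adding is the & (early-clobber) modifier, which stops the compiler from giving the scratch the same register as an input; it is required whenever the scratch is written before the last input has been read, and harmless otherwise. The asm is guarded per architecture, with a plain C fallback for targets not covered here.

```c
#include <assert.h>

/* Hypothetical example: returns a + 2*b, using a scratch register that the
   compiler picks. The scratch is declared as an output that is never read. */
static unsigned long add_twice(unsigned long a, unsigned long b)
{
    unsigned long result = a;
    unsigned long tmp; /* scratch only; its value is discarded after the asm */

#if defined(__x86_64__) || defined(__i386__)
    __asm__("mov %2, %1\n\t" /* tmp = b */
            "add %1, %1\n\t" /* tmp = 2*b */
            "add %1, %0"     /* result += tmp */
            : "+r"(result), "=&r"(tmp)
            : "r"(b)
            : "cc");         /* add modifies the flags */
#elif defined(__aarch64__)
    __asm__("add %1, %2, %2\n\t" /* tmp = 2*b */
            "add %0, %0, %1"     /* result += tmp */
            : "+r"(result), "=&r"(tmp)
            : "r"(b));
#else
    tmp = b + b; /* portable fallback for other targets */
    result += tmp;
#endif
    return result;
}

/* Small self-check showing the intended use. */
static void add_twice_selfcheck(void)
{
    assert(add_twice(1, 2) == 5);
}
```

Because tmp is never used after the asm statement, the compiler is free to reuse its register immediately.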
I'm doing some experimenting and would like to be able to see what is saved on the stack during a system call (the saved state of the user land process). According to http://lxr.linux.no/#linux+v2.6.30.1/arch/x86/kernel/entry_32.S it shows that the various values of registers are saved at those particular offsets to the stack pointer. Here is the code I have been trying to use to examine what is saved on the stack (this is in a custom system call I have created):
asm("movl 0x1C(%esp), %ecx");
asm("movl %%ecx, %0" : "=r" (value));
where value is an unsigned long.
As of right now, this value is not what is expected (it is showing a 0 is saved for the user value of ds).
Am I correctly accessing the offset of the stack pointer?
Alternatively, could I use a debugger such as GDB to examine the stack contents while in the kernel? I don't have much experience with debugging and am not sure how to debug code inside the kernel. Any help is much appreciated.
No need for inline assembly. The saved state that entry_32.S pushes onto the stack for a syscall is laid out as a struct pt_regs, and you can get a pointer to it like this (you'll need to include <asm/ptrace.h> and/or <asm/processor.h> either directly or indirectly):
struct pt_regs *regs = task_pt_regs(current);
Inline assembly is trickier than it seems. To briefly cover the main concerns for GCC:
1. If it modifies processor registers, these registers must be put on the clobber list. Note that the clobber list must contain ALL registers that you change, whether directly (explicitly) or indirectly (implicitly);
2. To reinforce (1), conditional and arithmetic operations also change the status flags (zero, carry, overflow, etc.), so you have to tell the compiler by adding "cc" to the clobber list;
3. Add "memory" if it modifies arbitrary memory locations that are not listed as operands;
4. Add the volatile keyword if the asm has side effects that aren't expressed through its input/output arguments, so that the compiler doesn't remove or move it;
Then, your code becomes:
asm("movl 0x1C(%%esp), %0;"
: "=r" (value)
: /* no inputs :) */
/* no modified registers */
);
The output argument isn't required to be on the clobber list because GCC already knows it will be changed.
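Alternatively, since all you need is the value of the ESP register, you can avoid most of the pain by doing this:
register unsigned long esp asm("esp"); /* bind a C variable to the stack pointer */
unsigned long value = *(unsigned long *)(esp + 0x1C); /* read the slot without modifying ESP itself */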
It might not solve your problem, but it's the way to go. For reference, check this, this and this.
Keep in mind that x86_64 code will often pass values in registers (since it has so many) so nothing will be on the stack. Check the gcc intermediate output (-S IIRC) and look for push in the assembly.
I'm not familiar with debugging kernel code, but gdb is definitely nicer to examine the stack interactively.