Thread local variables and inline assembly

I am trying to use a thread-local variable in inline assembly, but when I look at the disassembled code, it appears that the compiler doesn't generate the right code. For the following inline code, where saved_sp is declared globally as __thread long saved_sp,
__asm__ __volatile__ (
"movq %rsp, saved_sp\n\t");
The disassembly looks like the following.
mov %rsp,0x612008
This is clearly wrong, because I know that gcc uses the fs segment for thread-local variables. It should have generated something like
mov %rsp, fs:somevalue
which it is not. Why is that so? Is using thread local variables in inline assembly problematic?

A simple thing that would surely work is to take a pointer to the thread local variable, and write to it.
Your compiler will surely compile long *saved_sp_p = &saved_sp correctly, and the inline assembly will only deal with saved_sp_p, which is a local variable.
You can also use gcc's input and output syntax:
__asm__ __volatile__ (
"mov %%rsp, 0(%0)" : : "r" (&saved_sp) : "memory"
);
This puts the compiler in charge of resolving the address of saved_sp, and the assembly code gets it in a register.
We found out that this also works:
__asm__ __volatile__ ("mov %%rsp, %0" : "=m" (saved_sp));
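As a minimal sketch of the "=m" approach, assuming GCC or Clang on x86-64: the helper name record_stack_pointer is hypothetical, and the non-x86 branch is only a rough stand-in added so the snippet builds elsewhere. Because the thread-local variable is passed as an operand, the compiler prints the full TLS address itself (typically %fs-relative on x86-64 Linux) instead of the bare symbol address seen in the question.

```c
#include <assert.h>

/* Thread-local slot, declared as in the question. */
static __thread long saved_sp;

/* Hypothetical helper: stores the current stack pointer in saved_sp. */
long record_stack_pointer(void)
{
#if defined(__x86_64__)
    /* "=m" lets the compiler emit the operand's address, TLS segment
       included. Note the doubled %% before a register name in extended asm. */
    __asm__ __volatile__("movq %%rsp, %0" : "=m"(saved_sp));
#else
    /* Assumption: on other targets, approximate with the frame address. */
    saved_sp = (long)__builtin_frame_address(0);
#endif
    assert(saved_sp != 0); /* sanity check only */
    return saved_sp;
}
```

Disassembling this function should show the store going through the thread pointer rather than to an absolute address.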

Related

Inserting inline assembly code into C function - I/O questions

I am developing an embedded C application for my Cortex M3 microcontroller using the GNU arm-none-eabi toolchain.
I plan to adopt an assembly subroutine that the vendor implemented into my C application. I plan to make a new C function and, within it, write an inline assembly block using the extended inline assembly syntax. In this post I treat the assembly subroutine as a black box and ask how to structure the inputs and the clobber list; the routine has no outputs.
The assembly subroutine expects r0, r1, and r2 to be pre-set prior to the call. Further, the subroutine uses registers r4, r5, r6, r7, r8, r9 as scratch registers to do its function. It writes to a range of memory on the device, specified by r0 and r1 which are the start and stop addresses, respectively.
So, I am checking if my assumptions are correct. My questions follow.
Is this the function I should write?
void my_asm_ported_func(int reg_r0, int reg_r1, int reg_r2) {
__asm__ __volatile__ (
"ldr r0, %0 \n\t"
"ldr r1, %1 \n\t"
"ldr r2, %2 \n\t"
"<vendor code...> "
: /* no outputs */
: "r" (reg_r0), "r" (reg_r1), "r" (reg_r2) /* inputs */
: "r0", "r1", "r2", "r4", "r5", "r6",
"r7", "r8", "r9", "memory" /* clobbers */
);
}
Since this asm subroutine writes to a range of other memory on the device, is adding "memory" to the clobber list enough? Seems too simple.
Is there a more elegant way to feed in r0 - r2 from the input parameters of the surrounding C function? I understand from the AAPCS that registers r0-r3 carry input parameters 1-4, so it seems redundant to feed r0-r2 manually as I did in the input list. Should I somehow just have this be a pure assembly function in a separate .S file?
Thank you in advance.
I tried the above but with the basic inline assembly syntax, with terrible results: it crashed. I did it that way because I thought the assembly block would naturally take r0-r2 via the function prologue, which it evidently did because it wrote the memory correctly, but it crashed once execution continued from my breakpoint at the beginning of the asm block (my VS Code extension doesn't have a step-by-step disassembly view, so it just runs the asm as a black box). I haven't tried the extended syntax yet; I have been doing a lot of reading on this, so I just wanted to make sure my black-box approach should work and that I'm not missing anything too big.

Yes, a volatile asm with a "memory" clobber is fine for MMIO (or pretty much anything that's supported at all): the compiler will make sure the asm it generates has memory contents in sync with the C abstract machine before the asm statement, and will assume that any globally-reachable memory has changed after. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used? for a more in-depth explanation of why this matters when the pointed-to memory is C variables that you also access outside inline asm, not just MMIO registers.
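To illustrate the "memory" clobber on a routine that writes a range of memory, here is a small sketch. It uses x86-64 rep stosb as a stand-in for the vendor routine, purely because it is short; fill_bytes and check_fill_bytes are hypothetical names, and other targets fall back to memset. The pointer and the count are read-write operands because the instruction modifies both registers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes `count` copies of `value` starting at `dst`. */
void fill_bytes(void *dst, unsigned char value, size_t count)
{
    assert(dst != NULL || count == 0);
#if defined(__x86_64__)
    /* rep stosb advances RDI and decrements RCX, so both are "+" operands.
       The "memory" clobber tells the compiler that memory it cannot see
       named in the operands has been written. */
    __asm__ __volatile__("rep stosb"
                         : "+D"(dst), "+c"(count)
                         : "a"(value)
                         : "memory");
#else
    memset(dst, value, count); /* fallback for other targets */
#endif
}

/* Usage sketch: only the first three bytes are overwritten. */
void check_fill_bytes(void)
{
    unsigned char buf[4] = {1, 1, 1, 1};
    fill_bytes(buf, 7, 3);
    assert(buf[0] == 7 && buf[1] == 7 && buf[2] == 7 && buf[3] == 1);
}
```

Without the clobber, the compiler would be free to keep a stale copy of buf in registers across the asm statement.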
Registers
To avoid wasted instructions, tell the compiler which registers you want inputs in, or better let the compiler pick and change the "vendor code" to use %0 instead of the hard register r0.
ldr r0, r0, which is what filling in your ldr r0, %0 template string produces, is either invalid or treats the source r0 as a symbol name. Either way, it doesn't get the function arg into r0, since you force the compiler to have it in a different register (by declaring a clobber on "r0"). If you did want to copy between registers, the ARM instruction for that is mov. But if that's the first instruction of an asm template string, it usually means you're doing it wrong and should use better constraints to tell the compiler what you want.
// Worse way, but can use a template string with hard-coded registers unchanged
void my_asm_ported_func(int a, int b, int c)
{
register int reg_r0 asm ("r0") = a; // forces "r" to pick r0 for an asm template
register int reg_r1 asm ("r1") = b; // no other *guaranteed* effects.
register int reg_r2 asm ("r2") = c;
__asm__ __volatile__ (
// no extra mov or load instructions
"<vendor code...> " // still unchanged
: "+r" (reg_r0), "+r" (reg_r1), "+r" (reg_r2) // read-write outputs
: // no pure inputs
: "r4", "r5", "r6",
"r7", "r8", "r9", "memory" // clobbers
);
}
Best way
void my_asm_ported_func(int reg_r0, int reg_r1, int reg_r2) {
__asm__ __volatile__ (
// no extra mov or load instructions.
"<vendor code changed to use %0 instead of r0, etc...> "
: "+r" (reg_r0), "+r" (reg_r1), "+r" (reg_r2) // read-write outputs
: // no pure inputs
: "r4", "r5", "r6",
"r7", "r8", "r9", "memory" // clobbers. Not including r3??
);
// the C variables reg_r0 and so on have modified values here
// but they're local to this function so no effect outside of this
}
Actually, a further improvement would be to replace the register clobbers like "r4" through "r9" with "=r"(dummy1) output operands to let the compiler pick which registers to clobber.
I'm surprised the template string doesn't use r3. If it does, you forgot to tell the compiler about it, which is undefined behaviour that will bite you when this function inlines. You mentioned crashes; that could be the cause, if your ldr isn't.
Using %0 instead of r0 in the "vendor code" will get the compiler to fill in the register name it picked. Normally it will pick r0 for the C variable whose value was already there, unless the function inlines and the value was in a different register.
I'm assuming the asm template modifies that register, which is why I made it an input/output operand with "+r"(reg_r0), with the output side basically being a dummy to let the compiler know that register changed. You can't declare a clobber on a register that's also an operand, and if you're letting the compiler pick registers you wouldn't even know which one.
If any of the input registers are left unmodified by the asm template, make them pure inputs. You can use [name] "r"(c_var) in the operands and %[name] in the template string to use names instead of numbers, making it easy to move them around without having to renumber and keep track of which operand is which number.
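A small sketch of named operands follows. The function add_named and the operand names sum and addend are made up for illustration; the x86 branch is the one that can be checked on a desktop machine, and the ARM branch is an untested assumption about the equivalent instruction. One operand is read-write ("+r") because the template modifies it, and the other is a pure input.

```c
#include <assert.h>

/* Hypothetical example: returns a + b using named asm operands. */
int add_named(int a, int b)
{
#if defined(__x86_64__) || defined(__i386__)
    /* AT&T syntax: addl source, destination. */
    __asm__("addl %[addend], %[sum]"
            : [sum] "+r"(a)        /* read-write: modified by the template */
            : [addend] "r"(b)      /* pure input: left unmodified */
            : "cc");
#elif defined(__arm__)
    /* Assumed ARM/Thumb-2 equivalent; not exercised here. */
    __asm__("add %[sum], %[sum], %[addend]"
            : [sum] "+r"(a)
            : [addend] "r"(b));
#else
    a += b; /* plain C fallback */
#endif
    return a;
}

/* Usage sketch. */
void check_add_named(void)
{
    assert(add_named(2, 3) == 5);
}
```

Reordering or adding operands does not require touching the template, since nothing refers to operand positions.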
See also
ARM inline asm: exit system call with value read from memory re: getting values into specific ARM registers
https://stackoverflow.com/tags/inline-assembly/info
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html register T foo asm("regname") syntax.
Separate .S file:
Should I somehow just have this be a pure assembly function in a separate .S file?
That's 100% a valid option, especially if call/ret overhead is minor compared to how long this takes, or it's not called all the time.
Look at compiler-generated asm (gcc -S) if you're not sure about the syntax for declaring a function (.globl foo ; foo: to define the symbol, put its machine code after it.) And of course push and pop any call-preserved registers your function uses.
(GNU C inline asm requires you to describe the asm precisely to the compiler; the function-calling convention is irrelevant because it's inline asm. You're dancing with the compiler and need to not step on its toes, instead of just following the standard calling convention.)

Execution of volatile instruction [duplicate]

int __attribute__ ((noinline)) mySystemCall (uint32 Exception, uint32 Parameter)
{
#ifdef PROCESSORX
__asm__ volatile ("sc");
#else
__asm__ __volatile__ ("mov R0, %0; mov R1, %1; svc 0x0 " : : "r" (Exception), "r" (Parameter));
#endif
}
How does the compiler translate the instruction (asm volatile ("sc"))?
Why are some arguments passed as strings and some are not (ex:
__asm__ __volatile__("rdtsc": "=a" (a), "=d" (d) ))

Inline assembly isn't specified by the C standard. I assume this code is for gcc or a compatible compiler, in which case you should have a look at the manual.
As for your specific questions:
How does the compiler translate the instruction (asm volatile ("sc"))?
The volatile in this context instructs the compiler that the assembler snippet must be included, even if the compiler can't see a reason it's actually needed for the behavior of the program. Whatever comes in the first string parameter is literal assembly code of the target platform.
Why are some arguments passed as strings and some are not
It's just part of the syntax, refer to the manual I listed above. Inline assembly can "bind" input and output parameters to C variables and also tell the compiler which registers are "clobbered" by the assembly snippet (among other things).
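As a sketch of how the string and non-string parts fit together, assuming GCC or Clang on x86: the template string is the instruction text, and the parts after the colons bind C variables to registers. Like the rdtsc example in the question, it uses "=a" and "=d" for EAX and EDX, but with mull so that the result is deterministic. mul_wide and check_mul_wide are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: 32x32 -> 64-bit unsigned multiply. */
uint64_t mul_wide(uint32_t x, uint32_t y)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    /* mull multiplies EAX by its operand and leaves the result in EDX:EAX. */
    __asm__("mull %[y]"
            : "=a"(lo), "=d"(hi)     /* outputs: EAX -> lo, EDX -> hi */
            : "a"(x), [y] "r"(y)     /* inputs: x in EAX, y in any register */
            : "cc");                 /* the flags are modified */
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)x * y; /* plain C fallback */
#endif
}

/* Usage sketch. */
void check_mul_wide(void)
{
    assert(mul_wide(0xffffffffu, 2u) == 0x1fffffffeULL);
}
```

No volatile is used here because the result depends only on the inputs, so the compiler may drop the statement if the result is unused.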

Inline assembly has to bridge the gap between C and assembly, so in addition to the assembly code itself, you need to describe how the two interact. The first item in a GCC asm statement is the actual assembly template; the other items assign input variables, output variables, and clobbers (registers or memory that the assembly may overwrite, so the compiler needs to steer clear of them). The full details are in the GCC manual.

inline assembly in avr

void save_context(uint8_t index) {
context *this_context = contextArray + index;
uint8_t *this_stack = this_context->stack;
asm volatile("st %0 r0": "r"(this_stack));
}
I have something like this.
I would like to store the registers r0, r1, r2, ... into my stack[] array.
What I am programming is the context switch. The context has the structure like this:
typedef struct context_t {
uint8_t stack[THREAD_STACK_SIZE];
void *pstack;
struct context_t *next;
}context;
My problem is that I am not able to pass the c variable "this_stack" to inline assembly. My aim is to store all the registers, stack pointer and SREG on my stack.
After compiling, it gives error:
Description Resource Path Location Type
`,' required 5_multitasking line 754, external location: C:\Users\Jiadong\AppData\Local\Temp\ccDo7xn3.s C/C++ Problem
I looked at the AVR inline assembly tutorial, but I don't understand much of it.
Could anyone help me?

asm volatile ("st %0 r0": "r"(this_stack));
There are several problems in that line: Wrong % print-modifier, missing , between the operands, incorrect constraint and missing description of side effects.
The memory access is supposed to use indirect addressing, so one way is to use indirect+displacement with "b"ase register Y or Z:
asm volatile ("std %a1+0, r0" "\n\t"
"std %a1+1, r1" "\n\t"
"..."
: "+m" (this_context->stack)
: "b" (this_stack));
Notice print modifier %a which prints R30 as Z and not as r30.
Operand 0 is just used to express that this_context->stack is being changed if you don't want the all-memory-clobber "memory". Moreover, there's no need for an intermediate variable for operand 1 because it's not altered: you can use just as well "b" (this_context->stack) for operand 1.
Alternatively, post-increment addressing on "e"xtended (pointer) registers X, Y or Z can be used:
asm volatile ("st %a1+, r0" "\n\t"
"st %a1+, r1" "\n\t"
"..."
: "=m" (this_context->stack), "+e" (this_stack));

"label" makes no sense, that should be a constraint. It also makes no sense trying to save the stack pointer into an array. It might make sense to load the stack pointer with the address of that array, but that's not the save_context.
Anyway, to get the value of SPL which is the stack pointer you can do something like this:
asm volatile("in %0, %1": "=r" (*this_stack) : "I" (_SFR_IO_ADDR(SPL)));
(There is a q constraint but at least my gcc version doesn't like it.)
To get true registers, for example r26 you can do:
register uint8_t r26_value __asm__("r26");
asm volatile("": "=r" (r26_value));

There is a constraint, "m", documented in the GCC manual, but it doesn't always work on AVR. Here is an example of how it should work from sanguino/bootloaders/atmega644p/ATmegaBOOT
asm volatile("...
...
"sts %0,r16 \n\t"
...
: "=m" (SPMCSR) : ... );
I have found "m" to be fragile though. If a function uses a variable in C code, outside of the inline assembly, the compiler may choose to place it in the Z register and it will try to use Z in assembler too. This causes an assembler error when used with the sts instruction. Looking at the assembler output from the C compiler is the best way to debug this kind of problem.
Rather than using an "m" constraint, you can just put the literal address you want into your assembler code. For an example, see pins_teensy.c, where timer0_fract_count is not included in the operand lists after the colons:
asm volatile(
...
"sts timer0_fract_count, r24" "\n\t"

GCC INLINE ASSEMBLY Won't Let Me Overwrite $esp

I'm writing code to temporarily use my own stack for experimentation. This worked when I used literal inline assembly, hard-coding the variable locations as offsets from ebp. However, I want my code to work without having to hard-code memory addresses into it, so I've been looking into GCC's extended inline assembly. What I have is the following:
volatile intptr_t new_stack_ptr = (intptr_t) MY_STACK_POINTER;
volatile intptr_t old_stack_ptr = 0;
asm __volatile__("movl %%esp, %0\n\t"
"movl %1, %%esp"
: "=r"(old_stack_ptr) /* output */
: "r"(new_stack_ptr) /* input */
);
The point of this is to first save the stack pointer into the variable old_stack_ptr. Next, the stack pointer (%esp) is overwritten with the address I have saved in new_stack_ptr.
Despite this, I found that GCC was saving %esp into old_stack_ptr, but was NOT replacing %esp with new_stack_ptr. Upon deeper inspection, I found that it actually expanded my assembly and added its own instructions, which are the following:
mov -0x14(%ebp),%eax
mov %esp,%eax
mov %eax,%esp
mov %eax,-0x18(%ebp)
I think GCC is trying to preserve the %esp, because I don't have it explicitly declared as an "output" operand... I could be totally wrong with this...
I really want to use extended inline assembly for this. Otherwise I have to hard-code the offsets from %ebp into the assembly, and I'd rather use the variable names, especially because this code needs to work on a few different systems that all seem to place my variables at different offsets. Extended inline assembly lets me name the variable locations explicitly, but I don't understand why it adds the extra instructions and no longer lets me overwrite the stack pointer as it did before.
I appreciate any help!!!

Okay, so the problem is that gcc is allocating the input and the output to the same register, eax. You need to tell gcc that the output is written before the input is read, i.e. mark it "earlyclobber".
asm __volatile__("movl %%esp, %0\n\t"
"movl %1, %%esp"
: "=&r"(old_stack_ptr) /* output */
: "r"(new_stack_ptr) /* input */
);
Notice the & sign for the output. This should fix your code.
Update: alternatively, you could force input and output to be the same register and use xchg, like so:
asm __volatile__("xchg %%esp, %0\n\t"
: "=r"(old_stack_ptr) /* output */
: "0"(new_stack_ptr) /* input */
);
Notice the "0" that says "put this into the same register as argument 0".
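The matching-constraint form can be tried safely on an ordinary memory slot instead of the stack pointer. The following is only a sketch under that substitution: exchange_slot and check_exchange_slot are hypothetical names, x86-64 with GCC or Clang is assumed, and other targets use a plain C fallback. The "0" constraint places new_value in the same register as operand 0, so after the xchg that register holds the old contents of the slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores new_value in *slot and returns the old value. */
intptr_t exchange_slot(intptr_t *slot, intptr_t new_value)
{
    intptr_t old;
    assert(slot != NULL);
#if defined(__x86_64__)
    __asm__ __volatile__("xchg %0, %1"
                         : "=r"(old), "+m"(*slot) /* %0 register, %1 memory */
                         : "0"(new_value));       /* same register as %0 */
#else
    old = *slot; /* plain C fallback */
    *slot = new_value;
#endif
    return old;
}

/* Usage sketch. */
void check_exchange_slot(void)
{
    intptr_t slot = 1;
    assert(exchange_slot(&slot, 2) == 1);
    assert(slot == 2);
}
```

Because input and output are deliberately tied to one register here, no earlyclobber marker is needed, unlike the two-instruction mov version.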

(GNU inline assembly) How to use a register which not assigned from nor copy to the C variables?

I'm writing inline assembly statements using a GNU-based toolchain, and there are three instructions within the inline assembly to update a single bit of a system register. The steps will be:
move(read) a system register to a general register
'AND' it with the variable value from C code
move(write) back to the system register just read
in the instruction set I'm using, the inline assembly syntax is like this:
unsigned int OV_TMP = 0xffefffff;
asm volatile ( "mfsr %0, $PSW\n\t"
"and %0, %0, %1\n\t"
"mtsr %0, $PSW"
: : "r"(OV_TMP) : );
%1 is the register which I want to forward the value of OV_TMP into.
%0 is the problem for me, and my problem is :
How do I write the inline assembly when a register is used internally and is neither assigned from nor copied to a C variable?

The thing to consider here is that, from the compiler's perspective, the register is assigned-to by the inline assembly, even if you don't use it again later. That is, you're generating the equivalent of:
register unsigned int OV_TMP = 0xffefffff, scratch;
scratch = magic() & OV_TMP;
more_magic(scratch);
/* and then don't re-use scratch for anything from here on */
The magic and/or more_magic steps cannot be moved or combined away because of the volatile, so the compiler cannot simply delete the written-but-unused register.
The mfsr and mtsr look like powerpc instructions to me, and I would probably do the and step in C code (see footnote); but the following should generally work:
unsigned int OV_TMP = 0xffefffff, scratch;
asm volatile("mfsr %0, $PSW\n\t"
"and %0, %0, %1\n\t"
"mtsr %0, $PSW"
: "=&r"(scratch) : "r"(OV_TMP));
Here the "=&r" constraint says that the output operand (%0) is written before the input operand (%1) is read.
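The same scratch-register pattern can be sketched with ordinary instructions so that it can be run on a desktop machine. This assumes GCC or Clang on x86, with a mov standing in for the system-register read; and_via_scratch and check_and_via_scratch are hypothetical names. The scratch variable is a dummy output, and the earlyclobber marker is required because the template writes it before the mask input is read.

```c
#include <assert.h>

/* Hypothetical example: computes value & mask through a compiler-chosen
   scratch register. */
unsigned int and_via_scratch(unsigned int value, unsigned int mask)
{
    unsigned int scratch;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("movl %[val], %[tmp]\n\t"  /* writes tmp first... */
                         "andl %[mask], %[tmp]"     /* ...then reads mask */
                         : [tmp] "=&r"(scratch)     /* earlyclobber output */
                         : [val] "r"(value), [mask] "r"(mask)
                         : "cc");
#else
    scratch = value & mask; /* plain C fallback */
#endif
    return scratch;
}

/* Usage sketch, using the mask value from the question. */
void check_and_via_scratch(void)
{
    assert(and_via_scratch(0xffffffffu, 0xffefffffu) == 0xffefffffu);
}
```

With a plain "=r" instead of "=&r", the compiler could legally place tmp and mask in the same register, and the first instruction would destroy the mask.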
Footnote: As far as I know (which is not very far, I've only ever done a tiny bit of ppc assembly) there's no need to keep the mfsr and mtsr instructions a specific distance apart, unlike certain lock-step sequences on other processors. If so, I would write something more like this:
static inline unsigned int read_psw() {
unsigned int result;
asm volatile("mfsr %0, $PSW" : "=r"(result));
return result;
}
static inline void write_psw(unsigned int value) {
asm volatile("mtsr %0, $PSW" :: "r"(value));
}
#define PSW_FE0 0x00100000 /* this looks like it's FE0 anyway */
...
write_psw(read_psw() & ~PSW_FE0); /* some appropriate comment here */
