Inline asm in GCC, specifying offset by expression - c

Not exactly sure how to title this, but
I want to emulate this code:
asm("movl %%fs:0x30, %0" : "=r" (peb) : : );
but I want to specify the offset variably in C
trying:
int mee = 48;
asm("movl %%fs:%1, %0"
: "=r" (peb)
: "r" (mee)
:
);
Error is bad memory operand '%eax'

In what you have written, the compiler expands operand %1 to %eax, because it chose %eax as the register holding the value of mee, so the assembler sees %fs:%eax. The addressing mode you are trying to use is segment:offset, and a bare register name is not a valid offset there: it must be either a constant displacement or a register-indirect expression in parentheses. This is the reason for the error "bad memory operand". It worked in the %fs:0x30 case because 0x30 is a constant displacement.
To use the value held in the register as the offset, dereference it by writing %%fs:(%1):
int mee = 48;
asm("movl %%fs:(%1), %0" : "=r" (peb) : "r" (mee) :);
See also this guide, which contains some possibly useful examples of memory access (and more) in inline assembly.
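The difference between a bare register and a register used as an address can also be tried on ordinary memory. The following is a minimal sketch, assuming GCC or Clang on an x86 target. The function name load_indirect is hypothetical, the %fs segment override is deliberately left out so that the code runs on normal data, and other targets fall back to plain C.

```c
/* Load a 32-bit value through a pointer held in a register.
   (%1) treats the register as an address; a bare %1 would name
   the register itself, which is not a memory operand. */
int load_indirect(const int *p)
{
    int value;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movl (%1), %0"
             : "=r" (value)
             : "r" (p), "m" (*p)); /* the unused "m" input tells the compiler *p is read */
#else
    value = *p; /* fallback for non-x86 targets (assumption, not from the answer) */
#endif
    return value;
}
```

Calling load_indirect on the address of an int holding 48 should return 48, which is what the parenthesised form in the answer does relative to the %fs base.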

Related

arm inline assembly - store C variable in arm register

Trying to save a variable in an arm register using inline assembly.
unsigned int lma_offset = 0x1234; // typically calculated, hardcoded for example
__asm volatile ("MOV R10, %[input]"
: [input] "=r" (lma_offset)
);
This changes lma_offset to 0xd100 in my case, instead of setting the register. What am I doing wrong?
PS: when I declare lma_offset as const it gives a compiler error because lma_offset is used as an output. So obviously something is wrong, but I still can't find the correct syntax for this.
For future reference, according to Eric's comment:
const unsigned int lma_offset = 0x10000;
__asm__ volatile ("MOV R10, %[input]"
: // no C variable outputs
: [input] "r" (lma_offset)
: "r10" // tell the compiler r10 is modified (lowercase, as GCC spells register names)
);
Leaving the output section empty (the two consecutive colons) and replacing the "=r" with "r" indeed solves the problem.
It would also be possible to ask the compiler to have that constant already in R10 for an asm statement, by using a register local variable to force the "r" input to pick r10. (Then we can omit the redundant mov r10, r10).
register unsigned r10 __asm__("r10") = lma_offset; // picks r10 for "r" constraints, no other *guaranteed* effects
__asm__ volatile ("# inline asm went here" // empty template, actually just a comment you can see if looking at the compiler's asm output
: // no C variable outputs
: [input] "r" (r10) // pass the register variable itself; passing lma_offset would not force r10
: // no clobbers needed
);
Going the other way, copying a register into a C output variable would look like this:
unsigned int lma_offset = 0x0;
__asm__ volatile ("MOV %[output], R11"
: [output] "=r" (lma_offset)
// no clobbers needed; reading a register doesn't step on the compiler's toes
);

Input parameter index when using "+" modifier in extended asm output parameter?

Extended asm gives the following description regarding the "+" modifier:
Operands using the ‘+’ constraint modifier count as two operands (that
is, both as input and output) towards the total maximum of 30 operands
per asm statement.
So I assume that it is not necessary to mention an output operand with the "+" modifier in the input section again, but it is not specified how to determine the index of the implied input. I wrote the following example (on Godbolt):
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
void asm_add(uint64_t o1, uint64_t o2, uint64_t o3){
__asm__ volatile (
"addq %2, %3\n\
addq %2, %4":
"+r" (o2), "+r" (o3):
"r" (o1):
"cc"
);
printf("o2 = %" PRIu64 "\n", o2);
printf("o3 = %" PRIu64 "\n", o3);
}
int main(void){
asm_add(20, 30, 40);
}
Which printed
o2 = 50
o3 = 60
Is the template using +
__asm__ volatile (
"addq %2, %3\n\
addq %2, %4":
"+r" (o2), "+r" (o3):
"r" (o1):
"cc"
);
exactly the same as
__asm__ volatile (
"addq %2, %3\n\
addq %2, %4":
"+r" (o2), "+r" (o3):
"r" (o1), "0" (o2), "1" (o3):
"cc"
);
where all inputs are specified explicitly? So in the first example the "implicit" inputs are appended.
By using "+r" (o2), you are saying that this parameter needs to contain o2 on entry to the asm block, and will contain an updated value on exit.
In other words, %0 describes both input and output. The fact that you can (apparently?) reference indices greater than the number of parameters is an undocumented quirk. Don't depend upon it.
You might also consider using symbolic names, which (I find) are easier to read, especially as the number of asm lines goes up. Names are particularly useful when you are first creating the asm and there's the potential for adding/removing parameters. Having to renumber everything is painful and error prone:
__asm__ volatile (
"addq %[o1], %[o2]\n\
addq %[o1], %[o3]":
[o2] "+r" (o2), [o3] "+r" (o3):
[o1] "r" (o1):
"cc"
);
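For comparison, here is a minimal sketch that sticks to the documented numbering, where operands are counted in order of appearance (%0 and %1 are the two read-write operands, %2 is the input). It assumes GCC or Clang on x86-64 and falls back to plain C elsewhere. The pointer-based signature and the local names a and b are my own choices for illustration. The first operand uses "+&r" rather than "+r" because it is written before the input is read for the last time, which is a detail the snippets above do not address.

```c
#include <stdint.h>

/* Adds o1 to *o2 and to *o3.
   %0 = a and %1 = b are read-write operands, %2 = o1 is input only. */
void asm_add(uint64_t o1, uint64_t *o2, uint64_t *o3)
{
    uint64_t a = *o2, b = *o3;
#if defined(__x86_64__)
    __asm__ ("addq %2, %0\n\t"
             "addq %2, %1"
             : "+&r" (a), /* early clobber: written while %2 is still needed */
               "+r" (b)
             : "r" (o1)
             : "cc");
#else
    a += o1; /* fallback for other targets (assumption) */
    b += o1;
#endif
    *o2 = a;
    *o3 = b;
}
```

With o1 = 20, *o2 = 30 and *o3 = 40 this leaves 50 and 60 in the two variables, matching the output shown in the question.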
Lastly, consider not using inline asm for anything beyond educational purposes. And even then, inline asm is the hardest possible way to learn asm.

What ensures reads/writes of operands occur at the desired time with extended ASM?

According to GCC's Extended ASM and Assembler Template, to keep instructions consecutive, they must be in the same ASM block. I'm having trouble understanding what provides the scheduling or timing of reads and writes to the operands in a block with multiple instructions.
As an example, EBX or RBX needs to be preserved when using CPUID because, according to the ABI, the caller owns it. There are some open questions with respect to the use of EBX and RBX, so we want to preserve it unconditionally (it's a requirement). So three instructions need to be encoded into a single ASM block to ensure that the instructions stay consecutive (re: the assembler template discussed in the first paragraph):
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
__asm__ __volatile__ (
"push %%ebx;"
"cpuid;"
"pop %%ebx"
: "=a"(__EAX), "=b"(__EBX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC)
);
If the expression representing the operands is interpreted at the wrong point in time, then __EBX will be the saved EBX (and not the CPUID's EBX), which will likely be a pointer to the Global Offset Table (GOT) if PIC is enabled.
Where, exactly, does the expression specify that the store of CPUID's %EBX into __EBX should happen (1) after the PUSH %EBX; (2) after the CPUID; but (3) before the POP %EBX?
In your question you present some code that pushes and pops ebx. The idea of saving ebx in case you compile with gcc using -fPIC (position independent code) is correct: in that situation it is up to our function not to clobber ebx upon return. Unfortunately, the way you have defined the constraints, you explicitly use ebx as an output. Generally the compiler will reject this (error: inconsistent operand constraints in an 'asm') if you are building PIC code and you specify =b as an output constraint. It is unusual that it doesn't produce that error for you.
To get around this problem you can let the compiler choose a register for you. Instead of pushing and popping, we simply exchange %ebx with an unused register chosen by the compiler and restore it by exchanging it back afterwards. Since we don't want the compiler to place that scratch register in one of our input registers, we specify the early clobber modifier, ending up with a constraint of =&r (instead of =b in the OP's code). More on modifiers can be found in the GCC manual's section on constraint modifiers. Your code (for 32 bit) would look something like:
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
__asm__ __volatile__ (
"xchgl\t%%ebx, %k1\n\t" \
"cpuid\n\t" \
"xchgl\t%%ebx, %k1\n\t"
: "=a"(__EAX), "=&r"(__EBX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
If you intend to compile for X86_64 (64 bit) you'll need to save the entire contents of %rbx. The code above will not quite work. You'd have to use something like:
uint32_t __FUNC = 1, __SUBFUNC = 0;
uint32_t __EAX, __ECX, __EDX;
uint64_t __BX; /* Big enough to hold a 64 bit value */
__asm__ __volatile__ (
"xchgq\t%%rbx, %q1\n\t" \
"cpuid\n\t" \
"xchgq\t%%rbx, %q1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
You could code this up using conditional compilation to deal with both X86_64 and i386:
uint32_t __FUNC = 1, __SUBFUNC = 0;
uint32_t __EAX, __ECX, __EDX;
uint64_t __BX; /* Big enough to hold a 64 bit value */
#if defined(__i386__)
__asm__ __volatile__ (
"xchgl\t%%ebx, %k1\n\t" \
"cpuid\n\t" \
"xchgl\t%%ebx, %k1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
#elif defined(__x86_64__)
__asm__ __volatile__ (
"xchgq\t%%rbx, %q1\n\t" \
"cpuid\n\t" \
"xchgq\t%%rbx, %q1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
#else
#error "Unknown architecture."
#endif
GCC has a __cpuid macro defined in cpuid.h. The macro is written so that it only saves the ebx or rbx register when required. The GCC 4.8.1 definition in cpuid.h gives an idea of how they handle cpuid.
The astute reader may ask the question - what stops the compiler from choosing ebx or rbx as the scratch register to use for the exchange. The compiler knows about ebx and rbx in the context of PIC, and will not allow it to be used as a scratch register. This is based on my personal observations over the years and reviewing the assembler (.s) files generated from C code. I can't say for certain how more ancient versions of gcc handled it so it could be a problem.
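As a sketch of the cpuid.h route mentioned above, the following uses the __get_cpuid helper that GCC and Clang ship in that header, so no hand-written constraints are needed. The function name cpu_vendor and its return convention are hypothetical; on non-x86 targets the function simply reports failure.

```c
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Writes the 12-character CPU vendor string plus a terminating NUL into out.
   Returns 1 on success, 0 if CPUID leaf 0 is unavailable or the target is not x86. */
int cpu_vendor(char out[13])
{
    memset(out, 0, 13);
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    /* Leaf 0 returns the vendor string in EBX, EDX, ECX, in that order. */
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    return 1;
#else
    return 0; /* no CPUID on this target */
#endif
}
```

The header takes care of preserving ebx/rbx where the compiler needs it, which is the part the hand-written versions above have to get right themselves.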
I think you understand, but to be clear, the "consecutive" rule means that this:
asm ("a");
asm ("b");
asm ("c");
... might get other instructions interposed, so if that's not desirable then it must be rewritten like this:
asm ("a\n"
"b\n"
"c");
... and now it will be inserted as a whole.
As for the cpuid snippet, we have two problems:
The cpuid instruction will overwrite ebx, and hence clobber the data that PIC code must keep there.
We want to extract the value that cpuid places in ebx while never returning to compiled code with the "wrong" ebx value.
One possible solution would be this:
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
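__asm__ __volatile__ (
"push %%ebx;"
"cpuid;"
"mov %%ebx, %%ecx;" // copy cpuid's ebx into ecx before ebx is restored
"pop %%ebx"
: "=a"(__EAX), "=c"(__EBX), "=d"(__EDX) // eax and edx are outputs here, since a clobber may not overlap the "a" input
: "a"(__FUNC), "c"(__SUBFUNC)
);
__asm__ __volatile__ (
"push %%ebx;"
"cpuid;"
"pop %%ebx"
: "=a"(__EAX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC)
);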
There's no need to mark ebx as clobbered as you're putting it back how you found it.
(I don't do much Intel programming, so I may have some of the assembler-specific details off there, but this is how asm works.)

Adding values in C using inline assembly

I'm trying to get a grasp on the basics of inline assembly in C (AT&T syntax), so I'm practicing by adding 2 variables.
OK, so this works as intended: the src variable gets copied to the dst variable and then 5 is added to dst. The values of src and dst are 1 and 6 respectively.
int src = 1;
int dst = 0;
asm ("mov %[SRC], %[DEST]\n\t"
"add $5, %0"
: [DEST] "=r" (dst)
: [SRC] "r" (src));
But when I try this out, the values of src and dst are still 1 and 6. I was expecting src to have the value 1 and dst to have the value 5 since adding 5 to dst (which has the value of 0 since the MOV operation has been removed) should have the output 5.
int src = 1;
int dst = 0;
asm ("add $5, %[DEST]"
: [DEST] "=r" (dst)
: [SRC] "r" (src));
So I then try removing the src as an input operand using the following code, but now dst gets the value 11.
int dst = 0;
asm (
"add $5, %[DEST]"
: [DEST] "=r" (dst));
Now I'm a bit confused how it works. What am I misunderstanding?
The first part of your code works as expected. There
mov %[SRC], %[DEST] ; copies %[SRC] into %[DEST], which is now 1
add $5, %0 ; adds 5 to %0 (which is %[DEST]), so that's 6
The second part does not work because you never use %[SRC], and because %[DEST] is not an input operand, so its value doesn't come into the calculation. You just get what happens to be in the register gcc decides to use. The third part fails for the same reason.
For it to work, you need to specify dst as both an input and output operand, since you're both using its value and changing it. However, this does not work:
asm("add $5, %0" // This does not work!
: "=r" (dst)
: "r" (dst));
because now you have an input operand %1 with value dst and a distinct output operand %0 whose value will be written to dst, and you never use %1. This notation would allow you to write
asm("mov %1, %0; add $5, %0" // needlessly inefficient!
: "=r" (dst)
: "r" (dst));
but that is, of course, needlessly inefficient. In order to do this with a single register, you need to use a matching constraint like this:
asm("add $5, %0"
: "=r" (dst)
: "0" (dst));
This tells gcc that the input operand must be placed in the same register as output operand %0, and that it starts out holding the value of dst. The gcc manual describes this under matching constraints.
With named operands, finally, it looks like this:
asm ("add $5, %[DEST]"
: [DEST] "=r" (dst)
: "[DEST]" (dst));
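An alternative to the matching constraint is the "+" modifier, which declares a single operand as both read and written. This is a minimal sketch assuming GCC or Clang on x86, with a plain C fallback for other targets. The function name add_five is hypothetical, and the "cc" clobber is added because add modifies the flags.

```c
/* Adds 5 to dst using one read-write ("+r") operand. */
int add_five(int dst)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("addl $5, %[DEST]"
             : [DEST] "+r" (dst) /* input and output share one register */
             :                   /* no separate inputs */
             : "cc");            /* add changes the condition flags */
#else
    dst += 5; /* fallback for non-x86 targets (assumption) */
#endif
    return dst;
}
```

Starting from dst = 0 this yields 5, the value the question expected from its second attempt.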

Inserting the address of a constant in inline assembly code

I want to translate this function:
iowrite32(mem1, value1);
into assembly code.
mem1 is defined as:
int * mem1;
in order to use ioremap.
I've written this code:
asm volatile(
"mov %[whr],%[wht]"
: [whr] "=r" (mem1)
: [wht] "r" (value)
);
Then I've realized I don't want to move value to mem1, but to the ADDRESS stored in mem1.
How do I write it in assembly?
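You might want to take a look at the "m" (memory) constraint, applied to the dereferenced pointer so that the store goes to the address held in mem1: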
asm volatile(
"mov %[wht], %[whr];"
: [whr] "=m" (*mem1)
: [wht] "r" (value)
);
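The same idea can be tried in user space on ordinary memory. This is a minimal sketch assuming GCC or Clang on x86, with a plain C fallback elsewhere. The function name store_word is hypothetical, and it is not a replacement for iowrite32, which also deals with I/O ordering and the __iomem address space.

```c
#include <stdint.h>

/* Stores value at the address held in addr, using an "=m" output operand. */
void store_word(volatile uint32_t *addr, uint32_t value)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile ("movl %[wht], %[whr]"
                      : [whr] "=m" (*addr) /* the memory addr points to, not addr itself */
                      : [wht] "r" (value));
#else
    *addr = value; /* fallback for non-x86 targets (assumption) */
#endif
}
```

Because the output operand is *addr, the compiler substitutes a memory reference such as (%rdi) into the template, and the pointer variable itself is left unchanged.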

Resources