I have this below code snippet from kernel source for PowerPc
#define SPRN_IVOR32 0x210 /* Interrupt Vector Offset Register 32 */
unsigned long ivor[3];
ivor[0] = mfspr(SPRN_IVOR32);
#define __stringify_1(x) #x
#define __stringify(x) __stringify_1(x)
#define mfspr(rn) ({unsigned long rval; \
asm volatile("mfspr %0," __stringify(rn) \
: "=r" (rval)); rval; })
Also, it this above exercise is about emulating MSR register's bits in PowerPc?
Can anyone help me on what exactly we are doing here?
The mfspr macro generates an asm instruction mfspr which reads the given special purpose register into a register chosen by the compiler, which then gets assigned to rval hence becomes the return value of the expression.
As the comment says, SPRN_IVOR32 is the Interrupt Vector Offset Register 32, whose contents are thus fetched into ivor[0].
Related
According to GCC's Extended ASM and Assembler Template, to keep instructions consecutive, they must be in the same ASM block. I'm having trouble understanding what provides the scheduling or timings of reads and writes to the operands in a block with multiple statements.
As an example, EBX or RBX needs to be preserved when using CPUID because, according to the ABI, the caller owns it. There are some open questions with respect to the use of EBX and RBX, so we want to preserve it unconditionally (its a requirement). So three instructions need to be encoded into a single ASM block to ensure the consecutive-ness of the instructions (re: the assembler template discussed in the first paragraph):
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
__asm__ __volatile__ (
"push %ebx;"
"cpuid;"
"pop %ebx"
: "=a"(__EAX), "=b"(__EBX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC)
);
If the expression representing the operands is interpreted at the wrong point in time, then __EBX will be the saved EBX (and not the CPUID's EBX), which will likely be a pointer to the Global Offset Table (GOT) if PIC is enabled.
Where, exactly, does the expression specify that the store of CPUID's %EBX into __EBX should happen (1) after the PUSH %EBX; (2) after the CPUID; but (3) before the POP %EBX?
In your question you present some code that does a push and pop of ebx. The idea of saving ebx in the event that you compile with gcc using -fPIC (position independent code) is correct. It is up to our function not to clobber ebx upon return in that situation. Unfortunately the way you have defined the constraints you explicitly use ebx. Generally the compiler will warn you (error: inconsistent operand constraints in an 'asm') if you are using PIC code and you specify =b as an output constraint. Why it doesn't produce a warning for you is unusual.
To get around this problem you can let the assembler template choose a register for you. Instead of pushing and popping we simply exchange %ebx with an unused register chosen by the compiler and restore it by exchanging it back after. Since we don't wish to have the compiler clobber our input registers during the exchange we specify early clobber modifier, thus ending up with a constraint of =&r (instead of =b in the OPs code). More on modifiers can be found here. Your code (for 32 bit) would look something like:
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
__asm__ __volatile__ (
"xchgl\t%%ebx, %k1\n\t" \
"cpuid\n\t" \
"xchgl\t%%ebx, %k1\n\t"
: "=a"(__EAX), "=&r"(__EBX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
If you intend to compile for X86_64 (64 bit) you'll need to save the entire contents of %rbx. The code above will not quite work. You'd have to use something like:
uint32_t __FUNC = 1, __SUBFUNC = 0;
uint32_t __EAX, __ECX, __EDX;
uint64_t __BX; /* Big enough to hold a 64 bit value */
__asm__ __volatile__ (
"xchgq\t%%rbx, %q1\n\t" \
"cpuid\n\t" \
"xchgq\t%%rbx, %q1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
You could code this up using conditional compilation to deal with both X86_64 and i386:
uint32_t __FUNC = 1, __SUBFUNC = 0;
uint32_t __EAX, __ECX, __EDX;
uint64_t __BX; /* Big enough to hold a 64 bit value */
#if defined(__i386__)
__asm__ __volatile__ (
"xchgl\t%%ebx, %k1\n\t" \
"cpuid\n\t" \
"xchgl\t%%ebx, %k1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
#elif defined(__x86_64__)
__asm__ __volatile__ (
"xchgq\t%%rbx, %q1\n\t" \
"cpuid\n\t" \
"xchgq\t%%rbx, %q1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
#else
#error "Unknown architecture."
#endif
GCC has a __cpuid macro defined in cpuid.h. It defined the macro so that it only saves the ebx and rbx register when required. You can find the GCC 4.8.1 macro definition here to get an idea of how they handle cpuid in cpuid.h.
The astute reader may ask the question - what stops the compiler from choosing ebx or rbx as the scratch register to use for the exchange. The compiler knows about ebx and rbx in the context of PIC, and will not allow it to be used as a scratch register. This is based on my personal observations over the years and reviewing the assembler (.s) files generated from C code. I can't say for certain how more ancient versions of gcc handled it so it could be a problem.
I think you understand, but to be clear, the "consecutive" rule means that this:
asm ("a");
asm ("b");
asm ("c");
... might get other instructions interposed, so if that's not desirable then it must be rewritten like this:
asm ("a\n"
"b\n"
"c");
... and now it will be inserted as a whole.
As for the cpuid snippet, we have two problems:
The cpuid instruction will overwrite ebx, and hence clobber the data that PIC code must keep there.
We want to extract the value that cpuid places in ebx while never returning to compiled code with the "wrong" ebx value.
One possible solution would be this:
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
__asm__ __volatile__ (
"push %ebx;"
"cpuid;"
"mov %ebx, %ecx"
"pop %ebx"
: "=c"(__EBX)
: "a"(__FUNC), "c"(__SUBFUNC)
: "eax", "edx"
);
__asm__ __volatile__ (
"push %ebx;"
"cpuid;"
"pop %ebx"
: "=a"(__EAX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC)
);
There's no need to mark ebx as clobbered as you're putting it back how you found it.
(I don't do much Intel programming, so I may have some of the assembler-specific details off there, but this is how asm works.)
I want to replace the highest byte of 32bit value with inline assembly, following code writes buffer to FRAM memory with spi interace:
#define _load_op_code(op_code, addr)\
__asm__ __volatile__ (\
" ldi %D0, %1" "\n\t"\
: "=d" ((uint32_t)addr)\
: "M" (op_code)\
)
#define SMEM_WREN 0x06
#define SMEM_WRITE 0x02
void fram_write(uint32_t addr, uint8_t *buf, uint16_t len) {
FRAM_SELECT();
spi_send_char(SMEM_WREN);
FRAM_DESELECT();
_load_op_code(SMEM_WRITE, addr);
FRAM_SELECT();
spi_send_32b(addr);
spi_send(buf, len);
FRAM_DESELECT();
}
after _load_op_code() inline assembly addr variable gets cluttered - compiler use registers allocated for addr as temporary registers for other operations and i lose original addr value. addr is in fact 24bit variable. Any idea whats wrong with this code?
The original value of addr is lost because it is overwritten by the asm statement with SMEM_WRITE and 3 undefined bytes. From the GCC manual:
The ordinary output operands must be write-only; GCC assumes that the values in these operands before the instruction are dead and need not be generated. Extended asm supports input-output or read-write operands. Use the constraint character ‘+’ to indicate such an operand and list it with the output operands.
I'm writing inline assembly statements using a GNU-based toolchain, and there are three instructions within the inline assembly to update a single bit of a system register. The steps will be:
move(read) a system register to a general register
'AND' it with the variable value from C code
move(write) back to the system register just read
in the instruction set I'm using, the inline assembly syntax is like this:
unsigned int OV_TMP = 0xffefffff;
asm volatile ( "mfsr %0, $PSW\n\t"
"and %0, %0, %1\n\t"
"mtsr %0, $PSW"
: : "r"(OV_TMP) : );
%1 is the register which I want to forward the value of OV_TMP into.
%0 is the problem for me, and my problem is :
How to write the inline assembly code once there is a register used internally and is not assigned from nor copy to the C variables in the C code?
The thing to consider here is that, from the compiler's perspective, the register is assigned-to by the inline assembly, even if you don't use it again later. That is, you're generating the equivalent of:
register unsigned int OV_TMP = 0xffefffff, scratch;
scratch = magic() & OV_TMP;
more_magic(scratch);
/* and then don't re-use scratch for anything from here on */
The magic and/or more_magic steps cannot be moved or combined away because of the volatile, so the compiler cannot simply delete the written-but-unused register.
The mfsr and mtsr look like powerpc instructions to me, and I would probably do the and step in C code (see footnote); but the following should generally work:
unsigned int OV_TMP = 0xffefffff, scratch;
asm volatile("mfsr %0, $PSW\n\t"
"and %0, %0, %1\n\t"
"mtsr %0, $PSW"
: "=&r"(scratch) : "r"(OV_TMP));
Here the "=&r" constraint says that the output operand (%0) is written before the input operand (%1) is read.
Footnote: As far as I know (which is not very far, I've only ever done a tiny bit of ppc assembly) there's no need to keep the mfsr and mtsr instructions a specific distance apart, unlike certain lock-step sequences on other processors. If so, I would write something more like this:
static inline unsigned int read_psw() {
unsigned int result;
asm volatile("mfsr %0, $PSW" : "=r"(result));
return result;
}
static inline void write_psw(unsigned int value) {
asm volatile("mtsr %0, $PSW" :: "r"(value));
}
#define PSW_FE0 0x00100000 /* this looks like it's FE0 anyway */
...
write_psw(read_psw() & ~PSW_FE0); /* some appropriate comment here */
This is a strange request but I have a feeling that it could be possible. What I would like is to insert some pragmas or directives into areas of my code (written in C) so that GCC's register allocator will not use them.
I understand that I can do something like this, which might set aside this register for this variable
register int var1 asm ("EBX") = 1984;
register int var2 asm ("r9") = 101;
The problem is that I'm inserting new instructions (for a hardware simulator) directly and GCC and GAS don't recognise these yet. My new instructions can use the existing general purpose registers and I want to make sure that I have some of them (i.e. r12->r15) reserved.
Right now, I'm working in a mockup environment and I want to do my experiments quickly. In the future I will append GAS and add intrinsics into GCC, but right now I'm looking for a quick fix.
Thanks!
When writing GCC inline assembler, you can specify a "clobber list" - a list of registers that may be overwritten by your inline assembler code. GCC will then do whatever is needed to save and restore data in those registers (or avoid their use in the first place) over the course of the inline asm segment. You can also bind input or output registers to C variables.
For example:
inline unsigned long addone(unsigned long v)
{
unsigned long rv;
asm("mov $1, %%eax;"
"mov %0, %%ebx;"
"add %%eax, %%ebx"
: /* outputs */ "b" (rv)
: /* inputs */ "g" (v) /* select unused general purpose reg into %0 */
: /* clobbers */ "eax"
);
}
For more information, see the GCC-Inline-Asm-HOWTO.
If you use global explicit register variables, these will be reserved throughout the compilation unit, and will not be used by the compiler for anything else (it may still be used by the system's libraries, so choose something that will be restored by those). local register variables do not guarantee that your value will be in the register at all times, but only when referenced by code or as an asm operand.
If you write an inline asm block for your new instructions, there are commands that inform GCC what registers are used by that block and how they are used. GCC will then avoid using those registers or will at least save and reload their contents.
Non-hardcoded scratch register in inline assembly
This is not a direct answer to the original question, but since and since I keep Googling this in that context and since https://stackoverflow.com/a/6683183/895245 was accepted, I'm going to try and provide a possible improvement to that answer.
The improvement is the following: you should avoid hard-coding your scratch registers when possible, to give the register allocator more freedom.
Therefore, as an educational example that is useless in practice (could be done in a single lea (%[in1], %[in2]), %[out];), the following hardcoded scratch register code:
bad.c
#include <assert.h>
#include <inttypes.h>
int main(void) {
uint64_t in1 = 0xFFFFFFFF;
uint64_t in2 = 1;
uint64_t out;
__asm__ (
"mov %[in2], %%rax;" /* scratch = in2 */
"add %[in1], %%rax;" /* scratch += in1 */
"mov %%rax, %[out];" /* out = scratch */
: [out] "=r" (out)
: [in1] "r" (in1),
[in2] "r" (in2)
: "rax"
);
assert(out == 0x100000000);
}
could compile to something more efficient if you instead use this non-hardcoded version:
good.c
#include <assert.h>
#include <inttypes.h>
int main(void) {
uint64_t in1 = 0xFFFFFFFF;
uint64_t in2 = 1;
uint64_t out;
uint64_t scratch;
__asm__ (
"mov %[in2], %[scratch];" /* scratch = in2 */
"add %[in1], %[scratch];" /* scratch += in1 */
"mov %[scratch], %[out];" /* out = scratch */
: [scratch] "=&r" (scratch),
[out] "=r" (out)
: [in1] "r" (in1),
[in2] "r" (in2)
:
);
assert(out == 0x100000000);
}
since the compiler is free to choose any register it wants instead of just rax,
Note that in this example we had to mark the scratch as an early clobber register with & to prevent it from being put into the same register as an input, I have explained that in more detail at: When to use earlyclobber constraint in extended GCC inline assembly? This example also happens to fail in the implementation I tested on without &.
Tested in Ubuntu 18.10 amd64, GCC 8.2.0, compile and run with:
gcc -O3 -std=c99 -ggdb3 -Wall -Werror -pedantic -o good.out good.c
./good.out
Non-hardcoded scratch registers are also mentioned in the GCC manual 6.45.2.6 "Clobbers and Scratch Registers", although their example is too much for mere mortals to take in at once:
Rather than allocating fixed registers via clobbers to provide scratch registers for an asm statement, an alternative is to define a variable and make it an early-clobber output as with a2 and a3 in the example below. This gives the compiler register allocator more freedom. You can also define a variable and make it an output tied to an input as with a0 and a1, tied respectively to ap and lda. Of course, with tied outputs your asm can’t use the input value after modifying the output register since they are one and the same register. What’s more, if you omit the early-clobber on the output, it is possible that GCC might allocate the same register to another of the inputs if GCC could prove they had the same value on entry to the asm. This is why a1 has an early-clobber. Its tied input, lda might conceivably be known to have the value 16 and without an early-clobber share the same register as %11. On the other hand, ap can’t be the same as any of the other inputs, so an early-clobber on a0 is not needed. It is also not desirable in this case. An early-clobber on a0 would cause GCC to allocate a separate register for the "m" ((const double ()[]) ap) input. Note that tying an input to an output is the way to set up an initialized temporary register modified by an asm statement. An input not tied to an output is assumed by GCC to be unchanged, for example "b" (16) below sets up %11 to 16, and GCC might use that register in following code if the value 16 happened to be needed. You can even use a normal asm output for a scratch if all inputs that might share the same register are consumed before the scratch is used. The VSX registers clobbered by the asm statement could have used this technique except for GCC’s limit on the number of asm parameters.
static void
dgemv_kernel_4x4 (long n, const double *ap, long lda,
const double *x, double *y, double alpha)
{
double *a0;
double *a1;
double *a2;
double *a3;
__asm__
(
/* lots of asm here */
"#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
"#a0=%3 a1=%4 a2=%5 a3=%6"
:
"+m" (*(double (*)[n]) y),
"+&r" (n), // 1
"+b" (y), // 2
"=b" (a0), // 3
"=&b" (a1), // 4
"=&b" (a2), // 5
"=&b" (a3) // 6
:
"m" (*(const double (*)[n]) x),
"m" (*(const double (*)[]) ap),
"d" (alpha), // 9
"r" (x), // 10
"b" (16), // 11
"3" (ap), // 12
"4" (lda) // 13
:
"cr0",
"vs32","vs33","vs34","vs35","vs36","vs37",
"vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
);
}
I came across this code and need to understand what it is doing. It just seems to be declaring two bytes and then doing nothing...
uint64_t x;
__asm__ __volatile__ (".byte 0x0f, 0x31" : "=A" (x));
Thanks!
This is generating two bytes (0F 31) directly into the code stream. This is an RDTSC instruction, which reads the time-stamp counter into EDX:EAX, which will then be copied to the variable 'x' by the output constraint "=A"(x)
0F 31 is the x86 opcode for the RDTSC (read time stamp counter) instruction; it places the value read into the EDX and EAX registers.
The _ _ asm__ directive isn't just declaring two bytes, it's placing inline assembly into the C code. Presumably, the program has a way of using the value in those registers immediately afterwards.
http://en.wikipedia.org/wiki/Time_Stamp_Counter
It's inserting an 0F 31 opcode, which according to this site is:
0F 31 P1+ f2 RDTSC EAX EDX IA32_T... Read Time-Stamp Counter
Then it is storing the result in the x variable
It's inline asm for rdtsc, with the machine-code encoding written out to support really old assemblers that don't know the mnemonic.
Unfortunately, it only works correctly in 32bit code because "=A" doesn't split 64bit operands in half in 64bit code. (The gcc manual even uses rdtsc an an example to illustrate this)
The safe way to write this, which compiles to optimal code with gcc -m32 or -m64, is:
#include <stdint.h>
uint64_t timestamp_safe(void)
{
unsigned long tsc_low, tsc_high; // not uint32_t: saves a zero-extend for -m64 (but not x32 :/)
asm volatile("rdtsc" : "=d"(tsc_high), "=a" (tsc_low));
return ((uint64_t)tsc_high << 32) | tsc_low;
}
In 32bit code, it's just rdtsc/ret, but in 64bit code it does the necessary shift/or to get both halves into rax for the return value.
See it on the Godbolt compiler explorer.