Inline assembler question - c

Using inline assembler I could specify an add operation with
two inputs and one result as follows:
int a = 5;
int b = 5;
int res;
asm volatile (
" add %1, %2, %0 \n\t"
: "=r" (res)
: "r" (a), "r" (b)
: "%g0"
);
On a 32-bit architecture, this produces an instruction word that could look like
this: 0x91050101
Now I am wondering whether, rather than explicitly specifying the assembler code for the addition,
I could specify the instruction word directly and have it placed into the executable. It would look something like this:
asm volatile (%x91, %x05, %x01, %x01);
Does anyone know where I can find more information on how this could be done and what the syntax has to look like (the above is only a wild guess)?
Many thanks!

asm volatile (
" .byte 0x91, 0x5, 0x1, 0x1 \n"
);
should do it.
You can find the documentation for the assembler directives at http://sourceware.org/binutils/docs/as/
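As a minimal sketch of the same idea on a different target, assuming an x86 or x86-64 build with GCC or Clang: the bytes b8 2a 00 00 00 encode mov $42, %eax. Because the compiler cannot see what the raw bytes do, the "=a" output constraint is what tells it where the result ends up. The function name load_42_raw is made up for illustration, and the fallback branch exists only so the sketch compiles on other targets.

```c
/* Hypothetical helper: emits a hand-encoded instruction with .byte. */
static inline unsigned int load_42_raw(void) {
    unsigned int result;
#if defined(__i386__) || defined(__x86_64__)
    /* b8 2a 00 00 00 is the encoding of: mov $42, %eax.
       The "=a" constraint declares that eax holds the output. */
    __asm__ volatile (".byte 0xb8, 0x2a, 0x00, 0x00, 0x00"
                      : "=a" (result));
#else
    /* Assumption: on other targets we just return the value in plain C. */
    result = 42;
#endif
    return result;
}
```

Note that .byte emits the bytes in memory order, so on a little-endian target a multi-byte instruction word has to be listed least significant byte first.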

Microsoft supports the _emit pseudo instruction
http://msdn.microsoft.com/en-us/library/1b80826t.aspx
I'm not sure what g++ supports

Related

x86-64 Zero Flag is clearing between inline calls (and another problem)

I am using the bsf x86-64 instruction, described on page 210 of Intel's developer's manual. Essentially, if a least significant 1 bit is found, its bit index is stored in the destination operand.
Furthermore, the ZF flag is set to 1 if the source operand is 0; otherwise, the ZF flag is cleared.
I am compiling my C code with inline x86-64 assembly instructions. I have defined a C function which invokes the bsf instruction:
uint64_t bitScanForward(T_bitboard b) {
__asm__(
"bsf %rcx,%rax\n"
"leave\n"
"ret\n"
);
}
and also another C function which checks the status of the ZF bit in the flags register:
uint64_t isZFSet() {
printf("\n"); // <- This is another problem I am having (see below)...
__asm__(
"jz true\n"
"movq $0,%rax\n"//return false
"jmp end\n"
"true:\n"
"movq $1,%rax\n"//return true
"end:\n"
"leave\n"
"ret\n"
);
}
I have tested these and found that the ZF flag is always cleared, even when the bsf command is applied to the number zero, seemingly going against the specification.
//Calling function...
//Do stuff...
bitScanForward(0ULL);//ULL is 64 bit on my machine
if(isZFSet()){//ZF flag *should* be set here but it's not
printf("ZF flag is set\n");
}
//More stuff...
I suspect the ZF flag is cleared when leaving one set of inline instructions and entering another.
How can I ensure that the flag in the above code is set as specified in the documentation? (I don't want to change much of my code or design.)
My "other problem" is that if I don't include the printf statement in isZFSet, the function seemingly doesn't execute. Totally bizarre. Can anyone explain why?
You are treating an aggressively optimizing C compiler as if it were a macro assembler. That just plain isn't going to work. To get GCC to emit correct code in the presence of assembly inserts, you have to annotate the inserts with complete information about the registers and memory regions that are affected by the assembly code, and you have to use ancillary C statements to mesh them with the surrounding code. Even then, there are things the assembly insert cannot do at all. I urge you to scrap this entire mess and instead use the __builtin_ctzll intrinsic, as suggested in the comments on the question.
Now, to specifics. Your first function is incorrect because GCC does not support use of leave or ret inside an assembly insert. (More generally, assembly inserts may not alter the stack pointer, and may only jump to designated labels within the same function.) The correct way to use bsf from a GCC-style assembly insert is with "extended asm" with input and output operands:
uint64_t bitScanForward(uint64_t b) {
uint64_t ret;
asm ("bsf %1, %0" : "=r" (ret) : "r" (b));
return ret;
}
You must declare a C variable to receive the output of the operation, and explicitly return that variable; having bsf write to %rax would not work (unlike how it was in old MSVC). BSF accepts any two registers as operands, so there is no need to use constraints more specific than r.
Your second function is incorrect because you didn't tell GCC that the condition codes were meaningful after bitScanForward, and because GCC does not support using the condition-code register as an input to an assembly insert. In order to read the ZF output from bsf you must do so within the same assembly insert that invoked bsf:
uint64_t countTrailingZeroes(uint64_t b) {
uint64_t ret;
asm ("bsf %1, %0\n\t"
"cmove %2, %0"
: "=&r" (ret)
: "r" (b), "rm" ((uint64_t)64));
return ret;
}
This requires special care -- see how the constraint on operand 0 is now =&r instead of just =r? Without that, GCC is liable to think it can put operand 2 in the same register as operand 0.
Alternatively, you can specify that ZF is an output, which is supported (see the "flag output operands" section of the manual) and then supply a default value from C:
uint64_t countTrailingZeroes(uint64_t b) {
uint64_t ret;
int zf;
asm ("bsf %2, %0"
: "=r" (ret), "=@ccz" (zf) : "r" (b));
if (zf) ret = 64;
return ret;
}
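For comparison, here is a sketch of the intrinsic-based approach recommended at the top of this answer, next to a guarded variant of the flag-output form. The names ctz64_builtin and ctz64_asm are hypothetical. The sketch assumes GCC or Clang; the flag-output branch is only compiled on x86-64 when the compiler defines __GCC_ASM_FLAG_OUTPUTS__. __builtin_ctzll is undefined for a zero argument, so the zero case is handled in C.

```c
#include <stdint.h>

/* Portable across GCC/Clang targets: handle 0 explicitly because
   __builtin_ctzll(0) is undefined. */
static inline unsigned int ctz64_builtin(uint64_t b) {
    return b ? (unsigned int)__builtin_ctzll(b) : 64u;
}

/* x86-64 only: read ZF from bsf through a flag output operand.
   Falls back to the builtin where flag outputs are unavailable. */
static inline unsigned int ctz64_asm(uint64_t b) {
#if defined(__x86_64__) && defined(__GCC_ASM_FLAG_OUTPUTS__)
    uint64_t ret;
    int zf;
    __asm__ ("bsf %2, %0"
             : "=r" (ret), "=@ccz" (zf)
             : "r" (b));
    /* When ZF is set the source was 0 and ret must not be used. */
    return zf ? 64u : (unsigned int)ret;
#else
    return ctz64_builtin(b);
#endif
}
```

With the builtin, the compiler is free to pick the best instruction for the target and to optimize across the call, which the asm version prevents.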

Read a register in arm64 using C using the QNX compiler

I want to read a register named x0 on arm64 (not x86_64) using C. What's the best way to do this (bug-free and portable)?
I searched the web and only found these approaches:
register int *foo asm ("a5"); //1
register int foo asm ("a5"); //2 which is right?
or
intptr_t sp;
asm ("movl %%esp, %0" : "=r" (sp) ); //3
I think the first way has a bug: x0 on arm64 is 64 bits wide, and I don't think int *foo can hold a 64-bit address.
The second way is for x86. It does not seem to work when I adapt it like this:
asm ("movl %x0, %0" : "=r" (sp) );
So what's the correct way to read a register in C?
The easiest way to do so is like this:
uint64_t foo;
asm volatile ("mov %0, x0" : "=r"(foo) ::);
This copies the content of register x0 into the variable foo. Note that the content of x0 is going to be fairly unpredictable at any given point in the code; I don't quite see the use in finding its contents. You should especially not rely on x0 containing any particular value at the beginning or end of a function or right before or after calling a function. The C compiler is allowed to use any register for any purpose at any point in the program, and it is known to make use of this right.
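If the goal is a register with a well-defined meaning, such as the stack pointer read by the third snippet in the question, a sketch along these lines may be more useful. The function name read_stack_pointer is hypothetical. The x86-64 branch and the __builtin_frame_address fallback are assumptions added only so the same file builds on other GCC/Clang targets; the fallback returns the frame address, which is close to but not the same as the stack pointer.

```c
#include <stdint.h>

/* Hypothetical helper: returns the current stack pointer. */
static inline uint64_t read_stack_pointer(void) {
    uint64_t sp;
#if defined(__aarch64__)
    /* Copy the AArch64 stack pointer into a compiler-chosen register. */
    __asm__ volatile ("mov %0, sp" : "=r" (sp));
#elif defined(__x86_64__)
    /* Same idea on x86-64 (AT&T syntax, %% escapes the register name). */
    __asm__ volatile ("movq %%rsp, %0" : "=r" (sp));
#else
    /* Assumption: approximate with the frame address on other targets. */
    sp = (uint64_t)(uintptr_t)__builtin_frame_address(0);
#endif
    return sp;
}
```

Using uint64_t (or uintptr_t) for the result avoids the truncation that a plain int would cause on a 64-bit target.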

inline assembly in avr

void save_context(uint8_t index) {
context *this_context = contextArray + index;
uint8_t *this_stack = this_context->stack;
asm volatile("st %0 r0": "r"(this_stack));
}
I have something like this.
I would like to store the registers r0, r1, r2, ... into my stack[] array.
What I am programming is the context switch. The context has the structure like this:
typedef struct context_t {
uint8_t stack[THREAD_STACK_SIZE];
void *pstack;
struct context_t *next;
}context;
My problem is that I am not able to pass the c variable "this_stack" to inline assembly. My aim is to store all the registers, stack pointer and SREG on my stack.
After compiling, it gives error:
Description Resource Path Location Type
`,' required 5_multitasking line 754, external location: C:\Users\Jiadong\AppData\Local\Temp\ccDo7xn3.s C/C++ Problem
I looked at the AVR inline assembly tutorial, but I didn't understand much of it.
Could anyone help me?
asm volatile ("st %0 r0": "r"(this_stack));
There are several problems in that line: Wrong % print-modifier, missing , between the operands, incorrect constraint and missing description of side effects.
The memory access is supposed to use indirect addressing, so one way is to use indirect+displacement with "b"ase register Y or Z:
asm volatile ("std %a1+0, r0" "\n\t"
"std %a1+1, r1" "\n\t"
"..."
: "+m" (this_context->stack)
: "b" (this_stack));
Notice print modifier %a which prints R30 as Z and not as r30.
Operand 0 is just used to express that this_context->stack is being changed if you don't want the all-memory-clobber "memory". Moreover, there's no need for an intermediate variable for operand 1 because it's not altered: you can use just as well "b" (this_context->stack) for operand 1.
Alternatively, post-increment addressing on "e"xtended (pointer) registers X, Y or Z can be used:
asm volatile ("st %a1+, r0" "\n\t"
"st %a1+, r1" "\n\t"
"..."
: "=m" (this_context->stack), "+e" (this_stack));
"label" makes no sense, that should be a constraint. It also makes no sense trying to save the stack pointer into an array. It might make sense to load the stack pointer with the address of that array, but that's not the save_context.
Anyway, to get the value of SPL which is the stack pointer you can do something like this:
asm volatile("in %0, %1": "=r" (*this_stack) : "I" (_SFR_IO_ADDR(SPL)));
(There is a q constraint but at least my gcc version doesn't like it.)
To get true registers, for example r26 you can do:
register uint8_t r26_value __asm__("r26");
asm volatile("": "=r" (r26_value));
There is a constraint, "m", documented in the GCC manual, but it doesn't always work on AVR. Here is an example of how it should work from sanguino/bootloaders/atmega644p/ATmegaBOOT
asm volatile("...
...
"sts %0,r16 \n\t"
...
: "=m" (SPMCSR) : ... );
I have found "m" to be fragile though. If a function uses a variable in C code, outside of the inline assembly, the compiler may choose to place it in the Z register and it will try to use Z in assembler too. This causes an assembler error when used with the sts instruction. Looking at the assembler output from the C compiler is the best way to debug this kind of problem.
Rather than using an "m" constraint, you can just put the literal address you want into your assembler code. For an example, see pins_teensy.c, where timer0_fract_count is not included in the constraint lists:
asm volatile(
...
"sts timer0_fract_count, r24" "\n\t"

What ensures reads/writes of operands occurs at desired timed with extended ASM?

According to GCC's Extended ASM and Assembler Template, to keep instructions consecutive, they must be in the same ASM block. I'm having trouble understanding what provides the scheduling or timings of reads and writes to the operands in a block with multiple statements.
As an example, EBX or RBX needs to be preserved when using CPUID because, according to the ABI, the caller owns it. There are some open questions with respect to the use of EBX and RBX, so we want to preserve it unconditionally (it's a requirement). So three instructions need to be encoded into a single ASM block to ensure the consecutive-ness of the instructions (re: the assembler template discussed in the first paragraph):
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
__asm__ __volatile__ (
"push %%ebx;"
"cpuid;"
"pop %%ebx"
: "=a"(__EAX), "=b"(__EBX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC)
);
If the expression representing the operands is interpreted at the wrong point in time, then __EBX will be the saved EBX (and not the CPUID's EBX), which will likely be a pointer to the Global Offset Table (GOT) if PIC is enabled.
Where, exactly, does the expression specify that the store of CPUID's %EBX into __EBX should happen (1) after the PUSH %EBX; (2) after the CPUID; but (3) before the POP %EBX?
In your question you present some code that pushes and pops ebx. The idea of saving ebx when compiling with gcc's -fPIC (position-independent code) is correct: in that situation it is up to our function not to clobber ebx on return. Unfortunately, the way you have defined the constraints, you explicitly use ebx. Generally the compiler will reject this (error: inconsistent operand constraints in an 'asm') if you are compiling PIC code and specify =b as an output constraint. It is unusual that it doesn't produce a diagnostic for you.
To get around this problem you can let the compiler choose a register for you. Instead of pushing and popping, we simply exchange %ebx with an unused register chosen by the compiler and restore it by exchanging it back afterwards. Since we don't want the compiler to assign that register to one of our inputs, we specify the early-clobber modifier, ending up with a constraint of =&r (instead of =b in the OP's code). More on modifiers can be found in the GCC manual's section on constraint modifier characters. Your code (for 32-bit) would look something like:
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
__asm__ __volatile__ (
"xchgl\t%%ebx, %k1\n\t" \
"cpuid\n\t" \
"xchgl\t%%ebx, %k1\n\t"
: "=a"(__EAX), "=&r"(__EBX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
If you intend to compile for X86_64 (64 bit) you'll need to save the entire contents of %rbx. The code above will not quite work. You'd have to use something like:
uint32_t __FUNC = 1, __SUBFUNC = 0;
uint32_t __EAX, __ECX, __EDX;
uint64_t __BX; /* Big enough to hold a 64 bit value */
__asm__ __volatile__ (
"xchgq\t%%rbx, %q1\n\t" \
"cpuid\n\t" \
"xchgq\t%%rbx, %q1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
You could code this up using conditional compilation to deal with both X86_64 and i386:
uint32_t __FUNC = 1, __SUBFUNC = 0;
uint32_t __EAX, __ECX, __EDX;
uintptr_t __BX; /* Register-sized: 32 bits on i386, 64 bits on x86_64 */
#if defined(__i386__)
__asm__ __volatile__ (
"xchgl\t%%ebx, %k1\n\t" \
"cpuid\n\t" \
"xchgl\t%%ebx, %k1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
#elif defined(__x86_64__)
__asm__ __volatile__ (
"xchgq\t%%rbx, %q1\n\t" \
"cpuid\n\t" \
"xchgq\t%%rbx, %q1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
#else
#error "Unknown architecture."
#endif
GCC has a __cpuid macro defined in cpuid.h. The macro is defined so that it only saves the ebx and rbx registers when required. The GCC 4.8.1 macro definition in cpuid.h gives an idea of how cpuid is handled there.
The astute reader may ask the question - what stops the compiler from choosing ebx or rbx as the scratch register to use for the exchange. The compiler knows about ebx and rbx in the context of PIC, and will not allow it to be used as a scratch register. This is based on my personal observations over the years and reviewing the assembler (.s) files generated from C code. I can't say for certain how more ancient versions of gcc handled it so it could be a problem.
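As a sketch of relying on the cpuid.h header mentioned above rather than hand-written asm: the example below uses __get_cpuid, which that header provides in GCC and Clang, to read the vendor string from leaf 0. The helper name cpu_vendor and its return convention are made up for illustration, and the non-x86 branch is an assumption added so the file still compiles elsewhere.

```c
#include <string.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical helper: writes the 12-character vendor string plus a
   terminating NUL into out. Returns 1 on success, 0 if CPUID leaf 0
   is unavailable or the target is not x86. */
static int cpu_vendor(char out[13]) {
    out[0] = '\0';
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    /* Leaf 0 returns the vendor string in EBX, EDX, ECX order. */
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return 1;
#else
    return 0;
#endif
}
```

The header takes care of the ebx/rbx handling, so the calling code never has to name that register in its own constraints.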
I think you understand, but to be clear, the "consecutive" rule means that this:
asm ("a");
asm ("b");
asm ("c");
... might get other instructions interposed, so if that's not desirable then it must be rewritten like this:
asm ("a\n"
"b\n"
"c");
... and now it will be inserted as a whole.
As for the cpuid snippet, we have two problems:
The cpuid instruction will overwrite ebx, and hence clobber the data that PIC code must keep there.
We want to extract the value that cpuid places in ebx while never returning to compiled code with the "wrong" ebx value.
One possible solution would be this:
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
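__asm__ __volatile__ (
"push %%ebx;"
"cpuid;"
"mov %%ebx, %%ecx;" /* hand cpuid's ebx result back through ecx */
"pop %%ebx"
/* eax and edx are listed as outputs because a register that is used
   as an input may not also be named in the clobber list */
: "=a"(__EAX), "=c"(__EBX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC)
);
__asm__ __volatile__ (
"push %%ebx;"
"cpuid;"
"pop %%ebx"
: "=a"(__EAX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC)
);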
There's no need to mark ebx as clobbered as you're putting it back how you found it.
(I don't do much Intel programming, so I may have some of the assembler-specific details off there, but this is how asm works.)

(GNU inline assembly) How to use a register which not assigned from nor copy to the C variables?

I'm writing inline assembly statements using a GNU-based toolchain, and there are three instructions within the inline assembly that update a single bit of a system register. The steps are:
move (read) a system register to a general register
AND it with the variable value from C code
move (write) the result back to the system register just read
In the instruction set I'm using, the inline assembly syntax is like this:
unsigned int OV_TMP = 0xffefffff;
asm volatile ( "mfsr %0, $PSW\n\t"
"and %0, %0, %1\n\t"
"mtsr %0, $PSW"
: : "r"(OV_TMP) : );
%1 is the register into which I want to load the value of OV_TMP.
%0 is my problem:
How do I write the inline assembly when a register is used internally and is neither assigned from nor copied to a C variable?
The thing to consider here is that, from the compiler's perspective, the register is assigned-to by the inline assembly, even if you don't use it again later. That is, you're generating the equivalent of:
register unsigned int OV_TMP = 0xffefffff, scratch;
scratch = magic() & OV_TMP;
more_magic(scratch);
/* and then don't re-use scratch for anything from here on */
The magic and/or more_magic steps cannot be moved or combined away because of the volatile, so the compiler cannot simply delete the written-but-unused register.
The mfsr and mtsr look like powerpc instructions to me, and I would probably do the and step in C code (see footnote); but the following should generally work:
unsigned int OV_TMP = 0xffefffff, scratch;
asm volatile("mfsr %0, $PSW\n\t"
"and %0, %0, %1\n\t"
"mtsr %0, $PSW"
: "=&r"(scratch) : "r"(OV_TMP));
Here the "=&r" constraint says that the output operand (%0) is written before the input operand (%1) is read.
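The effect of the early-clobber can be tried out on a more common target. The following is a minimal sketch, assuming x86 or x86-64 with GCC or Clang; the function name and_via_scratch is hypothetical and the plain-C fallback is there only so the file builds elsewhere. The first instruction writes %0 before the second one reads %2, which is exactly the situation where "=&r" is needed: without the &, the compiler could place the mask in the same register as the scratch output.

```c
/* Hypothetical helper: computes value & mask through a scratch register
   that is written before all inputs have been consumed. */
static inline unsigned int and_via_scratch(unsigned int value,
                                           unsigned int mask) {
#if defined(__i386__) || defined(__x86_64__)
    unsigned int scratch;
    __asm__ ("movl %1, %0\n\t"  /* scratch = value (writes %0 first)   */
             "andl %2, %0"      /* scratch &= mask (reads %2 afterwards) */
             : "=&r" (scratch)
             : "r" (value), "r" (mask)
             : "cc");
    return scratch;
#else
    /* Assumption: plain C on targets without this asm syntax. */
    return value & mask;
#endif
}
```

A call such as and_via_scratch(psw, 0xffefffff) mirrors the AND step of the sequence above, with the system-register moves left out.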
Footnote: As far as I know (which is not very far, I've only ever done a tiny bit of ppc assembly) there's no need to keep the mfsr and mtsr instructions a specific distance apart, unlike certain lock-step sequences on other processors. If so, I would write something more like this:
static inline unsigned int read_psw() {
unsigned int result;
asm volatile("mfsr %0, $PSW" : "=r"(result));
return result;
}
static inline void write_psw(unsigned int value) {
asm volatile("mtsr %0, $PSW" :: "r"(value));
}
#define PSW_FE0 0x00100000 /* this looks like it's FE0 anyway */
...
write_psw(read_psw() & ~PSW_FE0); /* some appropriate comment here */

Resources