I'm doing a study assignment to measure memory access time on my machine.
To determine the clock cycles on our machines, we have been given the following C snippet:
static inline void getcyclecount(uint64_t* cycles)
{
__asm __volatile(
"cpuid # force all previous instruction to complete\n\t"
"rdtsc # TSC -> edx:eax \n\t"
"movl %%edx, 4(0) # store edx\n\t"
"movl %%eax, 0(0) # store eax\n\t"
: : "r"(cycles) : "eax", "ebx", "ecx", "edx");
}
However, when I try to compile this (Xcode 4, using "Apple LLVM Compiler 2.1"), I get the error "Unexpected token in memory operand" twice: once at the "\t" of the rdtsc line and once at the "\t" of the first movl line.
I know basic assembler, but have no clue about the C inline assembler format.
Does anyone know what could be the issue with this code?
Thanks!
Assuming this is GCC inline assembly syntax, you're missing a % in the memory operands: 4(0) and 0(0) should be 4(%0) and 0(%0), so that they refer to operand 0 (the cycles pointer):
__asm __volatile(
"cpuid # force all previous instruction to complete\n\t"
"rdtsc # TSC -> edx:eax \n\t"
"movl %%edx, 4(%0) # store edx\n\t"
"movl %%eax, 0(%0) # store eax\n\t"
: : "r"(cycles) : "eax", "ebx", "ecx", "edx");
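An alternative that avoids the memory operands altogether is to hand the two halves of the counter back to the compiler as register outputs. This is only a sketch, assuming an x86 target and a GCC-compatible compiler; read_cycle_count is a hypothetical name, not part of the assignment's code:

```c
#include <stdint.h>

/* Hypothetical helper (not from the assignment): returns the TSC value
 * instead of storing it through a pointer. */
static inline uint64_t read_cycle_count(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ __volatile__(
        "cpuid\n\t"            /* force all previous instructions to complete */
        "rdtsc"                /* TSC -> edx:eax */
        : "=a"(lo), "=d"(hi)   /* outputs: eax and edx */
        : "a"(0)               /* input: cpuid leaf 0 in eax */
        : "ebx", "ecx");       /* cpuid also overwrites ebx and ecx */
    return ((uint64_t)hi << 32) | lo;
#else
    return 0; /* assumption: no TSC on non-x86 targets, so report 0 */
#endif
}
```

Because the result arrives in output operands, the compiler knows exactly what the asm produced and no store through cycles is hidden from it.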
For entertainment, I am learning gnu extended assembly using AT&T syntax for x86 with a 32bit Linux target. I have just spent the last three hours coding two possible solutions to my challenge of swapping the values of two integer variables a and b, and neither of my solutions completely solved my problem. First, let's look at my TODO obstacle in some more detail:
int main()
{
int a = 2, b = 1;
printf("a is %d, b is %d\n", a, b);
// TODO: swap a and b using extended assembly, and do not modify the program in any other way
printf("a is %d, b is %d\n", a, b);
}
After reading this HOWTO, I wrote the following inline extended assembler code. Here is my first attempt at swapping the integers:
asm volatile("movl %0, %%eax;"
"movl %1, %%ecx;"
"movl %%ecx, %0;"
: "=r" (a)
: "r" (b)
: "%eax", "%ecx");
asm volatile("movl %%eax, %0;"
: "=r" (b)
: "r" (a)
: "%eax", "%ecx");
My reasoning was that to set a = b, I needed an extended assembly call that was separated from the assembly to set b = a. So I wrote the two extended assembly calls, compiled my code, i.e., gcc -m32 asmPractice.c, and ran a.out. The results were as follows:
a is 2, b is 1
a is 1, b is 1
Seeing how that did not work properly, I then decided to combine the two extended assembler calls, and wrote this:
asm volatile("movl %0, %%eax;"
"movl %1, %%ecx;"
"movl %%ecx, %0;"
"movl %%eax, %1;"
: "=r" (a)
: "r" (b));
After recompiling and linking, my code still does not correctly swap both values. See for yourself. Here are my results:
a is 2, b is 1
a is 1, b is 1
Here are some solutions from the comments:
Solution #0 (best option): https://gcc.gnu.org/wiki/DontUseInlineAsm
Even the zero-instruction solution defeats constant-propagation, and any other optimization that involves gcc knowing anything about the value. It also forces the compiler to have both variables in registers at the same time at that point. Always keep these downsides in mind when considering using inline-asm instead of builtins / intrinsics.
Solution #1: x86 xchg, no scratch regs, and works in both AT&T and Intel-syntax modes. Costs about the same as 3 mov instructions on most Intel CPUs, or only 2 uops on some AMD.
asm("xchg %0, %1;" : "+r" (a), "+r" (b));
Solution #2: purely using GNU C inline asm constraints. (Bonus: portable to all architectures)
asm("" : "=r" (a), "=r" (b) : "1" (a), "0" (b));
See all three solutions in action on the Godbolt compiler explorer, including examples of them defeating optimization:
int swap_constraints(int a, int b) {
asm("" : "=r" (a), "=r" (b) : "1" (a), "0" (b));
return a;
}
// Demonstrate the optimization-defeating behaviour:
int swap_constraints_constants(void) {
int a = 10, b = 20;
return swap_constraints(a, b) + 15;
}
swap_constraints_constants:
movl $10, %edx
movl $20, %eax
addl $15, %eax
ret
vs. with a pure C swap:
swap_noasm_constants:
movl $35, %eax # the add is done at compile-time, and `a` is optimized away as unused.
ret
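The pure C version is not shown above. A plausible sketch follows; swap_noasm is an assumed name, inferred from the swap_noasm_constants label in the compiler output, and the body is a guess at what was compiled:

```c
/* Assumed counterpart to swap_constraints(): same result, no inline asm. */
int swap_noasm(int a, int b) {
    int tmp = a;
    a = b;
    b = tmp;
    (void)b;   /* b is unused after the swap, just like in the asm version */
    return a;
}

int swap_noasm_constants(void) {
    int a = 10, b = 20;
    return swap_noasm(a, b) + 15;   /* 20 + 15, foldable at compile time */
}
```

Since the compiler can see through every statement here, it is free to fold the whole call into a constant, which the asm version prevents.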
//quick inline asm statements performing the swap_byte for key_scheduling
inline void swap_byte(unsigned char *x, unsigned char *y)
{
unsigned char t;
asm("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(t)
:"r"(*x)
:"%eax");
asm("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(*x)
:"r"(*y)
:"%eax");
asm("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(*y)
:"r"(t)
:"%eax");
}
Here I am trying to swap the char at x with the char at y.
I also tried compiling these instructions with movl changed to mov, but with no success. Where is the problem in compiling/linking?
Here is the output from compiling in cygwin:
$ gcc rc4_main.c -o rc4ex
/tmp/ccy0wo6H.s: Assembler messages:
/tmp/ccy0wo6H.s:18: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:18: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:26: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:26: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:34: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:34: Error: operand type mismatch for `mov'
To simplify it even more (than user35443):
asm("" : "=r" (*x), "=r" (*y) : "1" (*x), "0" (*y));
Look ma! No code! And yes, this really works.
To explain how this works:
When the compiler is building the code, it keeps track of what value is in each register. So if you had these as inputs to the asm:
"r" (*x), "r" (*y)
The compiler will pick a register and put *x in it, then pick a register and put *y in it, then call your asm. But it also keeps track of what variable is in which register. If there were just some way to tell the compiler that all it had to do was start treating the two registers as the opposite variables, then we'd be set. And that's what this code does:
Saying "=r" (*x) means that we are going to be overwriting the value in *x, and that we will be putting the new value into a register.
Saying "0" (*y) means that on input to the asm, the compiler must put the value of *y into the same register as is being used by output parameter #0.
So, without using any actual assembly instructions, we have told the compiler to swap these two values.
We don't get this quite "for free" since the compiler must load the values into registers before calling the asm. But since that has to happen anyway...
What about actually updating memory? The compiler will (if necessary) write these values from the registers back to memory. And since it knows what variable is in which register, all works as expected.
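Wrapped into a complete function, the idea might look like the sketch below. swap_byte_constraints is a hypothetical name, and the local copies a and b are an addition for readability; the one-liner above passes *x and *y directly.

```c
/* Sketch: swap two bytes using only operand constraints, no instructions. */
static inline void swap_byte_constraints(unsigned char *x, unsigned char *y)
{
    unsigned char a = *x;   /* local copies keep the asm operands simple */
    unsigned char b = *y;

    /* Output 0 (a) shares a register with input b, and output 1 (b)
     * shares a register with input a, so the two names trade places. */
    __asm__("" : "=r"(a), "=r"(b) : "1"(a), "0"(b));

    *x = a;   /* the compiler writes the swapped values back to memory */
    *y = b;
}
```

A call such as swap_byte_constraints(&s[i], &s[j]) would then exchange the two bytes.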
unsigned char t;
asm("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(t) /* <--here */
:"r"(*x) /* <-- and here */
:"%eax");
You cannot move a value between a 32-bit register and a single-byte operand. t and *x are both unsigned char, so the compiler substitutes byte-sized operands for %0 and %1, and movl then mixes a 32-bit register with an 8-bit operand; that is the "operand type mismatch". The problems on the other lines are similar. You should move only a byte.
Try something like this (untested; there are other ways to do it, see below):
unsigned char t;
asm("movb %1, %%al\n"
"movb %%al, %0\n"
:"=r"(t)
:"r"(*x)
:"%al");
asm("movb %1, %%al\n"
"movb %%al, %0\n"
:"=r"(*x)
:"r"(*y)
:"%al");
asm("movb %1, %%al\n"
"movb %%al, %0\n"
:"=r"(*y)
:"r"(t)
:"%al");
The whole procedure can be simplified into this:
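asm("movb (%0), %%al\n"   /* al = *x */
"movb (%1), %%ah\n"       /* ah = *y */
"movb %%ah, (%0)\n"       /* *x = old *y */
"movb %%al, (%1)\n"       /* *y = old *x */
: /* no outputs for compiler to know about */
: "r" (x), "r" (y)        /* with no outputs, these are operands %0 and %1 */
: "%ax", "memory");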
The line
movl %%eax, %0;
is complete nonsense! You are trying to overwrite the constant 0 with the %eax register, which is impossible. In Fortran, many years ago, it was possible, and afterwards programs would behave quite unpredictably. To avoid that, the rule was introduced that no identifier can begin with a digit. But you are trying to do exactly that, so it is right that you get an error. Maybe you meant this instead:
movl %0, %%eax;
to set eax to zero. In that case, this code
xorl %%eax, %%eax;
is much better!
I'm trying to cross-compile MongoDB for a custom Linux. It compiles fine on Linux, but when using the cross-compiler toolchain, it complains about this code.
static T compareAndSwap(volatile T* dest, T expected, T newValue) {
T result = expected;
asm volatile ("push %%eax\n\t"
"push %%ebx\n\t"
"push %%ecx\n\t"
"push %%edx\n\t"
"mov (%%edx), %%ebx\n\t"
"mov 4(%%edx), %%ecx\n\t"
"mov (%%edi), %%eax\n\t"
"mov 4(%%edi), %%edx\n\t"
"lock cmpxchg8b (%%esi)\n\t"
"mov %%eax, (%%edi)\n\t"
"mov %%edx, 4(%%edi)\n\t"
"pop %%edx\n\t"
"pop %%ecx\n\t"
"pop %%ebx\n\t"
"pop %%eax\n"
:
: "S" (dest),
"D" (&result),
"d" (&newValue)
: "memory", "cc");
return result;
}
The compiler error is as below.
_party/js-1.7 -Isrc/third_party/js-1.7 src/mongo/bson/oid.cpp
src/mongo/platform/atomic_intrinsics_gcc.h: In member function 'void mongo::OID::initSequential()':
src/mongo/platform/atomic_intrinsics_gcc.h:123:44: error: impossible constraint in 'asm'
src/mongo/platform/atomic_intrinsics_gcc.h:123:44: error: impossible constraint in 'asm'
scons: *** [build/linux2/cc_gcc/cxx_toolchain-c++/mongo/bson/oid.o] Error 1
scons: building terminated because of errors.
The position complained about, 123:44, is the end of the line just before : "memory", "cc");
I also looked at the other parts of the code that contain asm; they look similar, so I do not know what is wrong with this one.
Please advise what's wrong with this.
Try using the __sync_val_compare_and_swap GCC intrinsic here.
Due to the F00F bug the lock cmpxchg8b is invalid. I guess you're using something like i586-linux-gcc toolchain and thus you're getting right into this Pentium's problem.
More workarounds may follow if you tell us the exact hardware for your custom Linux kernel.
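As a rough illustration of the intrinsic suggested above, the 64-bit case might look like this. compare_and_swap_u64 is a hypothetical wrapper, and the sketch assumes the toolchain supports 8-byte __sync operations for the target (on some 32-bit targets that needs a suitable -march setting or a support library):

```c
#include <stdint.h>

/* Hypothetical wrapper around the GCC builtin; no inline asm and no
 * target-specific register constraints. */
static inline uint64_t compare_and_swap_u64(volatile uint64_t *dest,
                                            uint64_t expected,
                                            uint64_t new_value)
{
    /* Atomically: if (*dest == expected) *dest = new_value;
     * returns the value *dest held before the operation. */
    return __sync_val_compare_and_swap(dest, expected, new_value);
}
```

The builtin leaves instruction selection to the compiler, so the same source can build for targets where the x86 register constraints do not exist.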
I am trying to write some inline assembly into C. I have two arrays as input, what I need is to copy one element in array1 into array2, and the following is what I have at the moment:
asm (
"movl %0,%%eax;"
"movl %1,%%ebx;"
"movl (%%eax),%%ecx;"
"movl %%ecx,(%ebx);"
"xor %%ecx,%%ecx;"
"movl 4(%%eax),%%ecx;"
//do something on %ecx
"movl %%ecx,4(%ebx);" //write second
:
:"a"(array1),"b"(array2)
);
Why do I get a segmentation fault?
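Your inline assembler code is broken. You can't directly use registers such as EAX, EBX and ECX in the asm body without telling the compiler, either by adding them to the clobber list or by declaring them as operands. Otherwise the compiler does not know which registers have been modified.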
It is very likely that one of the registers that you've modified contained something damn important that later caused the segmentation fault.
This code will copy one element from array1 to array2:
asm (
"movl (%0), %%eax \n\t" /* read first dword from array1 into eax */
"movl %%eax, (%1) \n\t" /* write dword into array2 */
: /* outputs */
: /* inputs */ "r"(array1),"r"(array2)
: /* clobber */ "eax", "memory"
);
A better version with proper register constraints would drop the hard coded EAX like this:
int dummy;
asm (
"movl (%1), %0 \n\t"
"movl %0, (%2) \n\t"
: /* outputs, temps.. */ "=&r" (dummy) /* & = early clobber: written before %2 is read */
: /* inputs */ "r"(array1),"r"(array2)
: /* clobber */ "memory"
);
Btw, in general I have the feeling that you're not that familiar with assembler yet. Writing inline assembler is a bit harder to get right due to all the compiler magic. I suggest that you start by writing some simple functions in assembler and putting them into a separate .S file first. That's much easier.
Your best option is C code:
target_array[target_idx] = source_array[source_idx];
This avoids segmentation faults as long as the indexes are under control.
what about memcpy ?
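For a single element, plain assignment and memcpy do the same job. A sketch, with copy_element and its parameter names being illustrative only:

```c
#include <stddef.h>
#include <string.h>

/* Copy one int from src[src_idx] to dst[dst_idx] without any inline asm.
 * The caller is responsible for keeping both indexes in bounds. */
static inline void copy_element(int *dst, size_t dst_idx,
                                const int *src, size_t src_idx)
{
    memcpy(&dst[dst_idx], &src[src_idx], sizeof dst[dst_idx]);
}
```

memcpy becomes more interesting when several consecutive elements are copied at once, since the size argument can then cover the whole run.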
We recently upgraded the code to gcc4.3.3 from gcc4.2.4.
void testAndSet( volatile int* s, int* val )
{
__asm__ __volatile__ ( "btsl $0, %0\n "
"jc bitSet\n "
"movl $0, %1\n "
"jmp returnVector\n"
"bitSet:\n "
"movl $1, %1\n"
"returnVector: " : "=m"(*s), "=m"(*val) );
}
Our code now fails with the following errors,
lock.cxx: Assembler messages:
lock.cxx:59: Error: symbol `bitSet' is already defined
lock.cxx:61: Error: symbol `returnVector' is already defined
lock.cxx:59: Error: symbol `bitSet' is already defined
lock.cxx:61: Error: symbol `returnVector' is already defined
Those symbols weren't found anywhere else. (Renaming them causes the same error with the new name).
What's up with this? why do I get the errors twice?
Probably the optimizer has changed and is now inlining your testAndSet() function into 2 places. Because you are using global names for your labels, this does not work. You should use local names instead. E.g:
__asm__ __volatile__ ( "btsl $0, %0\n "
"jc 0f\n "
"movl $0, %1\n "
"jmp 1f\n"
"0:\n "
"movl $1, %1\n"
"1: " : "=m"(*s), "=m"(*val) );
Local labels are just numbers; to disambiguate cases where there are many labels called "0" you need to use "jmp 0f" for forward jumps and "jmp 0b" for backward jumps.
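To show both directions in one place, here is a small sketch with a backward jump (1b) and a forward jump (2f). sum_to is a made-up example function, unrelated to testAndSet, and the asm path assumes an x86 target:

```c
/* Made-up example: adds n + (n-1) + ... + 1 using numeric local labels. */
static inline unsigned sum_to(unsigned n)
{
    unsigned total = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("testl %1, %1\n\t"
            "jz 2f\n"            /* forward: skip the loop when n == 0 */
            "1:\n\t"
            "addl %1, %0\n\t"    /* total += n */
            "decl %1\n\t"        /* n-- */
            "jnz 1b\n"           /* backward: repeat while n != 0 */
            "2:"
            : "+r"(total), "+r"(n)
            :
            : "cc");
#else
    while (n != 0)               /* plain C fallback for other targets */
        total += n--;
#endif
    return total;
}
```

Because "1:" and "2:" can be defined any number of times, the function can be inlined at several call sites without the "already defined" error.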
This is unrelated to your error, but you could improve your code and avoid branches simply using the setCC instruction:
__asm__ __volatile__ ( "btsl $0, %0\n "
"movl $0, %1\n"
"setc %1\n" : "=m"(*s), "=m"(*val) );
The setCC instruction (where CC is one of the condition code flags, analogous to the jCC instruction) sets a byte to 0 or 1 depending on whether or not the given condition was satisfied. Since the destination is a 4-byte value, you need to either preload it with 0 or use the MOVZX instruction to make the upper 3 bytes 0.
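A variant of the same idea lets setc write into a byte-sized register output and leaves the widening to the compiler, so no preloading is needed. This is a sketch only: test_and_set_bit0 is a hypothetical name, the asm path assumes x86, and the constraints ("+m" because btsl both reads and writes *s, "=q" for a byte-addressable register) are my choice rather than the original code's.

```c
/* Hypothetical variant: returns the previous state of bit 0 of *s
 * (0 or 1) and leaves the bit set. */
static inline int test_and_set_bit0(volatile int *s)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned char was_set;
    __asm__ __volatile__("btsl $0, %0\n\t"   /* CF = old bit 0, then set it */
                         "setc %1"           /* was_set = CF */
                         : "+m"(*s), "=q"(was_set)
                         :
                         : "cc");
    return was_set;                          /* zero-extended by the compiler */
#else
    int old = *s;                            /* plain C fallback, not atomic */
    *s = old | 1;
    return old & 1;
#endif
}
```

Returning the flag instead of storing it through a second pointer also removes one memory operand from the asm.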
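Alternatively, you can keep named labels and make them unique by appending %= to each label name; the compiler replaces %= with a number that is unique to each instance of the asm statement: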
"loop%=:" "\n\t"