inline void addition(double * x, const double * vx,uint32_t size){
/*for (uint32_t i=0;i<size;++i){
x[i] = x[i] + vx[i];
}*/
__asm__ __volatile__ (
"1: \n\t"
"vmovupd -32(%0), %%ymm1\n\t"
"vmovupd (%0), %%ymm0\n\t"
"vaddpd -32(%1), %%ymm0, %%ymm0\n\t"
"vaddpd (%1), %%ymm1, %%ymm1\n\t"
"vmovupd %%ymm0, -32(%0)\n\t"
"vmovupd %%ymm1, (%0)\n\t"
"addq $128, %0\n\t"
"addq $128, %1\n\t"
"addl $-8, %2\n\t"
"jne 1b"
:
: "r" (x),"r"(vx),"r"(size)
: "ymm0", "ymm1"
);
}
I am practicing assembly (AVX instructions), so I wrote the inline assembly above to replace the C code in the original function (now commented out). The code compiles, but when I run the program it fails with: Bus error: 10
Any thoughts on this bug? I don't know what is wrong here. The compiler version is clang 602.0.53. Thank you!
Inline assembly is a complicated beast. If you just want to practice AVX assembly, use a separate asm file where you don't have to put up with the compiler. In exchange, you will need to observe the calling convention.
You have some issues with the constraints. For example, you change all your input registers without telling the compiler, which can cause all sorts of weird problems elsewhere in compiler-generated code. You also need a "memory" clobber, because the asm reads and writes the arrays through pointers that the compiler only sees as register values.
Also, learn to use a debugger so you can find the exact cause of problems and fix your own code.
Failing that, at least comment your code so we can figure out your intentions. In this case, I am particularly puzzled why you use a -32 offset, which addresses memory before the start of the array. I think you wanted +32 there. With two AVX registers of 32 bytes each, you need to advance the pointers by 64, not 128. Also, you have ymm0 and ymm1 swapped in the initial load.
This code seems to work fine for me:
#include <stdio.h>
#include <stdint.h>
/* static: a plain C99 "inline" definition emits no external symbol and can fail to link */
static inline void addition(double * x, const double * vx,uint32_t size){
/*for (uint32_t i=0;i<size;++i){
x[i] = x[i] + vx[i];
}*/
__asm__ __volatile__ (
"1: \n\t"
"vmovupd 32(%0), %%ymm0\n\t"
"vmovupd (%0), %%ymm1\n\t"
"vaddpd 32(%1), %%ymm0, %%ymm0\n\t"
"vaddpd (%1), %%ymm1, %%ymm1\n\t"
"vmovupd %%ymm0, 32(%0)\n\t"
"vmovupd %%ymm1, (%0)\n\t"
"addq $64, %0\n\t"
"addq $64, %1\n\t"
"addl $-8, %2\n\t"
"jne 1b"
: "+r" (x),"+r"(vx),"+r"(size)
:
: "ymm0", "ymm1", "memory"
);
}
int main()
{
double x[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
double vx[] = { 9, 10, 11, 12, 13, 14, 15, 16 };
int i;
addition(x, vx, 8);
for(i = 0; i < 8; i++) printf("%g ", x[i]);
putchar('\n');
return 0;
}
I am currently playing around with inline assembly and I've gotten a bit stuck. I have managed to call a function with no parameters, but I can't work out how to call one with two parameters.
My code below should call a function (add) that adds two predefined numbers together, and then call a second one (add_parameter) with two parameters, which should be added together.
#include <stdio.h>
int c = 4;
int d = 5;
void add() {
int result = 1 + 2;
printf("Result: %d\n", result);
}
void add_parameter(int a, int b) {
int result = a + b;
printf("Result: %d\n", result);
}
int main()
{
__asm__ __volatile__ ( "call add" );
// __asm__ __volatile__(
// "mov eax, offset c"
// "push eax"
// "mov eax, offset d"
// "push eax"
// "call add_parameter"
// "pop ebx"
// "pop ebx"
// );
__asm__ __volatile__ ( "mov eax, offset c" );
__asm__ __volatile__ ( "push eax" );
__asm__ __volatile__ ( "mov eax, offset d" );
__asm__ __volatile__ ( "push eax" );
__asm__ __volatile__ ( "call add_parameter" );
__asm__ __volatile__ ( "pop ebx" );
__asm__ __volatile__ ( "pop ebx" );
return 0;
}
My problem at the moment is that when I try to compile the program I get the following errors:
p_function.c:31: Error: too many memory references for `mov'
p_function.c:33: Error: too many memory references for `mov'
In my program I've tried two approaches: a single asm statement containing the whole assembly code, and one asm statement per instruction.
Unfortunately I am not sure which of these approaches is correct, let alone the most effective, but I get the same error with both.
How can I fix this problem and call the function add_parameter?
Thanks
In this code:
int a[2]={5,2},i=0;
asm volatile
(
"incl %1\n"
"incl %0"
:"+r"(a[i]),"+r"(i)
:
:
);
printf("%d\n",a[i]);
I'm trying to increase a[1] by 1 (for a result of 2+1=3) but the output shows 2, which means it hasn't changed. What's the problem and how can I fix it?
I try to generate 32-bit code like this:
gcc -S -m32 BMPTransformer.c -o BMPTransformer.s
I'm using Ubuntu 13.04. People with similar problems seem to have overcome their difficulties by installing libc6-dev-i386. That did not work for me, though.
The compiler complains:
BMPTransformer.c:243:6: error: can’t find a register in class ‘GENERAL_REGS’ while reloading ‘asm’
BMPTransformer.c:243:6: error: ‘asm’ operand has impossible constraints
Code as is:
static void ASM_reverse_image(BMPImage *image)
{
asm (
"movl $0, %%eax\n"

"cmpl %%eax, %1\n"
"jl end\n"

"row:\n"
"movl (%0, %%eax, 4), %%edx\n"
"decl %1\n"
"movl (%0, %1, 4), %%esi\n"
"movl %%esi, (%0,%%eax, 4)\n"
"incl %%eax\n"
"movl %%edx, (%0, %1, 4)\n"
"cmpl %%eax, %1\n"
"jg row\n"

"end:\n"

: : "r"(image->pixel_data), "r"(image->header.height): "%eax", "%edx", "%esi"
);
}
The code that used the 64-bit a, b, c registers worked perfectly, but I need a 32-bit version.
This error usually signals that the compiler has run out of registers. From the small fragment you posted that should not be the case, and indeed it compiles fine for me. You have probably left out an important detail.
Anyway, there is absolutely no reason to write this in inline asm in its current form. The compiler can easily generate better (and working) code. The initial comparison certainly should be in C.
Side note: when using gcc inline asm the general idea is to leave as many possibilities to the compiler as possible. For example you don't specifically need any of the registers, you could have used generic constraints.
Code it as plain C:
static void ASM_reverse_image(BMPImage *image)
{
    int *pixel_data = image->pixel_data;
    int tmp;
    size_t idx, height = image->header.height;
    /* Swap mirrored elements; stop at the middle so nothing is swapped back. */
    for (idx = 0; idx < height / 2; idx++) {
        tmp = pixel_data[idx];
        pixel_data[idx] = pixel_data[height - 1 - idx];
        pixel_data[height - 1 - idx] = tmp;
    }
}
or, if you're using C++, just:
for (idx = 0; idx < height / 2; idx++)
    std::swap(pixel_data[idx], pixel_data[height - 1 - idx]);
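Edit: For an assembly exercise, this would do:
int *p = image->pixel_data;
long n = (long)image->header.height - 1; /* index distance between the two ends */
int tmp; /* scratch register, chosen by the compiler */
if (n > 0)
    asm volatile("1:\n\t"
        "mov (%0), %2\n\t"          /* tmp = front element */
        "xchg %2, (%0, %1, 4)\n\t"  /* swap tmp with the back element */
        "mov %2, (%0)\n\t"          /* front = old back element */
        "lea 4(%0), %0\n\t"         /* advance the front pointer */
        "sub $2, %1\n\t"            /* the ends are now two elements closer */
        "jg 1b"
        : "+r"(p), "+r"(n), "=&r"(tmp)
        :
        : "memory", "cc");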
but this isn't good code - largely because this is a "streaming" type of processing and should be done via the vector units.
It's always a good idea in gcc inline assembly to avoid requesting specific registers. Let the compiler choose instead. That might mean you'll have to declare one or more "pseudovariables" as assembly register operands (to get a "reg reservation").
I've a problem using C/C++ variables inside ARM NEON assembly code written in:
__asm__ __volatile()
I've read about the following possibilities, which should move values from ARM to NEON registers. Each of them causes a fatal signal in my Android application:
VDUP.32 d0, %[variable]
VMOV.32 d0[0], %[variable]
the input argument list includes:
[variable] "r" (variable)
The only way I have success is using a load:
int variable = 0;
int *address = &variable;
....
VLD1.32 d0[0], [%[address]]
: [address] "+r" (address)
But I think a load is not the best for performance if I don't need to modify the variable, and I also need to understand how to move data from ARM to NEON registers for other purposes.
EDIT: added example as requested, both possibility 1 and 2 result in a "fatal signal". I know in this example NEON assembly simply should modify first 2 elements of "array[4]".
int c = 10;
int *array4;
array4 = new int[64];
for(int i = 0; i < 64; i++){
array4[i] = 100*i;
}
__asm__ __volatile ("VLD1.32 d0, [%[array4]] \n\t"
"VMOV.32 d1[0], %[c] \n\t" //this is possibility 1
"VDUP.32 d2, %[c] \n\t" //this is possibility 2
"VMUL.S32 d0, d0, d2 \n\t"
"VST1.32 d0, [%[output_array1]] \n\t"
: [output_array1] "=r" (output_array1)
: [c] "r" (c), [array4] "r" (array4)
: "d0", "d1", "d2");
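The problem is caused by the output list. An "=r" operand is write-only: the compiler does not load the value of output_array1 into the register, so VST1.32 stores through an uninitialized address. Passing the output array address as an input operand solves the crashes.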
int c = 10;
int *array4;
array4 = new int[64];
for(int i = 0; i < 64; i++){
array4[i] = 100*i;
}
__asm__ __volatile ("VLD1.32 d0, [%[array4]] \n\t"
"VMOV.32 d1[0], %[c] \n\t" //this is possibility 1
"VDUP.32 d2, %[c] \n\t" //this is possibility 2
"VMUL.S32 d0, d0, d2 \n\t"
"VST1.32 d0, [%[output_array1]] \n\t"
:
: [c] "r" (c), [array4] "r" (array4), [output_array1] "r" (output_array1)
: "d0", "d1", "d2", "memory"); // "memory": the asm stores to output_array1 behind the compiler's back
I want to assign an array using inline assembly using the AT&T syntax. I want to achieve something like the following. Note that rsp here is the %rsp register.
long saved_sp[N];
long new_sp[N];
void some_function( unsigned int tid, ... )
{
// These two lines should be in assembly
saved_sp[tid] = rsp;
rsp = new_sp[tid];
......
}
I'm sure I don't need to warn you...
__asm__ __volatile__ (
"movq %%rsp, (%0, %2, 8)\n\t"
"movq (%1, %2, 8), %%rsp\n\t"
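: : "r" (saved_sp), "r" (new_sp), "r" ((long) tid) : "memory");
The "memory" clobber is needed because the asm stores to saved_sp through a register pointer, which the compiler cannot see from the operand list. Wherever you go after this, remember that the frame pointer "%rbp" still refers to the old stack and will be invalid for the new one.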