SSE2 instruction in C code - c

I am trying to reverse engineer a c code, but this part of assembly I cant really understand. I know it is part of the SSE extension. However, somethings are really different than what I am used to in x86 instructions.
static int sad16_sse2(void *v, uint8_t *blk2, uint8_t *blk1, int stride, int h)
{
int ret;
__asm__ volatile(
"pxor %%xmm6, %%xmm6 \n\t"
ASMALIGN(4)
"1: \n\t"
"movdqu (%1), %%xmm0 \n\t"
"movdqu (%1, %3), %%xmm1 \n\t"
"psadbw (%2), %%xmm0 \n\t"
"psadbw (%2, %3), %%xmm1 \n\t"
"paddw %%xmm0, %%xmm6 \n\t"
"paddw %%xmm1, %%xmm6 \n\t"
"lea (%1,%3,2), %1 \n\t"
"lea (%2,%3,2), %2 \n\t"
"sub $2, %0 \n\t"
" jg 1b \n\t"
: "+r" (h), "+r" (blk1), "+r" (blk2)
: "r" ((x86_reg)stride)
);
__asm__ volatile(
"movhlps %%xmm6, %%xmm0 \n\t"
"paddw %%xmm0, %%xmm6 \n\t"
"movd %%xmm6, %0 \n\t"
: "=r"(ret)
);
return ret;
}
What are the %1, %2, and %3? what does (%1,%2,%3) mean? Also what does "+r", "-r", "=r" mean?

You'll want to have a look at this GCC Inline Asssembly HOWTO.
The percent sign numbers are the instruction operands.

The inline assembler works similar to a macro preprocessor. Operands with exactly one leading percent are replaced by the the input parameters in the order as they appear in the parameter list, in this case:
%0 h output, register, r/w
%1 blk1 output, register, r/w
%2 blk2 output, register, r/w
%3 (x86_reg)stride input, register, read only
The parameters are normal C expressions. They can be further specified by "constraints", in this case "r" means the value should be in a register, opposed to "m" which is a memory operand. The constraint modifier "=r" makes this a write-only operand, "+r" is a read-write operand and "r" and normal read operand.
After the first colon the output operands appear, after the second the input operands and after the optional third the clobbered registers.
So the instruction sequence calculates the sum of the absolute differences in each byte of blk1 and blk2. This happens in 16 byte blocks, so if stride is 16, the blocks are contiguous, otherwise there are holes. Each instruction appears twice because some minimal loop unrolling is done, the h parameter is the number of 32 byte blocks to process. The second asm block seems to be useless, as the psadbw instruction sums up only in the low 16 bit of the destination register. (Did you omit some code?)

Related

ARM inline assembly, registers are read in an incorrect order

I am trying to read ARM registers via C using inline assembly but it doesn't follow the order of execution as it should.
volatile uint32_t var1 = 0;
volatile uint32_t var2 = 0;
volatile uint32_t var3 = 0;
volatile uint32_t var4 = 0;
__asm volatile(
"mov r0, #5\n\t"
"mov r1, #6\n\t"
"mov r2, #7\n\t"
"mov r3, #8\n\t"
"mov %0, r0\n\t"
"mov %0, r1\n\t"
"mov %0, r2\n\t"
"mov %0, r3\n\t"
: "=r" (var1),
"=r" (var2),
"=r" (var3),
"=r" (var4));
What happens is that the output is;
var1 = 8
var2 = 5
var3 = 6
var4 = 7
What I expect is;
var1 = 5
var2 = 6
var3 = 7
var4 = 8
It seems r0 is read last and it starts with r1. Why is that happening?
Note: Ignore the purpose of the code or how variables are defined, it's a copy/paste from a bigger application. It's just this specific register behavior in question.
GCC-style assembly inserts are designed to insert a single instruction per asm("..."), and they require you to give accurate information about all of the registers involved. In this case, you have not notified the compiler that registers r0, r1, r2, and r3 are used internally, so it probably thinks it's OK to reuse some of those for var1 through var4. You have also reused %0 as the destination of all four of the final mov instructions, so all four of them are actually writing to var1.
Also, it may or may not be your immediate problem, but volatile doesn't do what you think it does and probably isn't accomplishing anything useful here.
How to fix it? Well, first off, there needs to be a really strong reason why you can't just write
uint32_t var1 = 5;
uint32_t var2 = 6;
uint32_t var3 = 7;
uint32_t var4 = 8;
Assuming there is such a reason, then you should instead try writing one instruction per asm, and not using any scratch registers at all...
asm ("mov %0, #5" : "=r" (var1));
asm ("mov %0, #6" : "=r" (var2));
asm ("mov %0, #7" : "=r" (var3));
asm ("mov %0, #8" : "=r" (var4));
If you really absolutely have to do a whole bunch of work in a single asm then the first thing you should consider is putting it in a separate .S file and making it conform to the ABI so that you can call it like a normal function:
.text
.globl do_the_thing
.type do_the_thing, #function
_do_the_thing:
; all your actual code here
bx r14
.size do_the_thing, .-do_the_thing
and then in your C
rv = do_the_thing(arg1, arg2, ...);
If there's no way to make that work, then, and only then, should you sit down and read the entire "Extended Asm" chapter of the GCC manual and work out how to wedge the complete register-usage behavior of your insert into the constraints. If you need help with that, post a new question in which you show the real assembly language construct you need to insert, rather than a vague example, because every little detail matters.
The arguments passed into the inline assembly need to be incremented;
"mov %0, r0\n\t"
"mov %1, r1\n\t"
"mov %2, r2\n\t"
"mov %3, r3\n\t"

Inline assembly in C not working properly

I'm trying to learn how to use inline assembly in C code.
I have created a small program that should add two integers:
int main(){
int a=1;
int b=2;
asm( "movl %0, %%r8d;"
"movl %1, %%r9d;"
"addl %%r8d, %%r9d;"
"movl %%r9d, %1;"
: "=r" (a)
: "r" (b)
:"%r8","%r9" );
printf("a=%d\n",a);
return 0;
}
The aim was to load a and b into the registers %r8 and %r9, add them, and then put the output back in a.
However this program prints a=2 instead a=3. I'm not sure if the problem is in the inline technique or in the assembly itself.
There are two issues here:
First: The "=r" constraint you use for the output operand a indicates to the compiler that the operand is write-only — it is allowed to assume that the initial value is not needed. This is definitely not the case for your code! Change the qualifier to "+r" to let the compiler know that the initial value is important.
Second: You are moving the result to the wrong register! The target %1 of the last movl is the register corresponding to b, not a. You want %0.
Fixed:
asm(
"movl %0, %%r8d;"
"movl %1, %%r9d;"
"addl %%r8d, %%r9d;"
"movl %%r9d, %0;"
: "+r" (a)
: "r" (b)
: "%r8", "%r9"
);

gcc inline assembly error "operand type mismatch for mov"

//quick inline asm statements performing the swap_byte for key_scheduling
inline void swap_byte(unsigned char *x, unsigned char *y)
{
unsigned char t;
asm("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(t)
:"r"(*x)
:"%eax");
asm("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(*x)
:"r"(*y)
:"%eax");
asm("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(*y)
:"r"(t)
:"%eax");
}
Here I am trying to swap the char from x and store in y, and the same for y to x.
I have compiled these instructions by changing movl to mov but with no success. Where is the problem in compiling/linking?
Here is the output from compiling in cygwin:
$ gcc rc4_main.c -o rc4ex
/tmp/ccy0wo6H.s: Assembler messages:
/tmp/ccy0wo6H.s:18: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:18: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:26: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:26: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:34: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:34: Error: operand type mismatch for `mov'
To simplify it even more (than user35443):
asm("" : "=r" (*x), "=r" (*y) : "1" (*x), "0" (*y));
Look ma! No code! And yes, this really works.
To explain how this works:
When the compiler is building the code, it keeps track of what value is in each register. So if had these for inputs to asm:
"r" (*x), "r" (*y)
The compiler will pick a register and put *x in it, then pick a register and put *y in it, then call your asm. But it also keeps track of what variable is in which register. If there were just some way to tell the compiler that all it had to do was start treating the two registers as the opposite variables, then we'd be set. And that's what this code does:
Saying "=r" (*x) means that we are going to be overwriting the value in *x, that that we will be putting the value into a register.
Saying "0" (*y) means that on input to the asm, the compiler must put the value of *y into the same register as is being used by output parameter #0.
So, without using any actually assembly instructions, we have told the compiler to swap these two values.
We don't get this quite "for free" since the compiler must load the values into registers before calling the asm. But since that has to happen anyway...
What about actually updating memory? The compiler will (if necessary) write these values from the registers back to memory. And since it knows what variable is in which register, all works as expected.
unsigned char t;
asm("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(t) /* <--here */
:"r"(*x) /* <-- and here */
:"%eax");
You can not move a value from a 32-bit register to a single-byte memory location. t is on the stack and x is somewhere else, but both are accessed in the same way. Problems on the other lines are similar. You should move only a byte.
Try something like this, but there are more ways to do it (I haven't tried that, read below):
unsigned char t;
asm("movb %1, %%al\n"
"movb %%al, %0\n"
:"=r"(t)
:"r"(*x)
:"%al");
asm("movb %1, %%al\n"
"movb %%al, %0\n"
:"=r"(*x)
:"r"(*y)
:"%al");
asm("movb %1, %%al\n"
"movb %%al, %0\n"
:"=r"(*y)
:"r"(t)
:"%al");
The whole procedure can be simplified into this:
asm("movb (%1), %%al\n"
"movb (%2), %%ah\n"
"movb %%ah, (%1)\n"
"movb %%al, (%2)\n"
: /* no outputs for compiler to know about */
: "r" (x), "r" (y)
: "%ax", "memory");
line
movl %%eax, %0;
is full nonsence! So you try to change 0 constant by %eax register It's impossible. In fortran many years ago It was. After that all programs will behave quite unpredictable. Since to avoid that was introduce the rule that any identificatot can't begin with number. But You try do it. It's well to get error. You maybe mean another
movl %0, %%eax;
to set zerro to eax. So better do another code
xorl %%eax, %%eax;
is much better!

What does a double-percent sign (%%) do in gcc inline assembly?

I came across a code that looks like this:
asm volatile (
# [...]
"movl $1200, %%ecx;"
# [...]
);
I know what movl $1200, %ecx does in x86. but I was confused by why there are two percent signs.
GCC inline assembly uses %0, %1, %2, etc. to refer to input and output operands. That means you need to use two %% for real registers.
Check this howto for great information.
It depends
if there is a colon : after the string, then it is an extended asm, and %% escapes the percent which could have especial meanings as mentioned by Carl. Example:
uint32_t in = 1;
uint32_t out = 0;
asm volatile (
"movl %1, %%eax;"
"inc %%eax;"
"movl %%eax, %0"
: "=m" (out) /* Outputs. '=' means written to. */
: "m" (in) /* Inputs. No '='. */
: "%eax"
);
assert(out == in + 1);
otherwise, it will be a compile time error, because without colon it is a basic asm which does not support variable constraints and does not need or support escaping %1. E.g.:
asm volatile ("movl $1200, %ecx;");
works just fine.
Extended asm is more often used since it is much more powerful.
This helps GCC to distinguish between the operands and registers. operands have a single % as prefix. '%%' is always used with registers.

C inline assembly memory copy

I am trying to write some inline assembly into C. I have two arrays as input, what I need is to copy one element in array1 into array2, and the following is what I have at the moment:
asm (
"movl %0,%%eax;"
"movl %1,%%ebx;"
"movl (%%eax),%%ecx;"
"movl %%ecx,(%ebx);"
"xor %%ecx,%%ecx;"
"movl 4(%%eax),%%ecx;"
//do something on %ecx
"movl %%ecx,4(%ebx);" //write second
:
:"a"(array1),"b"(array2)
);
Why do I get a segmentation fault?
Your inline assembler code is broken. You can't directly use EAX and EBX without adding them to the clobber list. Otherwise the compiler does not know which registers have been modified.
It is very likely that one of the registers that you've modified contained something damn important that later caused the segmentation fault.
This code will copy one element from array1 to array2:
asm (
"movl (%0), %%eax \n\t" /* read first dword from array1 into eax */
"movl %%eax, (%1) \n\t" /* write dword into array2
: /* outputs */
: /* inputs */ "r"(array1),"r"(array2)
: /* clobber */ "eax", "memory"
);
A better version with proper register constraints would drop the hard coded EAX like this:
int dummy;
asm (
"movl (%1), %0 \n\t"
"movl %0, (%2) \n\t"
: /* outputs, temps.. */ "=r" (dummy)
: /* inputs */ "r"(array1),"r"(array2)
: /* clobber */ "memory"
);
Btw - In general I have the feeling that you're not that familiar with assembler yet. Writing inline-assembler is a bit harder to get right due to all the compiler magic. I suggest that you start writing some simple functions in assembler and put them into a separate .S file first.. That's much easier..
Your best option is C code:
target_array[target_idx] = source_array[source_idx];
This avoids segmentation faults as long as the indexes are under control.
what about memcpy ?

Resources