Addition function on Windows x86 via inline asm in C code

Can someone explain what I'm doing wrong here:
int MachineAdder(int a, int b)
{
int OUT = 0; /* Assign a pointer (&OUT) and write initial data (0) */
__asm ("mov %[dst], %[src]" /* Machine instruction to execute, separated by commas.*/
: [dst] "=r" (OUT)
: [src] "r" (a)
);
__asm ("add %[dst], %[src]" /* Machine instruction to execute, separated by commas.*/
: [dst] "=r" (OUT)
: [src] "r" (b)
);
return OUT; /* Return the value a+b */
}
In my main() function, I call:
printf("0+0 = %d\n", MachineAdder(0,0));
printf("0+1 = %d\n", MachineAdder(0,1));
printf("1+0 = %d\n", MachineAdder(1,0));
printf("1+1 = %d\n", MachineAdder(1,1));
printf("2+1 = %d\n", MachineAdder(2,1));
printf("1+2 = %d\n", MachineAdder(1,2));
In my output, it reads "0 2 0 2 2 4" (whereas I'd expect "0 1 1 2 3 3").
Thanks! Googling answers was messy because some machine instructions seem to be back to front, while others talk about registers but I don't know which register is which or how to use them.
EDIT: Working solution found. There were two errors: src and dst were the wrong way around, and I had never heard of the "+r" string, used for inout parameters. Here's the fixed version:
int MachineAdder(int a, int b)
{
int OUT = 0; /* Assign a pointer (&OUT) and write initial data (0) */
__asm ("mov %[src], %[dst]" /* Machine instruction to execute, separated by commas.*/
: [dst] "=r" (OUT)
: [src] "r" (a)
);
__asm ("add %[src], %[dst]" /* Machine instruction to execute, separated by commas.*/
: [dst] "+r" (OUT)
: [src] "r" (b)
);
return OUT; /* Return the value a+b */
}
Thanks all!

This happens because, for output operands, the = modifier does not guarantee that the location holds the variable's existing value on entry to the asm, while the + modifier does.
Extended Asm (Using the GNU Compiler Collection (GCC)) says:
Output constraints must begin with either ‘=’ (a variable overwriting an existing value) or ‘+’ (when reading and writing). When using ‘=’, do not assume the location contains the existing value on entry to the asm, except when the operand is tied to an input
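To make the difference concrete, here is a minimal sketch of the adder written as a single instruction. The name machine_adder is hypothetical, and the block assumes GCC or Clang; on non-x86 targets it falls back to plain C so that it still compiles. Seeding out with a in C and then using "+r" tells the compiler that the asm both reads and writes the operand, so the initial value is guaranteed to be in the register.

```c
/* Hypothetical helper name; adds two ints with one inline-asm instruction. */
static int machine_adder(int a, int b)
{
    int out = a;   /* initial value that the asm must be able to read */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* AT&T syntax: source operand first, destination second.
       "+r" marks out as read-then-written; with "=r" the compiler
       would be free to hand the asm a register holding garbage. */
    __asm__ ("addl %[src], %[dst]"
             : [dst] "+r" (out)
             : [src] "r" (b)
             : "cc");
#else
    out += b;      /* plain C fallback on other targets */
#endif
    return out;
}
```

Calling machine_adder(2, 1) in place of MachineAdder(2, 1) in the printf lines from the question should give the expected sums.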

Related

aarch64 Inline assembly error : operand 2 must be an integer register -- `ldnp x0,[x0]'

I'm trying to write a simple function using inline assembly and use it in a C program.
mem_io_read is a function that reads a memory address while bypassing the cache (even though the address is located in a cacheable memory region). It's for an aarch64 machine.
static inline int mem_io_read(unsigned long paddr)
{
unsigned long val;
register pa;
__asm__ __volatile__("mov %0, %1\n\t" : "=r" (pa) : "r"(paddr)); /* move paddr to a register pa */
__asm__ __volatile__("ldnp %0, [%1]\n\t" : "=r" (val) : "r" (pa)); /* load data from addr in pa */
return val;
}
main()
{
...
uint32_t SCP_WR_ADDR = &scp_wait; // where test1val was located. //x06000000;
uint32_t chk_scp_rd_data = 0;
// Send flag for proceeding SCP test
(*(volatile uint32_t *)(SCP_WR_ADDR)) = 0x87654321; /* send signal to the other processor (scp) */
// Receives flag from SCP
while(chk_scp_rd_data != 0x12345678) /* read back until the value is changed (reverse order) */
{
chk_scp_rd_data = mem_io_read(SCP_WR_ADDR);
}
}
When I compile this using gcc, I get this error
/tmp/ccCpQGc5.s: Assembler messages:
/tmp/ccCpQGc5.s:26: Error: operand 2 must be an integer register -- `ldnp x0,[x0]'
I can't figure out what is wrong here. Please help.
Update: following Peter Cordes's comment, I changed it to the version below, which compiles without errors.
static int inline mem_io_read(unsigned long paddr)
{
int val, val1;
__asm__ __volatile__("ldnp %0, %1, [%2]\n\t" : "=r" (val), "=r" (val1) : "r" (paddr) : "memory");
return val;
}

Set register using a variable inline assembly

I need to set the EDI register from a variable using inline assembly. I wrote the following snippet, but it fails to compile.
uint32_t value = 0;
__asm__ __volatile__("mov %1,%%edi \n\t"
: "=D"
: "ir" (value)
:
);
Errors I get are
cyg_functions.cpp(544): error: expected a "("
: "ir" (value)
^
cyg_functions.cpp(544): internal error: null pointer
: "ir" (value)
Edit
I guess I wasn't clear on the problem specification. Let's say my requirement is as follows.
There are two int variables val and result.
I need to:
1. Set %%edi to the value of the variable val, clobbering whatever is already there
2. Multiply the value in %%edi by 2
3. Copy the value in %%edi back to the variable result
How can this be stated with inline assembly? Though this is not exactly my requirement, an answer to this (specifically the 1st step) would solve my problem. I need the intermediate value to be specifically in the EDI register.
I have read your comments, and the requirements here still make no sense to me. However, making sense is not a requirement. Such being the case:
int main(int argc, char *argv[])
{
int res;
int value = argc;
asm ("shl $1, %[res]" /* Take the value in res (aka EDI) and shift
it left by 1. */
: [res] "=D" (res) /* On exit from the asm, the register EDI
will contain the value for "res". The
existing value of res will be overwritten. */
: "0" (value)); /* Take the contents of "value" and put it
in the same place as parameter #0. */
return res;
}
This may be easier to understand if you read it from the bottom up.

Errors using inline assembly in C

I'm trying my hand at assembly in order to use vector operations, which I've never really used before, and I'm admittedly having a bit of trouble grasping some of the syntax.
The relevant code is below.
uint16_t asdf[4];
asdf[0] = 1;
asdf[1] = 2;
asdf[2] = 3;
asdf[3] = 4;
uint16_t other = 3;
__asm__("movq %0, %%mm0"
:
: "m" (asdf));
__asm__("pcmpeqw %0, %%mm0"
:
: "r" (other));
__asm__("movq %%mm0, %0" : "=m" (asdf));
printf("%u %u %u %u\n", asdf[0], asdf[1], asdf[2], asdf[3]);
In this simple example, I'm trying to do a 16-bit compare of "3" to each element in the array. I would hope that the output would be "0 0 65535 0". But it won't even assemble.
The first assembly instruction gives me the following error:
error: memory input 0 is not directly addressable
The second instruction gives me a different error:
Error: suffix or operands invalid for `pcmpeqw'
Any help would be appreciated.
You can't use registers directly in gcc asm statements and expect them to match up with anything in other asm statements -- the optimizer moves things around. Instead, you need to declare variables of the appropriate type and use constraints to force those variables into the right kind of register for the instruction(s) you are using.
The relevant constraints for MMX/SSE are x for xmm registers and y for mmx registers. For your example, you can do:
#include <stdint.h>
#include <stdio.h>
typedef union xmmreg {
uint8_t b[16];
uint16_t w[8];
uint32_t d[4];
uint64_t q[2];
} xmmreg;
int main() {
xmmreg v1, v2;
v1.w[0] = 1;
v1.w[1] = 2;
v1.w[2] = 3;
v1.w[3] = 4;
v2.w[0] = v2.w[1] = v2.w[2] = v2.w[3] = 3;
asm("pcmpeqw %1,%0" : "+x"(v1) : "x"(v2));
printf("%u %u %u %u\n", v1.w[0], v1.w[1], v1.w[2], v1.w[3]);
}
Note that you need to explicitly replicate the 3 across all the relevant elements of the second vector.
From the Intel reference manual:
PCMPEQW mm, mm/m64 Compare packed words in mm/m64 and mm for equality.
PCMPEQW xmm1, xmm2/m128 Compare packed words in xmm2/m128 and xmm1 for equality.
Your pcmpeqw uses an "r" (general-purpose register) operand, which is wrong. Only "mm" registers and "m64" memory operands are allowed (or "xmm" and "m128" for the SSE2 form).
valter
The code above failed while the compiler was expanding the asm(); it never even got as far as assembling anything. In this case, you are trying to use the zeroth argument (%0), but you didn't give any.
Check out the GCC Inline assembler HOWTO, or read the relevant chapter of your local GCC documentation.
He's right, the optimizer is changing register contents. Switching to intrinsics and using volatile to keep things a little more in place might help.
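As a rough illustration of the intrinsics route mentioned above, the sketch below does the same 16-bit equality compare with SSE2 intrinsics instead of inline asm. The function name compare_words_eq is hypothetical, and the block assumes a compiler that provides <emmintrin.h> when __SSE2__ is defined; on other targets it falls back to a scalar loop. With intrinsics the compiler does the register allocation, so there are no constraints to get wrong.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: out[i] = 0xFFFF where in[i] == needle, else 0. */
static void compare_words_eq(const uint16_t in[8], uint16_t needle,
                             uint16_t out[8])
{
#if defined(__SSE2__)
    __m128i v  = _mm_loadu_si128((const __m128i *)in);
    __m128i n  = _mm_set1_epi16((short)needle); /* replicate needle into all 8 lanes */
    __m128i eq = _mm_cmpeq_epi16(v, n);         /* normally compiles to pcmpeqw */
    _mm_storeu_si128((__m128i *)out, eq);
#else
    /* Scalar fallback with the same result on non-SSE2 targets. */
    for (int i = 0; i < 8; i++)
        out[i] = (in[i] == needle) ? 0xFFFF : 0;
#endif
}
```

For the input {1, 2, 3, 4, ...} and needle 3, only element 2 of out becomes 65535, which matches the "0 0 65535 0" the question hoped for in the first four lanes.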

Why is CompareAndSwap a more powerful instruction than TestAndSet?

Please consider the following CompareAndSwap code and explain why this atomic instruction is more powerful than an atomic TestAndSet as a mutual exclusion primitive.
char CompareAndSwap(int *ptr, int old, int new) {
unsigned char ret;
// Note that sete sets a ’byte’ not the word
__asm__ __volatile__ (
" lock\n"
" cmpxchgl %2,%1\n"
" sete %0\n"
: "=q" (ret), "=m" (*ptr)
: "r" (new), "m" (*ptr), "a" (old)
: "memory");
return ret;
}
test-and-set modifies the contents of a memory location and returns its old value as a single atomic operation.
compare-and-swap atomically compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value.
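One way to see the practical difference is to write both primitives with GCC's __atomic builtins rather than hand-written asm. This is only a sketch: the function names are hypothetical, and it assumes GCC or Clang. Test-and-set can only move a flag from clear to set, which is enough for a simple spinlock. Compare-and-swap can install an arbitrary new value on the condition that the old one is unchanged, so it can also build lock-free updates such as the fetch-and-add loop below.

```c
#include <stdbool.h>

/* Test-and-set: unconditionally sets the flag and reports whether it was
   previously clear, i.e. whether the caller just acquired the lock. */
static bool tas_try_lock(char *flag)
{
    return !__atomic_test_and_set(flag, __ATOMIC_ACQUIRE);
}

static void tas_unlock(char *flag)
{
    __atomic_clear(flag, __ATOMIC_RELEASE);
}

/* Compare-and-swap: stores new_val only if *ptr still equals expected.
   Returns true when the store happened. */
static bool cas(int *ptr, int expected, int new_val)
{
    return __atomic_compare_exchange_n(ptr, &expected, new_val, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Lock-free add built from CAS: retry until no other thread changed *ptr
   between the load and the swap. Returns the value seen before the add. */
static int cas_fetch_add(int *ptr, int delta)
{
    int old;
    do {
        old = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    } while (!cas(ptr, old, old + delta));
    return old;
}
```

With test-and-set alone, an update like cas_fetch_add would have to be wrapped in a lock; with compare-and-swap the retry loop is sufficient, and a lock can still be built from it by swapping 0 for 1.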

Inline assembly for calling a system call and retrieving its result

I want to call a system call (prctl) with inline assembly and retrieve the result of the system call. But I cannot make it work.
This is the code I am using:
int install_filter(void)
{
long int res =-1;
void *prg_ptr = NULL;
struct sock_filter filter[] = {
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_TRAP),
/* If a trap is not generated, the application is killed */
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
};
struct sock_fprog prog = {
.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
.filter = filter,
};
prg_ptr = &prog;
no_permis();
__asm__ (
"mov %1, %%rdx\n"
"mov $0x2, %%rsi \n"
"mov $0x16, %%rdi \n"
"mov $0x9d, %%rax\n"
"syscall\n"
"mov %%rax, %0\n"
: "=r"(res)
: "r"(prg_ptr)
: "%rdx", "%rsi", "%rdi", "%rax"
);
if ( res < 0 ){
perror("prctl");
exit(EXIT_FAILURE);
}
return 0;
}
The address of the filter should be the input (prg_ptr) and I want to save the result in res.
Can you help me?
For inline assembly, you don't use movs like this unless you have to, and even then you have to do ugly shuffling. That's because you have no idea what registers the arguments arrive in. Instead, you should use:
__asm__ __volatile__ ("syscall" : "=a"(res) : "d"(prg_ptr), "S"(0x2), "D"(0x16), "a"(0x9d) : "rcx", "r11", "memory");
I also added __volatile__, which you should use for any asm with side-effects other than its output, and a memory clobber (memory barrier), which you should use for any asm with side-effects on memory or for which reordering it with respect to memory accesses would be invalid. The rcx and r11 clobbers are there because the syscall instruction itself overwrites those two registers. It's good practice to always use volatile and the memory clobber for syscalls unless you know you don't need them.
If you're still having problems, use strace to observe the syscall attempt and see what's going wrong.
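The constraint-based approach can be wrapped in a small helper, sketched here under the assumption of Linux on x86-64 with GCC or Clang. The names raw_syscall3 and check_syscall are hypothetical. One thing to keep in mind is that a raw syscall does not set errno: the kernel returns a negative error code in rax, so calling perror directly on the result, as the question does, would not report the real cause. The second helper converts the raw result to the usual libc convention.

```c
#include <errno.h>

/* Hypothetical wrapper: issues a raw 3-argument syscall.
   Assumes the Linux x86-64 convention (nr in rax, args in rdi, rsi, rdx). */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    __asm__ __volatile__ ("syscall"
                          : "=a" (ret)
                          : "a" (nr), "D" (a1), "S" (a2), "d" (a3)
                          : "rcx", "r11", "memory"); /* syscall overwrites rcx and r11 */
    return ret;   /* raw kernel result: a negative errno value on failure */
#else
    (void)nr; (void)a1; (void)a2; (void)a3;
    return -ENOSYS;   /* fallback so the sketch compiles on other targets */
#endif
}

/* Converts the raw kernel convention into the libc one:
   sets errno and returns -1 on failure, passes other values through. */
static long check_syscall(long ret)
{
    if (ret < 0 && ret >= -4095) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```

Under these assumptions, the call from the question would look like check_syscall(raw_syscall3(0x9d, 0x16, 0x2, (long)prg_ptr)), after which perror reports the actual error.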
