Inline assembler for calling a system call and retrieving its result - C

I want to call a system call (prctl) using inline assembly and retrieve the result of the system call, but I cannot make it work.
This is the code I am using:
int install_filter(void)
{
    long int res = -1;
    void *prg_ptr = NULL;
    struct sock_filter filter[] = {
        BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_TRAP),
        /* If a trap is not generated, the application is killed */
        BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
        .filter = filter,
    };
    prg_ptr = &prog;
    no_permis();
    __asm__ (
        "mov %1, %%rdx\n"
        "mov $0x2, %%rsi \n"
        "mov $0x16, %%rdi \n"
        "mov $0x9d, %%rax\n"
        "syscall\n"
        "mov %%rax, %0\n"
        : "=r"(res)
        : "r"(prg_ptr)
        : "%rdx", "%rsi", "%rdi", "%rax"
    );
    if (res < 0) {
        perror("prctl");
        exit(EXIT_FAILURE);
    }
    return 0;
}
The address of the filter should be the input (prg_ptr) and I want to save the result in res.
Can you help me?

For inline assembly, you don't use movs like this unless you have to, and even then you have to do ugly shuffling. That's because you have no idea which registers the arguments arrive in. Instead, you should use:
__asm__ __volatile__ ("syscall" : "=a"(res) : "d"(prg_ptr), "S"(0x2), "D"(0x16), "a"(0x9d) : "memory");
I also added __volatile__, which you should use for any asm with side-effects other than its output, and a memory clobber (memory barrier), which you should use for any asm with side-effects on memory or for which reordering it with respect to memory accesses would be invalid. It's good practice to always use both of these for syscalls unless you know you don't need them.
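For reference, here is a minimal sketch of how that constraint-based form could slot into the original function on x86-64 Linux (0x9d is __NR_prctl, 0x16 is PR_SET_SECCOMP, 0x2 is SECCOMP_MODE_FILTER). The rcx/r11 clobbers are an addition here, on top of the one-liner above, since the syscall instruction itself overwrites those two registers:
long res;
__asm__ __volatile__ (
    "syscall"
    : "=a"(res)                  /* the return value comes back in rax */
    : "a"(0x9d),                 /* rax = __NR_prctl */
      "D"(0x16),                 /* rdi = PR_SET_SECCOMP */
      "S"(0x2),                  /* rsi = SECCOMP_MODE_FILTER */
      "d"(&prog)                 /* rdx = pointer to the sock_fprog */
    : "rcx", "r11", "memory"     /* syscall clobbers rcx and r11 */
);
Also note that a raw syscall returns -errno on failure instead of setting errno, so the perror("prctl") in the original error path won't report the actual error.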
If you're still having problems, use strace to observe the syscall attempt and see what's going wrong.

Related

Moving data into __uint24 with assembly

I originally had the following C code:
volatile register uint16_t counter asm("r12");

__uint24 getCounter() {
    __uint24 res = counter;
    res = (res << 8) | TCNT0;
    return res;
}
This function runs in some hot places and is inlined, and I'm trying to cram a lot of stuff into an ATtiny13, so it came time to optimize it.
That function compiles to:
getCounter:
    movw r24,r12
    ldi r26,0
    clr r22
    mov r23,r24
    mov r24,r25
    in r25,0x32
    or r22,r25
    ret
I came up with this assembly:
inline __uint24 getCounter() {
    //__uint24 res = counter;
    //res = (res << 8) | TCNT0;
    uint32_t result;
    asm(
        "in %A[result],0x32"         "\n\t"
        "movw %C[result],%[counter]" "\n\t"
        "mov %B[result],%C[result]"  "\n\t"
        "mov %C[result],%D[result]"  "\n\t"
        : [result] "=r" (result)
        : [counter] "r" (counter)
        :
    );
    return (__uint24) result;
}
The reason for uint32_t is to "allocate" the fourth consecutive register and to let the compiler understand that it is clobbered (since I cannot write something like "%D[result]" in the clobber list).
Is my assembly correct? From my testing it seems like it is.
Is there a way to allow the compiler to optimize getCounter() better so there's not need for confusing assembly?
Is there a better way to do this in assembly?
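One hedged, untested idea for the last question (not from the thread): if avr-gcc's %A/%B/%C byte modifiers apply to a 24-bit operand the same way they do to the 32-bit one above, the __uint24 result can be targeted directly and the dummy fourth register isn't needed. This assumes the same counter register variable and keeps the original read order (TCNT0 first, then the counter pair):
#include <avr/io.h>

inline __uint24 getCounter() {
    __uint24 res;
    asm(
        "in  %A0, %1"  "\n\t"   /* byte 0 <- TCNT0 (I/O address 0x32)   */
        "mov %B0, %A2" "\n\t"   /* byte 1 <- low byte of counter (r12)  */
        "mov %C0, %B2"          /* byte 2 <- high byte of counter (r13) */
        : "=r" (res)
        : "I" (_SFR_IO_ADDR(TCNT0)), "r" (counter)
    );
    return res;
}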

aarch64 inline assembly error: operand 2 must be an integer register -- `ldnp x0,[x0]'

I'm trying to write a simple function using inline assembly and use it in a C program.
mem_io_read is a function that reads a memory address while bypassing the cache (even though the address is located in a cacheable memory region). It's for an aarch64 machine.
static inline int mem_io_read(unsigned long paddr)
{
    unsigned long val;
    register pa;
    __asm__ __volatile__("mov %0, %1\n\t" : "=r" (pa) : "r"(paddr));    // <-- move paddr to a register pa
    __asm__ __volatile__("ldnp %0, [%1]\n\t" : "=r" (val) : "r" (pa));  // <-- load data from addr in pa
    return val;
}
main()
{
    ...
    uint32_t SCP_WR_ADDR = &scp_wait; // where test1val was located. //x06000000;
    uint32_t chk_scp_rd_data = 0;
    // Send flag for proceeding SCP test
    (*(volatile uint32_t *)(SCP_WR_ADDR)) = 0x87654321;   // <-- send signal to the other processor (scp)
    // Receives flag from SCP
    while (chk_scp_rd_data != 0x12345678)                 // <-- read back until the value is changed (reverse order)
    {
        chk_scp_rd_data = mem_io_read(SCP_WR_ADDR);
    }
}
When I compile this using gcc, I get this error:
/tmp/ccCpQGc5.s: Assembler messages:
/tmp/ccCpQGc5.s:26: Error: operand 2 must be an integer register -- `ldnp x0,[x0]'
I can't figure out what is wrong here. Please help.
ADD: following Peter Cordes's comment, I changed it to the version below, and it compiles OK.
static inline int mem_io_read(unsigned long paddr)
{
    int val, val1;
    __asm__ __volatile__("ldnp %0, %1, [%2]\n\t" : "=r" (val), "=r" (val1) : "r" (paddr) : "memory");
    return val;
}
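As an aside (not from the question or the comments): ldnp always loads a pair of registers, which is why the single-output version was rejected by the assembler. A hedged sketch of a 64-bit variant along the same lines, using a hypothetical name mem_io_read64, might look like this:
static inline unsigned long mem_io_read64(unsigned long paddr)
{
    unsigned long lo, hi;   /* ldnp writes two registers, even if only one is wanted */
    __asm__ __volatile__("ldnp %0, %1, [%2]"
                         : "=r" (lo), "=r" (hi)
                         : "r" (paddr)
                         : "memory");
    return lo;
}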

Is there a race condition in the linux ARM spinlock?

Here is the Linux implementation of a spinlock from arch/arm/include/asm/spinlock.h:
static inline void arch_spin_lock(arch_spinlock_t *lock)
{
    unsigned long tmp;
    u32 newval;
    arch_spinlock_t lockval;

    prefetchw(&lock->slock);
    __asm__ __volatile__(
        "1: ldrex %0, [%3]\n"
        "   add %1, %0, %4\n"
        "   strex %2, %1, [%3]\n"
        "   teq %2, #0\n"
        "   bne 1b"
        : "=&r" (lockval), "=&r" (newval), "=&r" (tmp)
        : "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
        : "cc");

    while (lockval.tickets.next != lockval.tickets.owner) {
        wfe();
        lockval.tickets.owner = READ_ONCE(lock->tickets.owner);
    }

    smp_mb();
}
...
static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
    smp_mb();
    lock->tickets.owner++;
    dsb_sev();
}
My concern is that the following two lines in arch_spin_lock:
    while (lockval.tickets.next != lockval.tickets.owner) {
        wfe();
are not atomic. So what if arch_spin_unlock is called between these two lines? Then the WFE instruction in arch_spin_lock would run after the SEV has already been issued, and the SEV won't be issued again. So at the very worst, arch_spin_lock would wait forever, or until some unrelated event occurs.
Is this correct, or am I misunderstanding something? If it is a problem, even if only in theory, is there a way to avoid it?
I think you are missing this bit of WFE documentation:
If the Event Register is set, WFE clears it and returns immediately.
In the "race" you describe WFE will get executed, but will return immediately, then while loop will exit.

Is it correct to perform a function call like this?

I have an array of 32-bit values (nativeParameters with length nativeParameterCount) and a pointer to the function that's supposed to be called (a void* to a cdecl function, here method->nativeFunction). Now I'm trying to do this:
// Push parameters for call
if (nativeParameterCount != 0) {
    uint32_t count = 0;
pushParameter:
    uint32_t value = nativeParameters[nativeParameterCount - count - 1];
    asm("push %0" : : "r"(value));
    if (++count < nativeParameterCount) goto pushParameter;
}

// Call method
asm("call *%0" : : "r"(method->nativeFunction));

// Return value
uint32_t eax;
uint32_t edx;
asm("push %eax");
asm("push %edx");
asm("pop %0" : "=r"(edx));
asm("pop %0" : "=r"(eax));
uint64_t returnValue = eax;

// If the type size of the method's return type is > 4 bytes, OR in EDX
Type returnType = method->returnType.type;
if (TYPE_SIZES[returnType] > 4) {
    returnValue |= (((uint64_t) edx) << 32);
}

// Clean stack
asm("add %%esp, %0" : : "r"(parameterByteSize));
Is this approach suitable for performing a native call (assuming that all target functions accept only 32-bit values as parameters)? Can I be sure that it doesn't destroy the stack, mess with registers, or otherwise influence the normal flow? Also, are there other ways of doing this?
Instead of doing this manually yourself, you might want to use the dyncall library, which does all this handling for you.
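For illustration only (the names callNative, args, and returnsWide are hypothetical, not part of dyncall), a minimal sketch of the same call through dyncall's C API might look like this:
#include <stdint.h>
#include <dyncall.h>

uint64_t callNative(void *fn, const uint32_t *args, uint32_t count, int returnsWide)
{
    DCCallVM *vm = dcNewCallVM(4096);      /* scratch buffer for marshalling the call */
    dcMode(vm, DC_CALL_C_DEFAULT);         /* default C (cdecl on x86) calling convention */
    dcReset(vm);
    for (uint32_t i = 0; i < count; i++)
        dcArgInt(vm, (DCint)args[i]);      /* arguments are bound left to right */
    uint64_t result = returnsWide
        ? (uint64_t)dcCallLongLong(vm, (DCpointer)fn)
        : (uint64_t)(uint32_t)dcCallInt(vm, (DCpointer)fn);
    dcFree(vm);
    return result;
}
Link against the dyncall library (typically -ldyncall_s); dyncall then takes care of argument passing, stack cleanup, and register clobbers for the target calling convention.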

Implementing a Mutex Lock in C

I'm trying to make a really simple spinlock mutex in C and for some reason I'm getting cases where two threads are getting the lock at the same time, which shouldn't be possible. It's running on a multiprocessor system which may be why there's a problem. Any ideas why it's not working?
void mutexLock(mutex_t *mutexlock, pid_t owner)
{
    int failure = 1;
    while (mutexlock->mx_state == 0 || failure || mutexlock->mx_owner != owner)
    {
        failure = 1;
        if (mutexlock->mx_state == 0)
        {
            asm(
                "movl $0x01,%%eax\n\t"   // move 1 to eax
                "xchg %%eax,%0\n\t"      // try to set the lock bit
                "mov %%eax,%1\n\t"       // export our result to a test var
                : "=r" (mutexlock->mx_state), "=r" (failure)
                : "r" (mutexlock->mx_state)
                : "%eax"
            );
        }
        if (failure == 0)
        {
            mutexlock->mx_owner = owner; // test to see if we got the lock bit
        }
    }
}
Well for a start you're testing an uninitialised variable (failure) the first time the while() condition is executed.
Your actual problem is that you're telling gcc to use a register for mx_state - which clearly won't work for a spinlock. Try:
asm volatile (
    "movl $0x01,%%eax\n\t"   // move 1 to eax
    "xchg %%eax,%0\n\t"      // try to set the lock bit
    "mov %%eax,%1\n\t"       // export our result to a test var
    : "=m" (mutexlock->mx_state), "=r" (failure)
    : "m" (mutexlock->mx_state)
    : "%eax"
);
Note that asm volatile is also important here, to ensure that it doesn't get hoisted out of your while loop.
The problem is that you load mx_state into a register (the 'r' constraint) and then do the exchange with registers, only writing the result back into mx_state at the end of the asm code. What you want is something more like:
asm(
    "movl $0x01,%%eax\n\t"   // move 1 to eax
    "xchg %%eax,%1\n\t"      // try to set the lock bit
    "mov %%eax,%0\n\t"       // export our result to a test var
    : "=r" (failure)
    : "m" (mutexlock->mx_state)
    : "%eax"
);
Even this is somewhat dangerous, as in theory the compiler could load mx_state into a register, spill it into a local temporary stack slot, and do the xchg there. It is also somewhat inefficient, as it has extra movs hardcoded that may not be needed but can't be eliminated by the optimizer. You're better off using a simpler asm that expands to a single instruction, such as:
failure = 1;
asm("xchg %0,0(%1)" : "=r" (failure) : "r" (&mutex->mx_state), "0" (failure));
Note how we force the use of mx_state in place by using its address rather than its value.
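As a side note (not part of either answer above): with a reasonably recent GCC or Clang you can skip the hand-written asm entirely and use the __atomic builtins, which handle both the atomic exchange and the compiler barrier. A minimal sketch, assuming mutex_t has the mx_state and mx_owner fields used above:
#include <sys/types.h>   /* pid_t */

void mutexLock(mutex_t *mutexlock, pid_t owner)
{
    /* __atomic_exchange_n atomically stores 1 and returns the previous value;
       the lock is acquired once that previous value was 0. */
    while (__atomic_exchange_n(&mutexlock->mx_state, 1, __ATOMIC_ACQUIRE) != 0)
        ;   /* spin */
    mutexlock->mx_owner = owner;
}
The matching unlock would be __atomic_store_n(&mutexlock->mx_state, 0, __ATOMIC_RELEASE).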
