Why is CompareAndSwap a more powerful instruction than TestAndSet?

Consider the following code for CompareAndSwap. Why is this atomic instruction more powerful than an atomic TestAndSet as a mutual exclusion primitive?
char CompareAndSwap(int *ptr, int old, int new) {
unsigned char ret;
// Note that sete sets a 'byte', not the word
__asm__ __volatile__ (
" lock\n"
" cmpxchgl %2,%1\n"
" sete %0\n"
: "=q" (ret), "=m" (*ptr)
: "r" (new), "m" (*ptr), "a" (old)
: "memory");
return ret;
}

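Test-and-set modifies the contents of a memory location and returns its old value as a single atomic operation.
Compare-and-swap atomically compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value.
Because the write is conditional on a caller-supplied expected value and can install any new value, compare-and-swap can emulate test-and-set on a lock word (compare with 0, swap in 1) and can also perform conditional updates that test-and-set cannot express.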
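One way to see the difference is a small sketch using C11 `<stdatomic.h>` instead of the inline assembly from the question. It shows a test-and-set spinlock, test-and-set emulated with compare-and-swap, and a conditional update built on compare-and-swap. The function names (`tas_via_cas`, `fetch_add_via_cas`, `example_usage`) are made up for illustration and do not come from the question.

```c
#include <assert.h>
#include <stdatomic.h>

/* Test-and-set lock: the flag only ever says "held" or "free". */
static atomic_flag tas_lock = ATOMIC_FLAG_INIT;

static void tas_acquire(void)
{
    /* Returns the old state; keep spinning while it was already set. */
    while (atomic_flag_test_and_set(&tas_lock)) { }
}

static void tas_release(void)
{
    atomic_flag_clear(&tas_lock);
}

/* Test-and-set emulated with CAS on a 0/1 lock word: try 0 -> 1 and
 * return the value that was there before. */
static int tas_via_cas(atomic_int *word)
{
    int expected = 0;
    /* On failure, expected is overwritten with the current value. */
    atomic_compare_exchange_strong(word, &expected, 1);
    return expected;
}

/* A conditional update of an arbitrary value: an add done without a lock. */
static int fetch_add_via_cas(atomic_int *word, int delta)
{
    int old = atomic_load(word);
    /* Retry until no other thread changed *word between load and swap. */
    while (!atomic_compare_exchange_weak(word, &old, old + delta)) { }
    return old;
}

static void example_usage(void)
{
    atomic_int lock_word = 0;
    atomic_int counter = 5;

    assert(tas_via_cas(&lock_word) == 0);   /* was free, now held */
    assert(tas_via_cas(&lock_word) == 1);   /* already held */
    assert(fetch_add_via_cas(&counter, 3) == 5);
    assert(atomic_load(&counter) == 8);

    tas_acquire();
    tas_release();
}
```

The emulation only matches test-and-set when the word holds 0 or 1, which is the usual convention for a lock word.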

Related

aarch64 Inline assembly error : operand 2 must be an integer register -- `ldnp x0,[x0]'

I'm trying to write a simple function using inline assembly and use it in a C program.
mem_io_read is a function that reads a memory address, bypassing the cache (even though the address is located in a cacheable memory region). It's for an aarch64 machine.
static inline int mem_io_read(unsigned long paddr)
{
unsigned long val;
register pa;
__asm__ __volatile__("mov %0, %1\n\t" : "=r" (pa) : "r"(paddr)); /* move paddr to a register pa */
__asm__ __volatile__("ldnp %0, [%1]\n\t" : "=r" (val) : "r" (pa)); /* load data from addr in pa */
return val;
}
main()
{
...
uint32_t SCP_WR_ADDR = &scp_wait; // where test1val was located. //x06000000;
uint32_t chk_scp_rd_data = 0;
// Send flag for proceeding SCP test
(*(volatile uint32_t *)(SCP_WR_ADDR)) = 0x87654321; /* send signal to the other processor (scp) */
// Receives flag from SCP
while(chk_scp_rd_data != 0x12345678) /* read back until the value is changed (reverse order) */
{
chk_scp_rd_data = mem_io_read(SCP_WR_ADDR);
}
}
When I compile this using gcc, I get this error
/tmp/ccCpQGc5.s: Assembler messages:
/tmp/ccCpQGc5.s:26: Error: operand 2 must be an integer register -- `ldnp x0,[x0]'
I can't figure out what is wrong here. Please help.
ADD: following Peter Cordes's comment, I changed it to the version below. It compiles OK.
static int inline mem_io_read(unsigned long paddr)
{
int val, val1;
__asm__ __volatile__("ldnp %0, %1, [%2]\n\t" : "=r" (val), "=r" (val1) : "r" (paddr) : "memory");
return val;
}

Addition function in Windows X86 via inline asm lines in C code

Can someone explain what I'm doing wrong here:
int MachineAdder(int a, int b)
{
int OUT = 0; /* Assign a pointer (&OUT) and write initial data (0) */
__asm ("mov %[dst], %[src]" /* Machine instruction to execute, separated by commas.*/
: [dst] "=r" (OUT)
: [src] "r" (a)
);
__asm ("add %[dst], %[src]" /* Machine instruction to execute, separated by commas.*/
: [dst] "=r" (OUT)
: [src] "r" (b)
);
return OUT; /* Return the value a+b */
}
In my main() function, I call:
printf("0+0 = %d\n", MachineAdder(0,0));
printf("0+1 = %d\n", MachineAdder(0,1));
printf("1+0 = %d\n", MachineAdder(1,0));
printf("1+1 = %d\n", MachineAdder(1,1));
printf("2+1 = %d\n", MachineAdder(2,1));
printf("1+2 = %d\n", MachineAdder(1,2));
In my output, it reads "0 2 0 2 2 4" (whereas I'd expect "0 1 1 2 3 3").
Thanks! Googling answers was messy because some machine instructions seem to be back to front, while others talk about registers but I don't know which register is which or how to use them.
EDIT: Working solution found. There were two errors: src and dst were the wrong way around, and I had never heard of the "+r" string, used for inout parameters. Here's the fixed version:
int MachineAdder(int a, int b)
{
int OUT = 0; /* Assign a pointer (&OUT) and write initial data (0) */
__asm ("mov %[src], %[dst]" /* Machine instruction to execute, separated by commas.*/
: [dst] "=r" (OUT)
: [src] "r" (a)
);
__asm ("add %[src], %[dst]" /* Machine instruction to execute, separated by commas.*/
: [dst] "+r" (OUT)
: [src] "r" (b)
);
return OUT; /* Return the value a+b */
}
Thanks all!
This is because, for output operands, the = modifier doesn't guarantee that the location holds the existing value on entry to the asm, whereas the + modifier does.
Extended Asm (Using the GNU Compiler Collection (GCC)) says:
Output constraints must begin with either ‘=’ (a variable overwriting an existing value) or ‘+’ (when reading and writing). When using ‘=’, do not assume the location contains the existing value on entry to the asm, except when the operand is tied to an input
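As a minimal sketch of the same idea, the two asm statements can be folded into one by letting the compiler initialize the operand and marking it read-write with "+r". This assumes GCC or Clang with the default AT&T syntax on x86; `machine_add` is a hypothetical name, and the `#else` branch is only a plain C fallback so the sketch builds on other targets.

```c
#include <assert.h>

/* Hypothetical helper: a + b using a single asm statement. */
static int machine_add(int a, int b)
{
    int out = a;    /* the compiler loads a into the operand's register */
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl %[src], %[dst]"   /* AT&T order: source first, destination last */
            : [dst] "+r" (out)      /* "+r": read on entry and written by the asm */
            : [src] "r" (b)
            : "cc");
#else
    out += b;       /* fallback for non-x86 targets (assumption of this sketch) */
#endif
    return out;
}

static void example_usage(void)
{
    assert(machine_add(0, 1) == 1);
    assert(machine_add(2, 1) == 3);
}
```

With "=r" in place of "+r", the compiler would be free to discard the initial value of out, which is the failure described in the question.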

Is there a race condition in the linux ARM spinlock?

Here is the Linux implementation of a spinlock from arch/arm/include/asm/spinlock.h:
static inline void arch_spin_lock(arch_spinlock_t *lock)
{
unsigned long tmp;
u32 newval;
arch_spinlock_t lockval;
prefetchw(&lock->slock);
__asm__ __volatile__(
"1: ldrex %0, [%3]\n"
" add %1, %0, %4\n"
" strex %2, %1, [%3]\n"
" teq %2, #0\n"
" bne 1b"
: "=&r" (lockval), "=&r" (newval), "=&r" (tmp)
: "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
: "cc");
while (lockval.tickets.next != lockval.tickets.owner) {
wfe();
lockval.tickets.owner = READ_ONCE(lock->tickets.owner);
}
smp_mb();
}
...
static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
smp_mb();
lock->tickets.owner++;
dsb_sev();
}
My concern is that the following two lines in arch_spin_lock:
while (lockval.tickets.next != lockval.tickets.owner) {
wfe();
are not atomic. So what if arch_spin_unlock was called in between these two lines? This means in the function arch_spin_lock the WFE instruction would be run but the SEV has already been run and won't be run again. So at the very worst arch_spin_lock would wait forever, or until some unrelated event occurs.
Is this correct, or am I misunderstanding something? If it is a problem even only in theory, is there a way to avoid the problem?
I think you are missing this bit of WFE documentation:
If the Event Register is set, WFE clears it and returns immediately.
In the "race" you describe, WFE will be executed but will return immediately; the loop then re-reads owner and exits.
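The latching behaviour can be illustrated with a toy model in plain C. This is not hardware access and not kernel code: `core_model`, `model_sev`, `model_wfe` and `replay_race` are hypothetical names, and the model only captures the one documented rule quoted above (a set Event Register makes WFE return immediately and clears it).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one core's Event Register. */
struct core_model {
    bool event_register;
};

/* SEV as observed by this core: the event is latched, not lost. */
static void model_sev(struct core_model *c)
{
    c->event_register = true;
}

/* WFE: returns true if it returns immediately,
 * false if the core would suspend and wait for an event. */
static bool model_wfe(struct core_model *c)
{
    if (c->event_register) {
        c->event_register = false;
        return true;
    }
    return false;
}

/* Replays the interleaving from the question: the unlock runs between
 * the loop test and the WFE. Returns true if the waiter does not sleep
 * and then sees that it owns the lock. */
static bool replay_race(void)
{
    struct core_model waiter = { false };
    unsigned my_ticket = 1, owner = 0;

    bool must_wait = (my_ticket != owner);  /* loop condition is true */
    owner++;                                /* unlock happens in the gap */
    model_sev(&waiter);
    bool immediate = model_wfe(&waiter);    /* event was latched */

    return must_wait && immediate && my_ticket == owner;
}
```

In this model the waiter passes through WFE without suspending, because the earlier SEV is still recorded when WFE runs.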

Inline assembly: clarification of constraint modifiers

Two questions:
(1) If I understand ARM inline assembly correctly, a constraint of "r" says that the instruction operand can only be a core register and that by default is a read-only operand. However, I've noticed that if the same instruction has an output operand with the constraint "=r", the compiler may re-use the same register. This seems to violate the "read-only" attribute. So my question is: Does "read-only" refer to the register, or to the C variable that it is connected to?
(2) Is it correct to say that presence of "&" in the constraint of "=&r" simply requires that the register chosen for the output operand must not be the same as one of the input operand registers? My question relates to the code below used to compute the integer power function: i.e., are the "&" constraint modifiers necessary/appropriate?
asm (
" MOV %[power],1 \n\t"
"loop%=: \n\t"
" CBZ %[exp],done%= \n\t"
" LSRS %[exp],%[exp],1 \n\t"
" IT CS \n\t"
" MULCS %[power],%[power],%[base] \n\t"
" MUL %[base],%[base],%[base] \n\t"
" B loop%= \n\t"
"done%=: "
: [power] "+&r" (power),
[base] "+&r" (base),
[exp] "+&r" (exp)
:
: "cc"
) ;
Thanks!
Dan
Read-only refers to the use of the operand in assembly code. The assembly code can only read from the operand, and it must do so before any normal output operand (not an early clobber or a read/write operand) is written. This is because, as you've seen, the same register can be allocated to both an input and output operand. The assumption is that inputs are fully consumed before any output is written, which is normally the case for an assembly instruction.
I don't think combining the early-clobber modifier & with the read/write modifier + has any effect, since a register allocated to a read/write operand can't be used for anything else.
Here's how I'd write your code:
unsigned power = 1;
asm (
" CBZ %[exp],done%= \n\t"
"loop%=: \n\t"
" LSRS %[exp],%[exp],1 \n\t"
" IT CS \n\t"
" MULCS %[power],%[power],%[base] \n\t"
" MUL %[base],%[base],%[base] \n\t"
" BNE loop%= \n\t"
"done%=: "
: [power] "+r" (power),
[base] "+r" (base),
[exp] "+r" (exp)
:
: "cc"
) ;
Note the transformation of putting the loop test at the end of the loop, saving one instruction. Without it the code doesn't have any obvious improvement over what the compiler can generate. I also let the compiler do the initialization of the register used for the power operand. There's a small chance it will be able to allocate a register that already has the value 1 in it.
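For contrast, here is a sketch of a case where the early-clobber modifier does matter: a write-only output that is written before all inputs have been read. It targets x86 with GCC or Clang in AT&T syntax rather than ARM, purely as an assumption of this example; `add_two_step` is a hypothetical name, and the `#else` branch is a plain C fallback for other targets.

```c
#include <assert.h>

/* Hypothetical helper: returns a + b using a two-instruction template. */
static int add_two_step(int a, int b)
{
    int sum;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("movl %[a], %[sum]\n\t"   /* writes the output first ...            */
            "addl %[b], %[sum]"       /* ... while input b is still needed      */
            : [sum] "=&r" (sum)       /* & keeps sum out of the input registers */
            : [a] "r" (a), [b] "r" (b)
            : "cc");
#else
    sum = a + b;    /* fallback for non-x86 targets (assumption of this sketch) */
#endif
    return sum;
}

static void example_usage(void)
{
    assert(add_two_step(2, 3) == 5);
    assert(add_two_step(-1, 1) == 0);
}
```

With plain "=r", the compiler may place sum in the same register as b, in which case the first instruction would overwrite b before the second one reads it.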
Thanks to all of you for the clarification. Just to be sure that I have it right, would it be correct to say that the choice between "=r" and "+r" for an output operand comes down to how the corresponding register is first used in the assembly template? I.e.,
"=r": The first use of the register is as a write-only output of an instruction.
The register may be re-used later by another instruction as an input or output. Adding an early clobber constraint (e.g., "=&r") prevents the compiler from assigning a register that was previously used as an input operand.
"+r": The first use of the register is as an input to an instruction, but the register is used again later as an output.
Best,
Dan

What's the meaning of the following code?

The CAS code below handles only the int type. I know what CAS does, but I don't understand the details shown below.
inline int CAS(unsigned long *mem,unsigned long newval,unsigned long oldval)
{
__typeof (*mem) ret;
__asm __volatile ("lock; cmpxchgl %2, %1"
: "=a" (ret), "=m" (*mem)
: "r" (newval), "m" (*mem), "0" (oldval));
return (int) ret;
}
I know there should be five operands mapped to %0, %1, %2, %3, %4, because there are five operands in the input/output fields.
I also know that "=a" means using the eax register, "=m" means using a memory address, and "r" means using any register.
But I don't understand what "0" means.
I also don't understand why "cmpxchgl" uses only two operands, %2 and %1, instead of three.
It should take three parameters, like the CAS function.
Where can I get complete information about inline asm in C? I need a complete tutorial.
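%2 is newval and %1 is *mem.
"0" (oldval) is a matching constraint: it ties oldval to operand 0, which is "=a", so oldval is placed in eax.
So "cmpxchgl %2, %1" means "cmpxchgl newval, *mem" with oldval in eax: it compares eax (the value of oldval) with *mem and, if they are equal, stores newval into *mem. Otherwise it loads the current value of *mem into eax. Either way, eax ends up holding the value *mem had before the instruction, which is what the function returns through ret.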
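The operand numbering can be made explicit with a commented sketch. It is a variant of the code in the question, not the original: it uses `unsigned int` so the operand width matches `cmpxchgl`, folds the two `*mem` operands into one "+m", and adds clobbers. `cas_u32` is a hypothetical name, and the `#else` branch falls back to the GCC builtin `__sync_val_compare_and_swap` on non-x86 targets.

```c
#include <assert.h>

/* Hypothetical helper: returns the value *mem held before the operation. */
static unsigned int cas_u32(unsigned int *mem, unsigned int newval,
                            unsigned int oldval)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int ret;
    __asm__ __volatile__("lock; cmpxchgl %2, %1"
        : "=a" (ret),       /* %0: eax on exit, the previous value of *mem */
          "+m" (*mem)       /* %1: the memory word, read and written       */
        : "r" (newval),     /* %2: any general register                    */
          "0" (oldval)      /* %3: same location as %0, so eax on entry    */
        : "memory", "cc");
    return ret;
#else
    return __sync_val_compare_and_swap(mem, oldval, newval);
#endif
}

static void example_usage(void)
{
    unsigned int x = 5;

    assert(cas_u32(&x, 9, 5) == 5 && x == 9);   /* matched: swapped   */
    assert(cas_u32(&x, 1, 5) == 9 && x == 9);   /* mismatch: no write */
}
```

The caller can tell whether the swap happened by comparing the return value with oldval.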
