64-bit dividend on 32-bit architecture: works in assembly but not in C

I have a 64-bit dividend and a 32-bit divisor.
GCC does not seem to be able to generate this kind of division. It complains about an undefined reference to '__udivdi3'. I know this is because I use the -nostdlib flag, but I cannot use any standard libraries.
The 64-bit variables are of type unsigned long long.
Is there a more elegant way to do this than the inline assembly below?
My goal is: my64bit / divisor32.
volatile uint32_t high = my64bit >> 32;
volatile uint32_t low = my64bit;
volatile uint32_t out;
__asm__ __volatile__ (
"movl %0, %%edx\n\t"
"movl %1, %%eax\n\t"
"div %2\n\t"
"movl %%eax, (%3)\n\t"
:: "r" (high), "r" (low), "r" (divisor32), "r" (&out)
: "%eax", "%edx", "memory"
);
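A minimal sketch of a tidier approach, assuming GCC or Clang on x86 (the helper name udiv64_32 is hypothetical). The constraint letters "a" and "d" place values directly in EAX and EDX, so no mov instructions or clobber list are needed. Because divl faults (#DE) when the quotient does not fit in 32 bits, the sketch first divides the high half with ordinary 32-bit C division (which needs no libgcc helper) and then divides remainder:low, giving a full 64-bit quotient. On other targets it falls back to shift-and-subtract long division.

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit dividend / 32-bit divisor without __udivdi3.
   Precondition: d != 0. Returns the quotient; stores the remainder in *rem
   when rem is non-null. */
static uint64_t udiv64_32(uint64_t n, uint32_t d, uint32_t *rem)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t hi = (uint32_t)(n >> 32);
    uint32_t lo = (uint32_t)n;
    uint32_t q_hi = hi / d;   /* plain 32/32 division, no libgcc call */
    uint32_t r_hi = hi % d;   /* r_hi < d, so the divl below cannot overflow */
    uint32_t q_lo, r;
    /* divl: EDX:EAX / d -> quotient in EAX, remainder in EDX */
    __asm__ ("divl %4"
             : "=a" (q_lo), "=d" (r)
             : "0" (lo), "1" (r_hi), "rm" (d)
             : "cc");
    if (rem)
        *rem = r;
    return ((uint64_t)q_hi << 32) | q_lo;
#else
    /* Portable fallback: restoring shift-and-subtract long division. */
    uint64_t q = 0, r = 0;
    for (int i = 63; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1);
        if (r >= d) {
            r -= d;
            q |= (uint64_t)1 << i;
        }
    }
    if (rem)
        *rem = (uint32_t)r;
    return q;
#endif
}
```

The function deliberately avoids assert() and other libc calls so it stays usable under -nostdlib; checking for a zero divisor is left to the caller.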

Related

How to use RDTSCP correctly?

According to Intel's white paper "How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures", I use the code below:
static inline uint64_t bench_start(void)
{
unsigned cycles_low, cycles_high;
asm volatile("CPUID\n\t"
"RDTSCP\n\t"
"mov %%edx, %0\n\t"
"mov %%eax, %1\n\t"
: "=r" (cycles_high), "=r" (cycles_low)
::"%rax", "%rbx", "%rcx", "%rdx");
return (uint64_t) cycles_high << 32 | cycles_low;
}
static inline uint64_t bench_end(void)
{
unsigned cycles_low, cycles_high;
asm volatile("RDTSCP\n\t"
"mov %%edx, %0\n\t"
"mov %%eax, %1\n\t"
"CPUID\n\t"
: "=r" (cycles_high), "=r" (cycles_low)
::"%rax", "%rbx", "%rcx", "%rdx");
return (uint64_t) cycles_high << 32 | cycles_low;
}
But I have also seen others use the code below:
static inline uint64_t bench_start(void)
{
unsigned cycles_low, cycles_high;
asm volatile("RDTSCP\n\t"
: "=d" (cycles_high), "=a" (cycles_low)
:: "%rcx");
return (uint64_t) cycles_high << 32 | cycles_low;
}
static inline uint64_t bench_end(void)
{
unsigned cycles_low, cycles_high;
asm volatile("RDTSCP\n\t"
: "=d" (cycles_high), "=a" (cycles_low)
:: "%rcx");
return (uint64_t) cycles_high << 32 | cycles_low;
}
As you know, RDTSCP is only pseudo-serializing, so why would someone use the second version? I can guess two reasons:
Maybe in most situations RDTSCP is enough to ensure complete in-order execution?
Maybe they just want to avoid the overhead of CPUID?
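As a sketch of the "avoid CPUID" idea, assuming GCC or Clang on x86 (the intrinsics come from <x86intrin.h>), the function names below are reused from the question. This swaps in a different technique from the white paper's CPUID approach: Intel's instruction reference describes LFENCE as a way to order RDTSC and RDTSCP. An LFENCE before RDTSC waits for earlier instructions to complete, and an LFENCE right after RDTSCP keeps later instructions from starting before the timestamp is read.

```c
#include <stdint.h>
#include <x86intrin.h>

static inline uint64_t bench_start(void)
{
    _mm_lfence();              /* earlier instructions finish before the read */
    uint64_t t = __rdtsc();
    _mm_lfence();              /* timed code does not start before the read */
    return t;
}

static inline uint64_t bench_end(void)
{
    unsigned int aux;          /* receives IA32_TSC_AUX (often the CPU number) */
    uint64_t t = __rdtscp(&aux);  /* waits for earlier instructions to execute */
    _mm_lfence();              /* later instructions do not start before the read */
    return t;
}
```

Using intrinsics also lets the compiler know exactly which registers RDTSCP writes (EDX, EAX and ECX), which is easy to get wrong in hand-written constraint lists.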

Inline assembly in C not working properly

I'm trying to learn how to use inline assembly in C code.
I have created a small program that should add two integers:
#include <stdio.h>
int main(){
int a=1;
int b=2;
asm( "movl %0, %%r8d;"
"movl %1, %%r9d;"
"addl %%r8d, %%r9d;"
"movl %%r9d, %1;"
: "=r" (a)
: "r" (b)
:"%r8","%r9" );
printf("a=%d\n",a);
return 0;
}
The aim was to load a and b into the registers %r8 and %r9, add them, and then put the output back in a.
However, this program prints a=2 instead of a=3. I'm not sure whether the problem is in the inline technique or in the assembly itself.
There are two issues here:
First: the "=r" constraint you use for the output operand a tells the compiler that the operand is write-only, so it may assume that the initial value is not needed. That is definitely not the case for your code! Change the constraint to "+r" to let the compiler know that the initial value is important.
Second: You are moving the result to the wrong register! The target %1 of the last movl is the register corresponding to b, not a. You want %0.
Fixed:
asm(
"movl %0, %%r8d;"
"movl %1, %%r9d;"
"addl %%r8d, %%r9d;"
"movl %%r9d, %0;"
: "+r" (a)
: "r" (b)
: "%r8", "%r9"
);
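For comparison, a shorter sketch assuming x86 and GCC-style inline assembly (add_asm is a hypothetical name): since "+r" already makes the compiler load a into a register, the instruction can work on the operands directly, with no scratch registers to clobber.

```c
/* Hypothetical helper: returns a + b using a single addl.
   %0 is a (read-write, "+r"), %1 is b (read-only, "r").
   addl modifies the flags, hence the "cc" clobber. */
static int add_asm(int a, int b)
{
    __asm__ ("addl %1, %0" : "+r" (a) : "r" (b) : "cc");
    return a;
}
```

Letting the compiler choose the registers also avoids forcing values through %r8/%r9, which makes the code usable in 32-bit builds as well.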

gcc inline assembly error "operand type mismatch for mov"

//quick inline asm statements performing the swap_byte for key_scheduling
inline void swap_byte(unsigned char *x, unsigned char *y)
{
unsigned char t;
asm("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(t)
:"r"(*x)
:"%eax");
asm("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(*x)
:"r"(*y)
:"%eax");
asm("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(*y)
:"r"(t)
:"%eax");
}
Here I am trying to move the char from x into y, and the char from y into x.
I also tried changing movl to mov, with no success. Where is the problem in compiling/linking?
Here is the output from compiling in cygwin:
$ gcc rc4_main.c -o rc4ex
/tmp/ccy0wo6H.s: Assembler messages:
/tmp/ccy0wo6H.s:18: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:18: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:26: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:26: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:34: Error: operand type mismatch for `mov'
/tmp/ccy0wo6H.s:34: Error: operand type mismatch for `mov'
To simplify it even more (than user35443):
asm("" : "=r" (*x), "=r" (*y) : "1" (*x), "0" (*y));
Look ma! No code! And yes, this really works.
To explain how this works:
When the compiler is building the code, it keeps track of what value is in each register. So if we had these as inputs to asm:
"r" (*x), "r" (*y)
The compiler will pick a register and put *x in it, then pick a register and put *y in it, then call your asm. But it also keeps track of what variable is in which register. If there were just some way to tell the compiler that all it had to do was start treating the two registers as the opposite variables, then we'd be set. And that's what this code does:
Saying "=r" (*x) means that we are going to overwrite the value in *x, and that we will put the new value into a register.
Saying "0" (*y) means that on input to the asm, the compiler must put the value of *y into the same register that is used by output operand #0. Likewise, "1" (*x) puts *x into the register used by output operand #1.
So, without using any actual assembly instructions, we have told the compiler to swap these two values.
We don't get this quite "for free" since the compiler must load the values into registers before calling the asm. But since that has to happen anyway...
What about actually updating memory? The compiler will (if necessary) write these values from the registers back to memory. And since it knows what variable is in which register, all works as expected.
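A self-contained version of the trick, assuming GCC or Clang (it works on any target because the asm template is empty). In practice, a plain C swap through a temporary usually compiles to the same loads and stores.

```c
/* Swaps *x and *y without emitting any instruction inside the asm:
   "1" (*x) loads *x into the register that output #1 later stores to *y,
   and "0" (*y) loads *y into the register that output #0 stores to *x. */
static inline void swap_byte(unsigned char *x, unsigned char *y)
{
    __asm__ ("" : "=r" (*x), "=r" (*y) : "1" (*x), "0" (*y));
}
```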
unsigned char t;
asm("movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(t) /* <--here */
:"r"(*x) /* <-- and here */
:"%eax");
You cannot use movl here: t and *x are unsigned char, so with the "r" constraint the compiler substitutes 8-bit registers (such as %al or %dl) for %0 and %1, while movl only accepts 32-bit operands. The problems on the other lines are similar. You should move only a byte.
Try something like this (untested; there are other ways to do it, read below):
unsigned char t;
asm("movb %1, %%al\n"
"movb %%al, %0\n"
:"=r"(t)
:"r"(*x)
:"%al");
asm("movb %1, %%al\n"
"movb %%al, %0\n"
:"=r"(*x)
:"r"(*y)
:"%al");
asm("movb %1, %%al\n"
"movb %%al, %0\n"
:"=r"(*y)
:"r"(t)
:"%al");
The whole procedure can be simplified into this:
asm("movb (%1), %%al\n"
"movb (%2), %%ah\n"
"movb %%ah, (%1)\n"
"movb %%al, (%2)\n"
: /* no outputs for compiler to know about */
: "r" (x), "r" (y)
: "%ax", "memory");
The line
movl %%eax, %0;
is complete nonsense! You are trying to overwrite the constant 0 with the %eax register, which is impossible. Many years ago this was possible in Fortran, and afterwards programs behaved quite unpredictably. To avoid that, the rule was introduced that no identifier can begin with a digit. But you are trying to do exactly that, so it is good that you get an error. Perhaps you meant
movl %0, %%eax;
to set eax to zero. Even better is
xorl %%eax, %%eax;

Compile GCC Inline Assembly into Microsoft Visual C++ 2008

I'm having trouble converting this GCC inline assembly so that it compiles with Microsoft Visual C++ 2008.
GCC inline assembly:
__asm__(
"smull %0, %1, %2, %3 \n\t"
"mov %0, %0, LSR #16 \n\t"
"add %1, %0, %1, LSL #16 \n\t"
: "=&r"(lo), "=&r"(hi)
: "r"(rb), "r"(ra));
The compiler says:
error C2143: syntax error : missing ')' before ':'
The complete function is:
static __inline Word32 mull(Word32 a, Word16 b)
{
register Word32 ra = a;
register Word32 rb = b;
Word32 lo, hi;
__asm__(
"smull %0, %1, %2, %3 \n\t"
"mov %0, %0, LSR #16 \n\t"
"add %1, %0, %1, LSL #16 \n\t"
: "=&r"(lo), "=&r"(hi)
: "r"(rb), "r"(ra));
return hi;
}
Thanks.
Visual Studio does not support inline assembly for ARM targets (see the MSDN note "Inline assembly is not supported on the ARM"). You will need to either translate the assembly code back to C, or assemble it with a separate assembler and link it as an ordinary function.
It looks like this function just does a 32 x 32 -> 64-bit signed multiply, then shifts the 64-bit result right by 16 bits and truncates it to 32 bits:
static __inline Word32 mull(Word32 a, Word16 b)
{
return (Word32)(((Word64)a * (Word64)b) >> 16);
}
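A self-contained check of that reading, assuming Word32, Word16 and Word64 are 32-, 16- and 64-bit signed integer types (the original code base defines them elsewhere; the typedefs below are stand-ins). mull_arm_steps is a hypothetical helper that replays the three ARM instructions with unsigned 32-bit arithmetic, so the one-line replacement can be compared bit for bit.

```c
#include <stdint.h>

/* Stand-in typedefs (assumed). */
typedef int32_t Word32;
typedef int16_t Word16;
typedef int64_t Word64;

/* Portable replacement: bits 16..47 of the signed 64-bit product. */
static inline Word32 mull(Word32 a, Word16 b)
{
    return (Word32)(((Word64)a * (Word64)b) >> 16);
}

/* Replays the ARM sequence step by step:
     smull lo, hi, rb, ra     -> 64-bit signed product split into halves
     mov   lo, lo, LSR #16    -> logical shift of the low half
     add   hi, lo, hi, LSL #16 */
static inline Word32 mull_arm_steps(Word32 a, Word16 b)
{
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);
    uint32_t lo = (uint32_t)p;
    uint32_t hi = (uint32_t)(p >> 32);
    lo >>= 16;
    hi = lo + (hi << 16);
    return (Word32)hi;
}
```

Both functions assume that converting an out-of-range value to a signed type wraps and that >> on a negative value is an arithmetic shift; C leaves these implementation-defined, but GCC, Clang and MSVC all behave this way.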

SSE2 instruction in C code

I am trying to reverse-engineer some C code, but I can't really understand this part of the assembly. I know it is part of the SSE extension. However, some things are really different from the x86 instructions I am used to.
static int sad16_sse2(void *v, uint8_t *blk2, uint8_t *blk1, int stride, int h)
{
int ret;
__asm__ volatile(
"pxor %%xmm6, %%xmm6 \n\t"
ASMALIGN(4)
"1: \n\t"
"movdqu (%1), %%xmm0 \n\t"
"movdqu (%1, %3), %%xmm1 \n\t"
"psadbw (%2), %%xmm0 \n\t"
"psadbw (%2, %3), %%xmm1 \n\t"
"paddw %%xmm0, %%xmm6 \n\t"
"paddw %%xmm1, %%xmm6 \n\t"
"lea (%1,%3,2), %1 \n\t"
"lea (%2,%3,2), %2 \n\t"
"sub $2, %0 \n\t"
" jg 1b \n\t"
: "+r" (h), "+r" (blk1), "+r" (blk2)
: "r" ((x86_reg)stride)
);
__asm__ volatile(
"movhlps %%xmm6, %%xmm0 \n\t"
"paddw %%xmm0, %%xmm6 \n\t"
"movd %%xmm6, %0 \n\t"
: "=r"(ret)
);
return ret;
}
What are %1, %2, and %3? What does (%1,%2,%3) mean? Also, what do "+r", "-r" and "=r" mean?
You'll want to have a look at this GCC Inline Assembly HOWTO.
The percent sign numbers are the instruction operands.
The inline assembler works similarly to a macro preprocessor. Operands written as a single percent sign followed by a number are replaced by the operands in the order they appear in the operand lists (outputs first, then inputs), in this case:
%0 h output, register, r/w
%1 blk1 output, register, r/w
%2 blk2 output, register, r/w
%3 (x86_reg)stride input, register, read only
The parameters are normal C expressions. They can be further specified by "constraints": here "r" means the value should be in a register, as opposed to "m", which is a memory operand. The constraint modifier "=" makes an operand write-only, "+" makes it read-write, and a plain "r" is a normal read-only input. There is no "-" modifier.
After the first colon come the output operands, after the second the input operands, and after the optional third the clobbered registers. In AT&T syntax, a memory operand of the form (base,index,scale) denotes the address base + index*scale, so (%1,%3) is blk1 + stride and lea (%1,%3,2), %1 advances blk1 by two rows.
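A small sketch of operand numbering and the lea form used in the loop, assuming x86 and GCC or Clang. The helper names are hypothetical; the second one uses GCC's symbolic operand names, which avoid counting operands by hand.

```c
#include <stdint.h>

/* Hypothetical helper: returns p + 2*stride.
   %0 is p (read-write, "+r"), %1 is stride (read-only, "r");
   lea (%0,%1,2) computes base + index*2 without touching memory. */
static const uint8_t *skip_two_rows(const uint8_t *p, intptr_t stride)
{
    __asm__ ("lea (%0,%1,2), %0" : "+r" (p) : "r" (stride));
    return p;
}

/* Same computation with symbolic operand names instead of %0/%1. */
static const uint8_t *skip_two_rows_named(const uint8_t *p, intptr_t stride)
{
    __asm__ ("lea (%[ptr],%[step],2), %[ptr]"
             : [ptr] "+r" (p)
             : [step] "r" (stride));
    return p;
}
```

intptr_t is used for the stride so that base and index registers have the same width as a pointer, which lea requires; the original code gets the same effect by casting stride to x86_reg.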
So the instruction sequence calculates the sum of absolute differences between the bytes of blk1 and blk2, one 16-byte row at a time. If stride is 16, the rows are contiguous; otherwise there are gaps between them. Each instruction appears twice because the loop is unrolled by two: every iteration processes two rows and subtracts 2 from h, so h is the number of 16-byte rows to process. The second asm block combines the partial results: with XMM operands, psadbw produces two 16-bit sums, one in the low quadword and one in the high quadword. movhlps copies the high quadword of xmm6 into xmm0, paddw adds it to the low sum, and movd moves the final value into ret.
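To make the computation concrete, here is a sketch with a scalar reference and an SSE2 intrinsics version (both function names are hypothetical; the intrinsics path assumes a compiler that defines __SSE2__, as GCC and Clang do on x86-64). _mm_sad_epu8 is the intrinsic for psadbw, and _mm_unpackhi_epi64 plays the role of movhlps when folding the two halves together.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Scalar reference: sum of |blk1[i] - blk2[i]| over h rows of 16 bytes. */
static int sad16_ref(const uint8_t *blk1, const uint8_t *blk2, int stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(blk1[x] - blk2[x]);
        blk1 += stride;
        blk2 += stride;
    }
    return sum;
}

#ifdef __SSE2__
/* Same computation with SSE2 intrinsics, one row per iteration. */
static int sad16_sse2_intrin(const uint8_t *blk1, const uint8_t *blk2, int stride, int h)
{
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < h; y++) {
        __m128i a = _mm_loadu_si128((const __m128i *)(blk1 + (ptrdiff_t)y * stride));
        __m128i b = _mm_loadu_si128((const __m128i *)(blk2 + (ptrdiff_t)y * stride));
        /* psadbw: one sum per 64-bit half, stored in its low 16 bits */
        acc = _mm_add_epi32(acc, _mm_sad_epu8(a, b));
    }
    /* Fold the high 64-bit half onto the low half (movhlps + add). */
    acc = _mm_add_epi32(acc, _mm_unpackhi_epi64(acc, acc));
    return _mm_cvtsi128_si32(acc);
}
#endif
```

The sketch accumulates in 32-bit lanes rather than paddw's 16-bit lanes, which avoids wraparound for tall blocks, and it uses unaligned loads for both blocks, whereas psadbw with a memory operand in the original requires blk2 rows to be 16-byte aligned.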
