GCC - two identical functions but the code generated differs. Why? - c

The code:
#define OPPOSITE(x) (*((typeof(x) *)&(x)))
int foo(volatile int x)
{
    OPPOSITE(x) = OPPOSITE(x) + OPPOSITE(x);
    return x;
}
int bar(volatile int x)
{
    OPPOSITE(x) = OPPOSITE(x) + OPPOSITE(x);
    return x;
}
The result (-Os):
foo:
mov DWORD PTR [rsp-4], edi
mov eax, DWORD PTR [rsp-4]
mov edx, DWORD PTR [rsp-4]
add eax, edx
mov DWORD PTR [rsp-4], eax
mov eax, DWORD PTR [rsp-4]
ret
bar:
mov DWORD PTR [rsp-4], edi
mov eax, DWORD PTR [rsp-4]
add eax, eax
ret
Or with ARM GCC (-O3):
foo:
sub sp, sp, #8
str r0, [sp, #4]
ldr r3, [sp, #4]
ldr r2, [sp, #4]
add r3, r3, r2
str r3, [sp, #4]
ldr r0, [sp, #4]
add sp, sp, #8
bx lr
bar:
sub sp, sp, #8
str r0, [sp, #4]
ldr r0, [sp, #4]
lsl r0, r0, #1
add sp, sp, #8
bx lr
https://godbolt.org/z/6z5Td9GsP

You can replace the code with
int foo(volatile int x)
{
    x = x + x;
    return x;
}
int bar(volatile int x)
{
    x = x + x;
    return x;
}
and get the same result: foo performs every volatile access, while bar does not. No C compiler other than GCC produces this difference, so I think it's reasonable to say this is some sort of compiler bug.
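To make the difference between the two listings concrete, here is a minimal sketch that spells out one volatile access per statement. The names doubled, first and second are illustrative and not part of the original code. It matches what the foo listing does (two loads, an add, a store, and a final load), whereas the bar listing performs a single load and no store.

```c
/* Sketch only: the accesses implied by `x = x + x; return x;`
   for a volatile x, written one per statement. */
int doubled(volatile int x)
{
    int first  = x;        /* volatile read 1 */
    int second = x;        /* volatile read 2 */
    x = first + second;    /* volatile write  */
    return x;              /* volatile read 3 */
}
```

Compiling this next to foo and bar with the same flags is one way to see which of the two listings GCC treats as the expected one.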

Related

How arguments are passed to the printf() function?

I am trying to understand the assembly code for a simple program, shown below.
void f()
{
    int i, x = 0;
    for (i = 0; i < 10; i++)
        x++;
    printf("Value of x: %d\n", x);
}
and its corresponding assembly code on my machine is
00000000000007d4 <f>:
7d4: a9be7bfd stp x29, x30, [sp, #-32]!
7d8: 910003fd mov x29, sp
7dc: b9001fff str wzr, [sp, #28]
7e0: b9001bff str wzr, [sp, #24]
7e4: 14000007 b 800 <f+0x2c>
7e8: b9401fe0 ldr w0, [sp, #28]
7ec: 11000400 add w0, w0, #0x1
7f0: b9001fe0 str w0, [sp, #28]
7f4: b9401be0 ldr w0, [sp, #24]
7f8: 11000400 add w0, w0, #0x1
7fc: b9001be0 str w0, [sp, #24]
800: b9401be0 ldr w0, [sp, #24]
804: 7100241f cmp w0, #0x9
808: 54ffff0d b.le 7e8 <f+0x14>
80c: b9401fe1 ldr w1, [sp, #28]
810: 90000000 adrp x0, 0 <__abi_tag-0x278>
814: 9121c000 add x0, x0, #0x870
818: 97ffff9a bl 680 <printf@plt>
81c: d503201f nop
820: a8c27bfd ldp x29, x30, [sp], #32
824: d65f03c0 ret
I understand the loop, but lines 814 to 818 really confuse me. What is the purpose of adding #0x870 to x0? What does line 818 mean? And how are the arguments passed to the printf() function?
I expected a string like "Value of x: " to appear in the assembly code, but it seems like the compiler simply knows what to print.
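As a hedged sketch of what lines 810 to 818 appear to do, assuming the standard AArch64 calling convention (AAPCS64): the string is not embedded in the instructions but stored in the read-only data of the binary, and only its address is passed. adrp loads the address of the 4 KiB page, add supplies the offset within that page (0x870 here), and bl calls printf through its PLT entry. The first argument travels in x0 and the second in w1. The name fmt below is illustrative, and the function returns printf's result only so the call is easy to check.

```c
#include <stdio.h>

/* The format string lives in read-only data; printf receives only its
   address. `fmt` is an illustrative name, not from the original code. */
static const char fmt[] = "Value of x: %d\n";

int f(void)
{
    int i, x = 0;
    for (i = 0; i < 10; i++)
        x++;
    /* Under AAPCS64: the address of fmt goes in x0 (adrp + add),
       the value of x goes in w1, then bl transfers control to printf. */
    return printf(fmt, x);   /* number of characters written */
}
```

Dumping the read-only data section of the binary (for example with objdump -s) should show the text "Value of x: " at the address that adrp and add compute.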

Why are w8,w9 used instead of w1,w2?

I am studying ARMv8.
When the following C code is compiled to assembly with clang, w0 seems to be used for the return value, and w8 and w9 are used to hold the variable values.
ARM is said to have the w registers w0 to w30, so why are w8 and w9 used instead of w1 and w2?
int main() {
    int a = 3;
    int b = 5;
    int c = a + b;
    return c;
}
main: // #main
sub sp, sp, #16 // =16
str wzr, [sp, #12]
mov w8, #3
str w8, [sp, #8]
mov w8, #5
str w8, [sp, #4]
ldr w8, [sp, #8]
ldr w9, [sp, #4]
add w8, w8, w9
str w8, [sp]
ldr w0, [sp]
add sp, sp, #16 // =16
ret

Convert C to ARM Assembly program

I have this C program:
void Move1Disk(int fm, int to);
void Hanoi(int num, int fm, int to, int aux)
{
    if (num > 1) Hanoi(num - 1, fm, aux, to);
    Move1Disk(fm, to);
    if (num > 1) Hanoi(num - 1, aux, to, fm);
}
I have written this, but it does not compile. Can anyone please tell me what the issue is?
Hanoi(int, int, int, int):
cmp r0, #1
push {r4, r5, r6, r7, r8, lr}
mov r5, r1
mov r7, r2
movgt r4, r0
movgt r6, r3
ble .L9
.L3:
sub r4, r4, #1
mov r0, r4
mov r3, r7
mov r2, r6
mov r1, r5
bl Hanoi(int, int, int, int)
mov r1, r7
mov r0, r5
bl Move1Disk(int, int)
cmp r4, #1
beq .L2
mov r3, r5
mov r5, r6
mov r6, r3
b .L3
.L9:
mov r6, r1
.L2:
mov r1, r7
mov r0, r6
pop {r4, r5, r6, r7, r8, lr}
b Move1Disk(int, int)
On the very first line:
Hanoi(int, int, int, int):
C functions don't have their argument types as part of their names. If you really are trying to duplicate a C program, the label should be just Hanoi:. The same applies to every other instance of it and of Move1Disk.
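Labels of the form Hanoi(int, int, int, int) are typically what a tool shows when the source was compiled as C++ and the symbol names were demangled. Compiling the file as C yields the plain symbols Hanoi and Move1Disk. A minimal sketch for checking the C side before writing the assembly by hand follows. The Move1Disk body, the moves counter and count_moves are hypothetical stand-ins, since the original only declares Move1Disk.

```c
/* Hypothetical stub: counts moves instead of moving real disks. */
static int moves = 0;

void Move1Disk(int fm, int to)
{
    (void)fm;
    (void)to;
    moves++;
}

/* The function from the question, unchanged. */
void Hanoi(int num, int fm, int to, int aux)
{
    if (num > 1) Hanoi(num - 1, fm, aux, to);
    Move1Disk(fm, to);
    if (num > 1) Hanoi(num - 1, aux, to, fm);
}

/* Hypothetical helper: number of single-disk moves for num disks. */
int count_moves(int num)
{
    moves = 0;
    Hanoi(num, 1, 3, 2);
    return moves;
}
```

A hand-written Hanoi: routine that replaces the C version should produce the same move counts when linked against the same stub.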

Which of the functions is faster in terms of the execution time? beta() or alpha()?

Which would you expect to be faster? (Assume that arrays a[100] and b[100] are initialized globals)
void beta(){
    int i;
    for (i=0;i<100;i++){
        a[i] = a[i] + b[i];
    }
}
void alpha(){
    int i=0;
    while (i<100){
        a[i] += b[i++];
        a[i] += b[i++];
        a[i] += b[i++];
        a[i] += b[i++];
    }
}
To avoid UB I rewrote the alpha function:
void alpha(){
    int i=0;
    while (i<100)
    {
        a[i] += b[i];
        i++;
        a[i] += b[i];
        i++;
        a[i] += b[i];
        i++;
        a[i] += b[i];
        i++;
    }
}
and the generated code depends on the platform.
For x86 it is exactly the same:
beta:
xor eax, eax
.L2:
movdqa xmm0, XMMWORD PTR a[rax]
paddd xmm0, XMMWORD PTR b[rax]
add rax, 16
movaps XMMWORD PTR a[rax-16], xmm0
cmp rax, 400
jne .L2
ret
alpha:
xor eax, eax
.L6:
movdqa xmm0, XMMWORD PTR a[rax]
paddd xmm0, XMMWORD PTR b[rax]
add rax, 16
movaps XMMWORD PTR a[rax-16], xmm0
cmp rax, 400
jne .L6
ret
b:
.zero 400
a:
.zero 400
But on ARM Cortex the two differ: alpha stays unrolled four times, so it pays the loop overhead once per four elements and should execute faster.
beta:
ldr r3, .L6
ldr r1, .L6+4
add ip, r3, #400
.L2:
ldr r2, [r3, #4]!
ldr r0, [r1, #4]!
cmp r3, ip
add r2, r2, r0
str r2, [r3]
bne .L2
bx lr
.L6:
.word a-4
.word b-4
alpha:
ldr r3, .L13
ldr r2, .L13+4
push {r4, r5, r6, r7, r8, lr}
add r7, r3, #400
.L9:
ldr lr, [r3]
ldr ip, [r3, #4]
ldr r0, [r3, #8]
ldr r1, [r3, #12]
ldr r8, [r2]
ldr r6, [r2, #4]
ldr r5, [r2, #8]
ldr r4, [r2, #12]
add lr, lr, r8
add ip, ip, r6
add r0, r0, r5
add r1, r1, r4
str lr, [r3]
str ip, [r3, #4]
str r0, [r3, #8]
str r1, [r3, #12]
add r3, r3, #16
cmp r3, r7
add r2, r2, #16
bne .L9
pop {r4, r5, r6, r7, r8, pc}
.L13:
.word a
.word b
So the general answer is: always benchmark the code.
https://godbolt.org/z/sWjqE1
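A minimal sketch of such a benchmark, assuming int arrays and using only clock() from the standard library. The helpers fill and time_calls are hypothetical names introduced here, and the figures depend entirely on the compiler, flags and target.

```c
#include <time.h>

#define N 100
int a[N], b[N];              /* globals, as the question assumes */

void beta(void)
{
    for (int i = 0; i < N; i++)
        a[i] = a[i] + b[i];
}

void alpha(void)             /* the UB-free version, unrolled by four */
{
    int i = 0;
    while (i < N) {
        a[i] += b[i]; i++;
        a[i] += b[i]; i++;
        a[i] += b[i]; i++;
        a[i] += b[i]; i++;
    }
}

/* Hypothetical helper: reset the arrays to known values. */
void fill(void)
{
    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 1;
    }
}

/* Hypothetical helper: CPU seconds spent in `reps` calls of fn.
   Keep reps small enough that a[i] cannot overflow an int. */
double time_calls(void (*fn)(void), long reps)
{
    clock_t start = clock();
    for (long r = 0; r < reps; r++)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Typical use would be to call fill(), then compare time_calls(beta, 1000000) with time_calls(alpha, 1000000) on the actual target, repeating the measurement several times.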

Convert the C function into ARM assembly language

How exactly do I convert this C function into assembly code? I am having a hard time understanding how the unsigned int is handled.
unsigned int sum(unsigned int n){
    if(n==0) return 0;
    else return n+sum(n-1);
}
I have done this for the int case. How should I think about unsigned int?
sum:
SUB sp, sp, #8
STR lr, [sp,#4]
STR r0, [sp,#0]
CMP r0,#0
BGE L1
MOV r0, #0
ADD sp, sp, #8
MOV pc, lr
L1: SUB r0, r0, #1
BL sum
MOV r12, r0
LDR r0, [sp,#0]
LDR lr, [sp,#4]
ADD sp, sp, #8
ADD r0, r0, r12
MOV pc, lr
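For the arithmetic it won't matter: ADD and SUB produce the same bit pattern whether the operands are treated as signed or unsigned. The comparison is the part that depends on signedness. BGE is a signed condition, so for unsigned n the test against zero should branch with BNE (or BHI) instead.
Some ISAs, such as MIPS, provide ADDU and SUBU alongside ADD and SUB. They differ only in overflow behavior: the U variants do not trap on overflow.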
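A small sketch of where signedness shows up in the listing above. After CMP r0, #0 the overflow flag is clear, so BGE is taken exactly when the sign bit of r0 is clear, while BNE is taken when r0 is non-zero. The helpers bge_taken, bne_taken and wrap_add are hypothetical models written for this illustration and assume 32-bit registers.

```c
#include <stdint.h>

/* The function from the question, unchanged. */
unsigned int sum(unsigned int n)
{
    if (n == 0) return 0;
    else return n + sum(n - 1);
}

/* Model of BGE after `CMP r0, #0`: taken when the sign bit is clear. */
int bge_taken(uint32_t r0)
{
    return (r0 & 0x80000000u) == 0;
}

/* Model of BNE after `CMP r0, #0`: taken when r0 is non-zero,
   which is the condition `n != 0` in the C source. */
int bne_taken(uint32_t r0)
{
    return r0 != 0;
}

/* ADD behaves identically for signed and unsigned operands:
   the result is simply reduced modulo 2^32. */
uint32_t wrap_add(uint32_t x, uint32_t y)
{
    return x + y;
}
```

The two branch models disagree for 0 and for any value with the top bit set, which is why the branch condition is the only instruction that needs attention when moving from int to unsigned int.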
