This question already has answers here:
Why does Clang generate different code for reference and non-null pointer arguments?
(2 answers)
Why is gcc allowed to speculatively load from a struct?
(6 answers)
Boolean values as 8 bit in compilers. Are operations on them inefficient?
(3 answers)
Closed 6 months ago.
Let's assume there's the following code:
#include <stdbool.h>
typedef struct {
bool a;
bool b;
} MyStruct2;
bool g(MyStruct2 *s) {
return s->a || s->b;
}
bool g2(MyStruct2 *s) {
return s->a | s->b;
}
int main() {
return 0;
}
Which compiles into this:
g:
movzx eax, BYTE PTR [rdi]
test al, al
jne .L1
movzx eax, BYTE PTR [rdi+1]
.L1:
ret
g2:
movzx eax, BYTE PTR [rdi]
or al, BYTE PTR [rdi+1]
ret
main:
xor eax, eax
ret
g2 seems to be shorter and it does not include any jump. So why does gcc not optimize g to the same code as g2? None of the members of MyStruct2 is volatile (or otherwise special), so it should be safe to evaluate s->b in g in all cases (even if s->a is true and it would not be required to evaluate s->b).
Why doesn't gcc produce the shorter code without a jump?
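To make the premise concrete, here is a minimal sketch that checks that `||` and `|` give the same value for every combination of the two members. The helper name `check_g_matches_g2` is made up for illustration. It only shows that the two functions agree in value; it says nothing about why gcc chooses one code shape over the other.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool a;
    bool b;
} MyStruct2;

/* Short-circuit version from the question: s->b is read only if s->a is false. */
bool g(MyStruct2 *s) { return s->a || s->b; }

/* Bitwise version from the question: both members are always read. */
bool g2(MyStruct2 *s) { return s->a | s->b; }

/* Hypothetical helper: asserts that g and g2 agree on all four inputs. */
void check_g_matches_g2(void)
{
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            MyStruct2 s = { .a = (bool)a, .b = (bool)b };
            assert(g(&s) == g2(&s));
        }
    }
}
```

Calling `check_g_matches_g2()` from a test program passes silently if the two versions agree.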
Thanks
This question already has answers here:
Compiler generates costly MOVZX instruction
(2 answers)
Why doesn't GCC use partial registers?
(3 answers)
Closed 1 year ago.
I have the following snippet:
typedef struct {
unsigned int gender: 1;
unsigned int age: 7;
unsigned int uid: 24;
} person;
int
f(person *p) {
return p->gender;
}
which generates (gcc -O2):
f:
movzx eax, BYTE PTR [rdi] ; this line looks interesting.
and eax, 1
ret
Questions
If I understand correctly, MOVZX zero-extends, i.e. it fills the upper bits with 0s. As you can see, the next line then ANDs the value with 1. Does MOVZX provide any value here? Wouldn't a normal MOV work just as well, since I only need the lowest bit?
Sadly, I couldn't find a speed comparison between MOV and MOVZX. Can someone tell me whether one performs better than the other? (I'm assuming that MOVZX is slower because it has to "clear out" the upper bits, if I'm not wrong.)
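For reference, the two instructions can be spelled out in C. This is only a sketch of what the generated code computes, not a statement about speed. It assumes that gender occupies bit 0 of the first byte of the struct, which is how gcc lays out this bit-field on x86-64 but is implementation-defined in general. The name `gender_from_first_byte` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned int gender : 1;
    unsigned int age : 7;
    unsigned int uid : 24;
} person;

/* The function from the question. */
int f(person *p) { return p->gender; }

/* Hand-written equivalent of "movzx eax, BYTE PTR [rdi]; and eax, 1".
   ASSUMPTION: gender is stored in bit 0 of the first byte (gcc, x86-64). */
int gender_from_first_byte(const person *p)
{
    assert(p != NULL);
    uint8_t first;
    memcpy(&first, p, 1);        /* load a single byte */
    uint32_t widened = first;    /* zero-extend to 32 bits (the movzx step) */
    return (int)(widened & 1u);  /* keep only bit 0 (the and step) */
}
```

Under the stated layout assumption, both functions return the same value for any person.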
This question already has answers here:
How does argument passing and returning values work in C/C++ on x86 at the assembly level? [closed]
(1 answer)
Passing Arguments C -> NASM -> C
(2 answers)
Passing various parameters from C to Assembler
(2 answers)
Closed 1 year ago.
Hi, I have already written a program in 64-bit assembly, linked with C, that counts the number of left and right brackets. Here it is:
bits 64
section .data
extern g_left, g_right, g_str
section .text
global count
count:
enter 0,0
mov eax, 0
mov ebx, 0
mov ecx, 0
.back:
cmp [g_str + eax], byte 0
je .out
cmp [g_str + eax], byte '['
jne .right
inc ebx
.right:
cmp [g_str + eax], byte ']'
jne .skip
inc ecx
.skip:
inc eax
jmp .back
.out:
mov [g_left], ebx
mov [g_right], ecx
leave
ret
C code:
#include <stdio.h>
void count();
char g_str[] = "[[[]]]][[32423]][234dsfsdf";
int g_left, g_right;
int main()
{
count();
printf("left = %d and right = %d\n", g_left, g_right);
}
What I want is to reuse this assembly code, but change it so that it becomes a function that is called from C with a string as input and just prints the number of brackets. Also, I want it in 32-bit mode this time. The prototype should look like this:
int brackets( char *t_str );
I'm new to assembly and confused about how to change my code. Please help me.
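As a reference for the behavior being asked for, here is a plain C sketch of the same counting loop behind the requested prototype. The helper `count_brackets` and the choice to return the total number of brackets are assumptions, since the question does not say what the int return value should be. An assembly version would follow the same steps, except that in the usual 32-bit cdecl convention the pointer argument arrives on the stack rather than through a global.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts '[' and ']' in a NUL-terminated string. */
void count_brackets(const char *t_str, int *left, int *right)
{
    assert(t_str && left && right);
    *left = 0;
    *right = 0;
    for (; *t_str != '\0'; t_str++) {
        if (*t_str == '[')
            (*left)++;
        else if (*t_str == ']')
            (*right)++;
    }
}

/* Prototype from the question.
   ASSUMPTION: the return value is the total number of brackets. */
int brackets(char *t_str)
{
    int left, right;
    count_brackets(t_str, &left, &right);
    printf("left = %d and right = %d\n", left, right);
    return left + right;
}
```

Keeping the counting separate from the printing makes it easier to compare the result of an assembly rewrite against this version.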
This question already has answers here:
Why can't I move directly a byte to a 64 bit register?
(2 answers)
How do AX, AH, AL map onto EAX?
(6 answers)
Subtracting two characters
(1 answer)
Closed 2 years ago.
I'm trying to re-write the function strcmp in assembly.
I wrote this code:
global _ft_strcmp
section .text
_ft_strcmp:
mov rax, 0
mov rdx, -1
_loop:
inc rdx
mov dh, BYTE[rsi + rdx]
mov al, BYTE[rdi + rdx]
cmp dh, al
je _loop
jmp _exit
_exit:
sub dh, al
mov rax, dh
ret
main.c:
#include <stdio.h>
#include <string.h>
int ft_strcmp(char *s1, char *s2);
int main()
{
printf("|-%d-||-%d-|\n",ft_strcmp("mehdi", "Mehdi"), strcmp("mehdi", "Mehdi"));
return (0);
}
When I compile the file ft_strcmp.s, I get this error:
test nasm -f macho64 ft_strcmp.s
ft_strcmp.s:18: error: invalid combination of opcode and operands
EDIT: I'm using Intel x86_64 syntax :)
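For comparison, here is a C sketch of the behavior the assembly is aiming for. The name `ref_strcmp` is made up. Two steps are worth noting: the loop stops at the terminating NUL as well as at the first difference, and the two bytes are widened to int before the subtraction. That widening is the step `mov rax, dh` tries to do, and mov cannot do it because its operands must have the same size.

```c
#include <assert.h>

/* Hypothetical reference version of strcmp: compares bytes as unsigned char
   and returns the difference of the first mismatching pair. */
int ref_strcmp(const char *s1, const char *s2)
{
    assert(s1 && s2);
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    /* Advance while the bytes match and the end of s1 is not reached. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }

    /* Widen both bytes to int before subtracting. */
    return (int)*a - (int)*b;
}
```

The standard only guarantees the sign of strcmp's result, so a test should compare signs rather than exact values when checking against the library function.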
Here is the standard's explanation of what static in an array parameter means (6.7.6.3/7):
If the keyword static also appears within the [ and ] of the array
type derivation, then for each call to the function, the value of the
corresponding actual argument shall provide access to the first
element of an array with at least as many elements as specified by the
size expression.
It is not quite clear what it means. I ran the following example:
main.c
#include "func.h"
int main(void){
char test[4] = "123";
printf("%c\n", test_func(2, test));
}
And 2 different implementations of the test_func:
static version
func.h
char test_func(size_t idx, const char[const static 4]);
func.c
char test_func(size_t idx, const char arr[const static 4]){
return arr[idx];
}
non-static version
func.h
char test_func(size_t idx, const char[const 4]);
func.c
char test_func(size_t idx, const char arr[const 4]){
return arr[idx];
}
I checked the assembly code compiled with gcc 7.4.0 -O3 of the function in both of the cases and it turned out to be completely identical:
Disassembly of the functions
(gdb) disas main
sub rsp,0x18
mov edi,0x2
lea rsi,[rsp+0x4]
mov DWORD PTR [rsp+0x4],0x333231
mov rax,QWORD PTR fs:0x28
mov QWORD PTR [rsp+0x8],rax
xor eax,eax
call 0x740 <test_func>
[...]
(gdb) disas test_func
movzx eax,BYTE PTR [rsi+rdi*1]
ret
Can you give an example where the static keyword gives some benefit (or makes any difference at all) compared to the non-static counterpart?
Here is an example where static actually makes a difference:
unsigned foo(unsigned a[2])
{
return a[0] ? a[0] * a[1] : 0;
}
clang (for x86-64, with -O3) compiles this to
foo:
mov eax, dword ptr [rdi]
test eax, eax
je .LBB0_1
imul eax, dword ptr [rdi + 4]
ret
.LBB0_1:
xor eax, eax
ret
But after replacing the function parameter with unsigned a[static 2], the result is simply
foo:
mov eax, dword ptr [rdi + 4]
imul eax, dword ptr [rdi]
ret
The conditional branch is not necessary because a[0] * a[1] evaluates to the correct result whether a[0] is zero or not. But without the static keyword, the compiler cannot assume that a[1] can be accessed, and thus has to check a[0].
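The claim that the branch is redundant can be checked with a small sketch. The names `foo_static`, `foo_branchless` and `check_foo_variants` are made up for illustration. `foo_branchless` is what the shorter listing computes, and it is only valid when a[1] may be read unconditionally, which is the promise that static 2 makes.

```c
#include <assert.h>

/* Original version: a[1] is read only when a[0] is non-zero. */
unsigned foo(unsigned a[2])
{
    return a[0] ? a[0] * a[1] : 0;
}

/* Same body, but the caller promises at least 2 accessible elements. */
unsigned foo_static(unsigned a[static 2])
{
    return a[0] ? a[0] * a[1] : 0;
}

/* Hypothetical branch-free form: always reads both elements. */
unsigned foo_branchless(unsigned a[static 2])
{
    return a[0] * a[1];
}

/* Hypothetical helper: asserts that all three variants agree on one input. */
void check_foo_variants(unsigned x, unsigned y)
{
    unsigned a[2] = { x, y };
    assert(foo(a) == foo_static(a));
    assert(foo(a) == foo_branchless(a));
}
```

For example, `check_foo_variants(0, 7)` and `check_foo_variants(3, 4)` both pass, since 0 * 7 is 0 and the ternary yields 0 as well.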
Currently only clang does this optimization; ICC and gcc produce the same code in both cases.
This isn't used much by compilers in my experience, but one use is that the compiler can assume that the (array decayed into pointer) parameter is not NULL.
Given this function, both gcc and clang (x86) produce identical machine code at -O3:
int func (int a[2])
{
if(a)
return 1;
return 0;
}
Disassembly:
func:
xor eax, eax
test rdi, rdi
setne al
ret
When changing the parameter to int a[static 2], gcc gives the same output as before, but clang does a better job:
func:
mov eax, 1
ret
clang realizes that a can never be NULL, so it can skip the check.
I'm compiling this code with -O3 -x c -std=c99 -fno-builtin -nostdlib -ffreestanding
unsigned char *memset(unsigned char *dest, unsigned char val, int count)
{
unsigned char* p = dest;
while (count--)
*p++ = val;
return dest;
}
#include <stdio.h>
int main()
{
unsigned char c[20];
memset(c, 'a', 19);
c[19] = '\0';
printf((const char*) c);
}
and using godbolt to examine what memset gcc is calling in the assembly output.
memset:
test edx, edx
je .L6
sub edx, 1
sub rsp, 8
movzx esi, sil
add rdx, 1
call memset
add rsp, 8
ret
.L6:
mov rax, rdi
ret
main:
sub rsp, 40
movabs rax, 7016996765293437281
mov QWORD PTR [rsp], rax
mov QWORD PTR [rsp+8], rax
mov eax, 24929
mov WORD PTR [rsp+16], ax
mov rdi, rsp
xor eax, eax
mov BYTE PTR [rsp+18], 97
mov BYTE PTR [rsp+19], 0
call printf
add rsp, 40
ret
With the flags I used I'm attempting to eliminate all possibility of it calling a built-in memset and judging from the colorization godbolt uses, it looks like gcc is doing a recursive call at *p++ = val;. So is it doing recursion or calling builtin memset?
As others have indicated, the setting of the array c elements has been inlined. As a result, the memset() you implemented is not even getting called. This is a result of using the -O3 compiler option: the compiler is being very aggressive in its optimizations. Furthermore, there is no recursion on the execution path.
However, that does not entirely answer your question. The memset() shown in the disassembled output is indeed NOT the built in version and it is not even being executed.
Incidentally, you do not need to apply the -fno-builtin flag, as the -ffreestanding flag automatically implies it. Also, if you enable garbage collection of unused code at link time, I am sure you will find that the memset() routine in the disassembled output vanishes.
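The inlining can be confirmed from the immediates in the main listing. The sketch below uses a made-up helper, `repeat_byte`, to rebuild them from the byte 'a' (97 in ASCII): two 8-byte stores, one 2-byte store and one 1-byte store cover 8 + 8 + 2 + 1 = 19 bytes, followed by the single store of 0 at offset 19.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the integer whose n low bytes all equal b. */
uint64_t repeat_byte(unsigned char b, int n)
{
    assert(n >= 0 && n <= 8);
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 8) | b;  /* append one more copy of b */
    return v;
}
```

With this helper, `repeat_byte('a', 8)` equals 7016996765293437281 (the movabs immediate) and `repeat_byte('a', 2)` equals 24929 (the value stored at rsp+16).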