Is gcc doing a recursive call? - c

I'm compiling this code with -O3 -x c -std=c99 -fno-builtin -nostdlib -ffreestanding
unsigned char *memset(unsigned char *dest, unsigned char val, int count)
{
unsigned char* p = dest;
while (count--)
*p++ = val;
return dest;
}
#include <stdio.h>
int main()
{
unsigned char c[20];
memset(c, 'a', 19);
c[19] = '\0';
printf((const char*) c);
}
and using godbolt to examine what memset gcc is calling in the assembly output.
memset:
test edx, edx
je .L6
sub edx, 1
sub rsp, 8
movzx esi, sil
add rdx, 1
call memset
add rsp, 8
ret
.L6:
mov rax, rdi
ret
main:
sub rsp, 40
movabs rax, 7016996765293437281
mov QWORD PTR [rsp], rax
mov QWORD PTR [rsp+8], rax
mov eax, 24929
mov WORD PTR [rsp+16], ax
mov rdi, rsp
xor eax, eax
mov BYTE PTR [rsp+18], 97
mov BYTE PTR [rsp+19], 0
call printf
add rsp, 40
ret
With the flags I used I'm attempting to eliminate all possibility of it calling a built-in memset and judging from the colorization godbolt uses, it looks like gcc is doing a recursive call at *p++ = val;. So is it doing recursion or calling builtin memset?

As others have indicated, the setting of the array c elements has been inlined. As a result, the memset() you implemented is not even getting called. This is a result of the use of the -03 compiler option. The compiler is being very aggressive in its optimizations. Furthermore, there is no recursion on the execution path.
However, that does not entirely answer your question. The memset() shown in the disassembled output is indeed NOT the built in version and it is not even being executed.
Incidentally, you do not need to apply the -fno-builtin flag as the -ffreestanding flag automatically implies it. Also, if you enable garbage collection, I am sure that will find that the memset() routine in the disassembled output will vanish.

Related

Which actions are taken in compile time?

I cannot get the list of things that are optimized at compile time, and what are the compiler optimizations that do the reduction if all the necessary info is given. However, the question is tagged C, but the thing that made me ask it is C++'s constexpr - I naively thought that the things the keyword allows were already available "out of box".
EDIT: the basic example is
#include <stdio.h>
int main(int argc, char *argv[]){
int a = 10;
int b = 8;
printf("Answer is %d", a-b);
}
If compiled with -O0 (x86-64 GCC9.2) we get real load and subtraction when the program is run
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 10
mov DWORD PTR [rbp-8], 8
mov eax, DWORD PTR [rbp-4]
sub eax, DWORD PTR [rbp-8]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
nop
leave
ret
When -O3 (x86-64 GCC9.2) is used, subtraction is optimized out and we load only 2
mov esi, 2
mov edi, OFFSET FLAT:.LC0
xor eax, eax
jmp printf
We probably can replace some function calls with constant values or simplify functions if values are available in code. What is the optimization name that does the thing?
EDIT2 The question is mainly about GCC or Clang for x86-64 platform. I understand that a compiler must not optimize any code, but often programmers use the compilers with options other than -O0. As noted by KamilCuk, the code above can be reduced to puts("Answer is 2") but it is not, and I don't know why. I'd highly appreciate if someone provides an [or a link to] overview of optimizations that GCC or Clang do at different levels of optimization.

Static keyword in an array function parameter declaration

Here is the explanation of what it means 6.7.6.3/7:
If the keyword static also appears within the [ and ] of the array
type derivation, then for each call to the function, the value of the
corresponding actual argument shall provide access to the first
element of an array with at least as many elements as specified by the
size expression.
It is not quite clear what it means. I ran the following example:
main.c
#include "func.h"
int main(void){
char test[4] = "123";
printf("%c\n", test_func(2, test));
}
And 2 different implementations of the test_func:
static version
func.h
char test_func(size_t idx, const char[const static 4]);
func.c
char test_func(size_t idx, const char arr[const static 4]){
return arr[idx];
}
non-static version
func.h
char test_func(size_t idx, const char[const 4]);
func.c
char test_func(size_t idx, const char arr[const 4]){
return arr[idx];
}
I checked the assembly code compiled with gcc 7.4.0 -O3 of the function in both of the cases and it turned out to be completely identical:
Disassembly of the functions
(gdb) disas main
sub rsp,0x18
mov edi,0x2
lea rsi,[rsp+0x4]
mov DWORD PTR [rsp+0x4],0x333231
mov rax,QWORD PTR fs:0x28
mov QWORD PTR [rsp+0x8],rax
xor eax,eax
call 0x740 <test_func>
[...]
(gdb) disas test_func
movzx eax,BYTE PTR [rsi+rdi*1]
ret
Can you give an example where the static keyword gives some benefits (or any differences at all) comparing to non-static counterpart?
Here is an example where static actually makes a difference:
unsigned foo(unsigned a[2])
{
return a[0] ? a[0] * a[1] : 0;
}
clang (for x86-64, with -O3) compiles this to
foo:
mov eax, dword ptr [rdi]
test eax, eax
je .LBB0_1
imul eax, dword ptr [rdi + 4]
ret
.LBB0_1:
xor eax, eax
ret
But after replacing the function parameter with unsigned a[static 2], the result is simply
foo:
mov eax, dword ptr [rdi + 4]
imul eax, dword ptr [rdi]
ret
The conditional branch is not necessary because a[0] * a[1] evaluates to the correct result whether a[0] is zero or not. But without the static keyword, the compiler cannot assume that a[1] can be accessed, and thus has to check a[0].
Currently only clang does this optimization; ICC and gcc produce the same code in both cases.
This isn't used much by compilers in my experience, but one use is that the compiler can assume that the (array decayed into pointer) parameter is not NULL.
Given this function, both gcc and clang (x86) produce identical machine code at -O3:
int func (int a[2])
{
if(a)
return 1;
return 0;
}
Disassembly:
func:
xor eax, eax
test rdi, rdi
setne al
ret
When changing the parameter to int a[static 2], gcc gives the same output as before, but clang does a better job:
func:
mov eax, 1
ret
Since clang realizes that a can never be NULL, so it can skip the check.

Is there any way to save registers before jumping into function?

this is my first question, because I couldn't find anything related to this topic.
Recently, while making a class for my C game engine project I've found something interesting:
struct Stack *S1 = new(Stack);
struct Stack *S2 = new(Stack);
S1->bPush(S1, 1, 2); //at this point
bPush is a function pointer in the structure.
So I wondered, what does operator -> in that case, and I've discovered:
mov r8b,2 ; a char, written to a low point of register r8
mov dl,1 ; also a char, but to d this time
mov rcx,qword ptr [S1] ; this is the 1st parameter of function
mov rax,qword ptr [S1] ; !Why cannot I use this one?
call qword ptr [rax+1A0h] ; pointer call
so I assume -> writes an object pointer to rcx, and I'd like to use it in functions (methods they shall be). So the question is, how can I do something alike
push rcx
// do other call vars
pop rcx
mov qword ptr [this], rcx
before it starts writing other variables of the function. Something with preprocessor?
It looks like you'd have an easier time (and get asm that's the same or more efficient) if you wrote in C++ so you could use language built-in support for virtual functions, and for running constructors on initialization. Not to mention not having to manually run destructors. You wouldn't need your struct Class hack.
I'd like to implicitly pass *this pointer, because as shown in second asm part it does the same thing twice, yes, it is what I'm looking for, bPush is a part of a struct and it cannot be called from outside, but I have to pass the pointer S1, which it already has.
You get inefficient asm because you disabled optimization.
MSVC -O2 or -Ox doesn't reload the static pointer twice. It does waste a mov instruction copying between registers, but if you want better asm use a better compiler (like gcc or clang).
The oldest MSVC on the Godbolt compiler explorer is CL19.0 from MSVC 2015, which compiles this source
struct Stack {
int stuff[4];
void (*bPush)(struct Stack*, unsigned char value, unsigned char length);
};
struct Stack *const S1 = new(Stack);
int foo(){
S1->bPush(S1, 1, 2);
//S1->bPush(S1, 1, 2);
return 0; // prevent tailcall optimization
}
into this asm (Godbolt)
# MSVC 2015 -O2
int foo(void) PROC ; foo, COMDAT
$LN4:
sub rsp, 40 ; 00000028H
mov rax, QWORD PTR Stack * __ptr64 __ptr64 S1
mov r8b, 2
mov dl, 1
mov rcx, rax ;; copy RAX to the arg-passing register
call QWORD PTR [rax+16]
xor eax, eax
add rsp, 40 ; 00000028H
ret 0
int foo(void) ENDP ; foo
(I compiled in C++ mode so I could write S1 = new(Stack) without having to copy your github code, and write it at global scope with a non-constant initializer.)
Clang7.0 -O3 loads into RCX straight away:
# clang -O3
foo():
sub rsp, 40
mov rcx, qword ptr [rip + S1]
mov dl, 1
mov r8b, 2
call qword ptr [rcx + 16] # uses the arg-passing register
xor eax, eax
add rsp, 40
ret
Strangely, clang only decides to use low-byte registers when targeting the Windows ABI with __attribute__((ms_abi)). It uses mov esi, 1 to avoid false dependencies when targeting its default Linux calling convention, not mov sil, 1.
Or if you are using optimization, then it's because even older MSVC is even worse. In that case you probably can't do anything in the C source to fix it, although you might try using a struct Stack *p = S1 local variable to hand-hold the compiler into loading it into a register once and reusing it from there.)

Conditional move instruction on a NULL check before dereferencing a pointer [duplicate]

This question already has answers here:
Why does this example of what compilers aren't allowed to do cause null pointer dereferencing using cmov?
(3 answers)
Hard to debug SEGV due to skipped cmov from out-of-bounds memory
(2 answers)
Closed 4 years ago.
I'm working on exercise 3.61 of CSAPP, which requires to write a very simple function that checks if a pointer is NULL before trying to dereference it, which should base on a conditional move instruction rather than a jump. Here's an example I've found online:
long cond(long* p) {
return (!p) ? 0 : *p;
}
According to claims, the function can be compiled into the following assembly:
cond:
xor eax, eax
test rdi, rdi
cmovne rax, QWORD PTR [rdi]
ret
I am running GCC 7.3.0 (from APT package gcc/bionic-updates,now 4:7.3.0-3ubuntu2.1 amd64) on Ubuntu 18.04 on WSL. The computer is running on an Intel Coffee Lake (i.e. 8th-gen Core-i) processor.
I have tried the following commands:
gcc -S a.c -O3
gcc -S a.c -O3 -march=x86-64
gcc -S a.c -O3 -march=core2
gcc -S a.c -O3 -march=k8
Honestly, I wasn't able to observe any difference in the generated a.s file, since all of them look like
cond:
xorl %eax, %eax
testq %rdi, %rdi
je .L1
movq (%rdi), %rax
.L1:
ret
Is there any possibility to have such a function that compiles into a conditional move, without a jump?
Edit: As told in comments, the CMOVxx series of instructions loads operands unconditionally, and only the actual assignment operation is conditional, so there'd be no luck to put *p (or (%rdi)) as the source operand of CMOV, is it right?
The claim is on this page but I think it's invalid.
Here is a branch-less version:
inline long* select(long* p, long* q) {
uintptr_t a = (uintptr_t)p;
uintptr_t b = (uintptr_t)q;
uintptr_t c = !a;
uintptr_t r = (a & (c - 1)) | (b & (!c - 1));
return (long*)r;
}
long cond(long* p) {
long t = 0;
return *select(p, &t);
}
Assembly for gcc-8.2:
cond(long*):
mov QWORD PTR [rsp-8], 0
xor eax, eax
test rdi, rdi
sete al
lea rdx, [rax-1]
neg rax
and rdi, rdx
lea rdx, [rsp-8]
and rax, rdx
or rdi, rax
mov rax, QWORD PTR [rdi]
ret
Assembly for clang-7:
cond(long*): # #cond(long*)
mov qword ptr [rsp - 8], 0
xor eax, eax
test rdi, rdi
lea rcx, [rsp - 8]
cmovne rcx, rax
or rcx, rdi
mov rax, qword ptr [rcx]
ret

How do I best use the const keyword in C?

I am trying to get a sense of how I should use const in C code. First I didn't really bother using it, but then I saw a quite a few examples of const being used throughout. Should I make an effort and go back and religiously make suitable variables const? Or will I just be waisting my time?
I suppose it makes it easier to read which variables that are expected to change, especially in function calls, both for humans and the compiler. Am I missing any other important points?
const is typed, #define macros are not.
const is scoped by C block, #define applies to a file (or more strictly, a compilation unit).
const is most useful with parameter passing. If you see const used on a prototype with pointers, you know it is safe to pass your array or struct because the function will not alter it. No const and it can.
Look at the definition for such as strcpy() and you will see what I mean. Apply "const-ness" to function prototypes at the outset. Retro-fitting const is not so much difficult as "a lot of work" (but OK if you get paid by the hour).
Also consider:
const char *s = "Hello World";
char *s = "Hello World";
which is correct, and why?
How do I best use the const keyword in C?
Use const when you want to make it "read-only". It's that simple :)
Using const is not only a good practice but improves the readability and comprehensibility of the code as well as helps prevent some common errors. Definitely do use const where appropriate.
Apart from producing a compiler error when attempting to modify the constant and passing the constant as a non-const parameter, therefore acting as a compiler guard, it also enables the compiler to perform certain optimisations knowing that the value will not change and therefore it can cache the value and not have to read it fresh from memory, because it won't have changed, and it allows it to be immediately substituted in the code.
C const
const and register are basically the opposite of volatile and using volatile will override the const optimisations at file and block scope and the register optimisations at block-scope. const register and register will produce identical outputs because const does nothing on C at block-scope on gcc C -O0, and is redundant on -O1 and onwards, so only the register optimisations apply at -O0, and are redundant from -O1 onwards.
#include<stdio.h>
int main() {
const int i = 1;
printf("%d", i);
}
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 1
mov eax, DWORD PTR [rbp-4] //load from stack isn't eliminated for block-scope consts on gcc C unlike on gcc C++ and clang C, even though value will be the same
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
In this instance, with -O0, const, volatile and auto all produce the same code, with only register differing c.f.
#include<stdio.h>
const int i = 1;
int main() {
printf("%d", i);
}
i:
.long 1
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov eax, DWORD PTR i[rip] //load from memory
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
pop rbp
ret
with const int i = 1; instead:
i:
.long 1
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov eax, 1 //saves load from memory, now immediate
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
pop rbp
ret
C++ const
#include <iostream>
int main() {
int i = 1;
std::cout << i;
}
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 1 //stores on stack
mov eax, DWORD PTR [rbp-4] //loads the value stored on the stack
mov esi, eax
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
leave
ret
#include <iostream>
int main() {
const int i = 1;
std::cout << i;
}
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 1 //stores it on the stack
mov esi, 1 //but saves a load from memory here, unlike on C
//'register' would skip this store on the stack altogether
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
leave
ret
#include <iostream>
int i = 1;
int main() {
std::cout << i;
}
i:
.long 1
main:
push rbp
mov rbp, rsp
mov eax, DWORD PTR i[rip] //load from memory
mov esi, eax
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
pop rbp
ret
#include <iostream>
const int i = 1;
int main() {
std::cout << i;
}
main:
push rbp
mov rbp, rsp
mov esi, 1 //eliminated load from memory, now immediate
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
pop rbp
ret
C++ has the extra restriction of producing a compiler error if a const is not initialised (both at file-scope and block-scope). const also has internal linkage as a default on C++. volatile still overrides const and register but const register combines both optimisations on C++.
Even though all the above code is compiled using the default implicit -O0, when compiled with -Ofast, const surprisingly still isn't redundant on C or C++ on clang or gcc for file-scoped consts. The load from memory isn't optimised out unless const is used, even if the file-scope variable isn't modified in the code. https://godbolt.org/z/PhDdxk.

Resources