Here are some simple tests run on x86_64 to show the assembler code generated when using the inline keyword:
Test 1
static inline void
show_text(void)
{
printf("Hello\n");
}
int main(int argc, char *argv[])
{
show_text();
return 0;
}
And the assembler:
gcc -O0 -fno-asynchronous-unwind-tables -S -masm=att main.c && less main.s
.file "main.c"
.text
.section .rodata
.LC0:
.string "Hello"
.text
.type show_text, @function
show_text:
pushq %rbp
movq %rsp, %rbp
leaq .LC0(%rip), %rdi
call puts@PLT
nop
popq %rbp
ret
.size show_text, .-show_text
.globl main
.type main, @function
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
call show_text
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (GNU) 7.3.1 20180312"
.section .note.GNU-stack,"",@progbits
Test 1 result: the inline suggestion is not taken into account by the compiler.
Test 2
Same code as test 1, but with -O1 optimization flag
gcc -O1 -fno-asynchronous-unwind-tables -S -masm=att main.c && less main.s
.file "main.c"
.text
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "Hello"
.text
.globl main
.type main, @function
main:
subq $8, %rsp
leaq .LC0(%rip), %rdi
call puts@PLT
movl $0, %eax
addq $8, %rsp
ret
.size main, .-main
.ident "GCC: (GNU) 7.3.1 20180312"
.section .note.GNU-stack,"",@progbits
Test 2 result: show_text is no longer defined in the assembler output; its body has been inlined into main.
Test 3
show_text not declared as inline, -O1 optimization flag
Test 3 result: show_text is no longer defined in the assembler output. The generated code is the same with or without inline.
Test 4
#include <stdio.h>
static inline void
show_text(void)
{
printf("Hello\n");
printf("Hello\n");
printf("Hello\n");
printf("Hello\n");
printf("Hello\n");
printf("Hello\n");
}
int main(int argc, char *argv[])
{
show_text();
show_text();
return 0;
}
produces :
gcc -O1 -fno-asynchronous-unwind-tables -S -masm=att main.c && less main.s
.file "main.c"
.text
.section .rodata
.LC0:
.string "Hello"
.text
.type show_text, @function
show_text:
pushq %rbp
movq %rsp, %rbp
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
nop
popq %rbp
ret
.size show_text, .-show_text
.globl main
.type main, @function
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
call show_text
call show_text
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (GNU) 7.3.1 20180312"
.section .note.GNU-stack,"",@progbits
Test 4 result: show_text is defined in the assembler output; the inline suggestion is not taken into account.
I understand that the inline keyword does not force inlining. But for the Test 1 results, what can prevent the show_text code from being substituted into main?
So far, I have been inlining some small static functions in my C source code. But from these results it seems quite useless.
Why should I declare some of my small functions static inline when using modern compilers (and possibly compiling optimized code)?
It is one of those questionable decisions of the C Language Standards people... use of inline does not guarantee a function to be inlined... the keyword only suggests to the compiler that the function could be inlined.
I've had lengthy exchanges on this topic with the ISO WG; this followed a MISRA guideline that requires all inline functions to be declared at module scope using the static keyword. Their logic is that there may be circumstances where the compiler needs to not inline the function... and equally, there may be cases where that non-inlined function needs to have global scope!
IMHO, if a programmer adds the inline keyword, then the suggestion is that they know what they are doing, and that function should be inline.
As you suggest, in its current form, the inline keyword is effectively pointless, unless a compiler treats it seriously.
In your first test you disable optimizations. Inlining is an optimization, so do not expect it to happen.
Also, the inline keyword doesn't work nowadays as it used to in the past. I'd say its only purpose is to allow functions in headers without linker errors about duplicated symbols (when more than one source file includes such a header).
Let your compiler do its work. Just enable optimizations (including LTO) and do not worry about the details.
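As a minimal sketch of the header use case mentioned above, assuming GCC or Clang: the names add_one, add_two and sum_both are made up for illustration, and __attribute__((always_inline)) is a compiler extension rather than standard C.

```c
/* Hypothetical helpers; in a real project the two static inline
 * definitions would sit in a header included by several .c files. */

/* static inline: every translation unit gets its own private copy,
 * so including the header from several files does not cause
 * duplicate-symbol link errors. Whether a call is actually inlined
 * is left to the optimizer. */
static inline int add_one(int x)
{
    return x + 1;
}

/* GCC/Clang extension (not ISO C): asks the compiler to inline the
 * function even when optimization is disabled (-O0). */
static inline __attribute__((always_inline)) int add_two(int x)
{
    return x + 2;
}

/* Ordinary external function that uses both helpers. */
int sum_both(int x)
{
    return add_one(x) + add_two(x);
}
```

Compiling this once with gcc -O0 -S and once with gcc -O1 -S, as in the tests above, should show whether a separate add_one symbol is still emitted.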
I have compiled a program main.c with about two lines of code to see what directives gcc / gas add to the unoptimized assembly file, using:
gcc -o main.s main.c -S
I can look up the concise description of each directive on the gas directive page, but was hoping someone could give a bit more context to some of these directives and what its practical usage is (for example, in gdb or the linker or wherever downstream). Here is the full assembly file with the items in question below:
.file "main.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $4, -8(%rbp)
movl $6, -4(%rbp)
movl -8(%rbp), %edx
movl -4(%rbp), %eax
addl %edx, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
.section .note.GNU-stack,"",@progbits
.file: it seems this is halfway obsolete, based on "This statement may go away in future: it is only recognized to be compatible with old as programs." But given that it is still there, where or how is it currently used?
.ident: it seems like this gives the same information as gcc --version. Is it used for anything beyond recording which gcc produced the file?
.section .note...: I have seen .section .text, .section .bss, ... but I've never come across a .note section, and searching the page for "note" turns up nothing. What is this line doing with the three arguments? And what is @progbits?
.size: given that directives take up no space, this gives the address just past the last statement (ret) minus the address of the first statement of main (pushq %rbp), which is the length of the main function. But again, what is it used for? Also, the as page says "It is only permitted inside .def/.endef pairs.", but this isn't inside such a pair, right?
.section .text.startup,"ax",@progbits: what is text.startup? The ax looks like it means allocatable + executable, but what or where is text.startup?
I have the following C program
int main() {
char string[] = "Hello, world.\r\n";
__asm__ volatile ("syscall;" :: "a" (1), "D" (0), "S" ((unsigned long) string), "d" (sizeof(string) - 1));
}
which I want to run under Linux on 64-bit x86. I call the syscall for "write" with 0 as fd argument because this is stdout.
If I compile under gcc with -O3, it does not work. A look into the assembly code
.file "test_for_o3.c"
.text
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
subq $40, %rsp
.cfi_def_cfa_offset 48
xorl %edi, %edi
movl $15, %edx
movq %fs:40, %rax
movq %rax, 24(%rsp)
xorl %eax, %eax
movq %rsp, %rsi
movl $1, %eax
#APP
# 5 "test_for_o3.c" 1
syscall;
# 0 "" 2
#NO_APP
movq 24(%rsp), %rcx
xorq %fs:40, %rcx
jne .L5
xorl %eax, %eax
addq $40, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.L5:
.cfi_restore_state
call __stack_chk_fail@PLT
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0"
.section .note.GNU-stack,"",@progbits
tells us that gcc has simply not put the string data into the assembly code. If I instead declare "string" as "volatile", it works fine.
However, the idea of "volatile" is just to use it for variables that can change their values by (from the view of the executing function) unexpected events, isn't it? "volatile" can make code much slower, hence it should be avoided if possible.
As I would suppose, gcc must assume that the content of "string" must not be ignored because the pointer "string" is used as an input parameter in the inline assembly (and gcc has no idea what the inline assembly code will do with it).
If this is "allowed" behaviour of gcc, where can I read more about all the formal constraints I have to be aware of when writing code for -O3?
A second question is what exactly the "volatile" qualifier on the inline assembly statement does. I just got used to marking all inline assembly statements as "volatile" because, in some situations, they did not work otherwise.
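One common way to tell gcc that an asm statement reads memory through a pointer operand is a "memory" clobber; without it, gcc only sees the address as an input and may drop the stores that fill the array. The following is a sketch under stated assumptions, not a definitive fix: it assumes Linux on x86-64, the names raw_write and say_hello are hypothetical, it writes to fd 1 (the conventional stdout descriptor) rather than the 0 used in the question, and it also lists rcx and r11 because the syscall instruction overwrites them.

```c
#include <stddef.h>

/* Hypothetical wrapper around the Linux x86-64 write system call
 * (syscall number 1). Not portable to other OSes or architectures. */
static long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a" (ret)              /* result comes back in rax */
                      : "0" (1L),               /* rax = syscall number 1 (write) */
                        "D" ((long) fd),        /* rdi = file descriptor */
                        "S" (buf),              /* rsi = buffer address */
                        "d" (len)               /* rdx = byte count */
                      : "rcx", "r11",           /* overwritten by syscall */
                        "memory");              /* asm reads memory via buf */
    return ret;
}

/* Writes the string from the question to fd 1 and returns the
 * number of bytes written (or a negative errno value on failure). */
long say_hello(void)
{
    char string[] = "Hello, world.\r\n";
    return raw_write(1, string, sizeof(string) - 1);
}
```

With the clobber in place the array no longer needs to be declared volatile; the volatile on the asm statement itself only keeps gcc from deleting or moving the statement, and says nothing about the memory it touches.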
I have a program goo.c
void foo(double);
#include <stdio.h>
void foo(int x){
printf ("in foo.c:: x= %d\n",x);
}
which is called by foo.c
int main(){
double x=3.0;
foo(x);
}
I compile and run
gcc foo.c goo.c
./a.out
Guess what? I get "x= 1" as the result. I then found that the prototype of 'foo' should have been void foo(int). Presumably my double input value 3.0 is being converted to an int. But if I check the value of (int) 3.0 with this test program:
int main(){
double x=3.0;
printf ("%d", ((int) x));
}
I get 3 as output, which makes the earlier "x= 1" even harder to understand. Any idea? For information, gcc is run with the ANSI C standard. Thanks.
[EDIT] If I use gcc -S as suggested by JS1,
I get goo.s
.file "goo.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movabsq $4613937818241073152, %rax
movq %rax, -8(%rbp)
movq -8(%rbp), %rax
movq %rax, -24(%rbp)
movsd -24(%rbp), %xmm0
call foo
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
.section .note.GNU-stack,"",@progbits
and foo.s
.file "foo.c"
.section .rodata
.LC0:
.string "in foo.c:: x= %d\n"
.text
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl %edi, -4(%rbp)
movl -4(%rbp), %eax
movl %eax, %esi
movl $.LC0, %edi
movl $0, %eax
call printf
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size foo, .-foo
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
.section .note.GNU-stack,"",@progbits
Can anyone who knows how to read assembly help figure out the source of the problem?
Understanding why you get '1' requires a bit of ASM and x86-64 ABI
knowledge. First of all, goo.c and foo.c are two separate compilation
units. The only thing that foo.c knows about the foo function is
the bogus prototype.
The bogus prototype is as follows: void foo(double);. It's a function
that takes only a single double argument. The x86-64 ABI mandates that
doubles are passed through the xmm registers. The exact phrasing
is 'If the class is SSE, the next available vector register is used,
the registers are taken in the order from %xmm0 to %xmm7.'
That means that when the compiler sets up the arguments to call the
foo() function, it's going to pass the argument via %xmm0. In
simplified asm what happens is:
mov 3.0, %xmm0
call foo
Now, foo(), on its side, believes it's going to receive an int. The
x86-64 ABI says: 'If the class is INTEGER, the next available register
of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.'. The first
argument is supposed to be passed via %rdi. That means that foo()
will do something like:
mov %rdi, %rsi
mov 0xabcd, %rdi // 0xabcd being the address of the "%d" string
call printf
So you're going to end up printing whatever was in %rdi on entry to foo(), and not %xmm0.
But why 1? You'll get an idea by issuing the following commands:
./a.out a
./a.out a b
./a.out a b c
See a pattern? Let's go back to the simplified assembly:
main:
mov 3.0, %xmm0
call foo
ret
foo:
mov %rdi, %rsi
mov 0xabcd, %rdi // 0xabcd being the address of the "%d" string
call printf
ret
As you can see, nothing is setting %rdi until it reaches foo(),
where it's passed on to printf. Which means 1 was passed to main
in the first place. Now, in the question, main is given the following
prototype: int main(). But the compiler actually set up the function to
have the following prototype instead: int main (int argc, char *argv[],
char *envp[]). The first argument, thus stored in %rdi, is actually
argc. That's why the program was printing 1.
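A sketch of how the mismatch can be avoided, assuming the declaration is shared between caller and definition: with the correct prototype in scope, the compiler converts the double to an int and passes it in the integer register that foo() reads. The function call_foo_with_double is hypothetical, and foo is changed here to return its argument (the original returns void) purely so the result can be checked.

```c
#include <stdio.h>

/* In a real project this declaration would live in a shared header
 * (hypothetically foo.h) included by both the caller's file and the
 * file that defines foo, so a mismatch becomes a compile error. */
int foo(int x);

/* Caller: because the prototype says int, the compiler converts the
 * double 3.0 to the int 3 and passes it in %edi, where foo expects it. */
int call_foo_with_double(void)
{
    double x = 3.0;
    return foo(x);
}

/* Definition: matches the shared declaration. Returning x is an
 * addition for illustration only. */
int foo(int x)
{
    printf("in foo: x= %d\n", x);
    return x;
}
```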
hello_world.c
#include <stdio.h>
int main()
{
printf("Hello World\n");
return 0;
}
Running gcc hello_world.c -S generates a hello_world.s file in assembly language.
hello_world.s
.file "hello_world.c"
.section .rodata
.LC0:
.string "Hello World"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $.LC0, %edi
call puts
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Is there some way to find out which type of assembly language the code was generated in (besides knowing the syntax of all assembly languages)?
Reference for myself or anyone else who didn't know this:
To get your processor architecture run the following:
uname -p
By default it is AT&T syntax for the GNU assembler, targeting the CPU the code is compiled for. There are options to change that.
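The target architecture can also be queried from inside the C source. The following is a small sketch using predefined macros that GCC and Clang are known to define; the function name target_arch is hypothetical and the list of architectures is not exhaustive.

```c
/* Returns a name for the architecture this file was compiled for,
 * based on compiler-predefined macros (GCC and Clang). The set of
 * macros checked here is illustrative, not complete. */
const char *target_arch(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
    return "arm";
#else
    return "unknown";
#endif
}
```

On x86 targets, the -masm=att option seen in the gcc command lines earlier in this document selects AT&T syntax explicitly, and -masm=intel selects Intel syntax instead.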
Why does gcc take a long time to compile C code if it has a big array in the extern block?
#define MAXNITEMS 100000000
int buff[MAXNITEMS];
int main (int argc, char *argv[])
{
return 0;
}
I suspect a bug somewhere. There is no reason for the compile to take longer, no matter how big the array is since the compiler will just write an integer into the .bss segment since you never assign a value to an element in it. Proof:
.file "big.c"
.comm buff,4000000000000000000,32
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",@progbits
As you can see, the only thing left of the array in the assembly is .comm buff,4000000000000000000,32.
I suggest you run gcc with -S to see the assembler code. Maybe your version of GCC has a bug. I tested with GCC 4.7.3 and the compile times here are the same, no matter which value I use.
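To see the difference between a zero-initialized array and one with an initializer, a small sketch like the following can be compiled with gcc -S. The names buff_bytes, buff_is_zeroed and table are made up for illustration, and MAXNITEMS is deliberately smaller than in the question so the check runs quickly. Which section each object lands in depends on the compiler version and flags, so the comments say "typically".

```c
#include <stddef.h>

#define MAXNITEMS 1000000   /* smaller than in the question, for a quick demo */

/* No initializer: typically emitted as a .comm or .bss reservation,
 * so only the size is recorded, not a million zeros. */
int buff[MAXNITEMS];

/* Explicit non-zero initializer: typically emitted into .data with
 * every element written out in the assembly. */
int table[4] = { 1, 2, 3, 4 };

/* Size of the zero-initialized array in bytes. */
size_t buff_bytes(void)
{
    return sizeof buff;
}

/* Returns 1 if every element of buff is zero, as C guarantees for
 * objects with static storage duration and no initializer. */
int buff_is_zeroed(void)
{
    for (size_t i = 0; i < MAXNITEMS; i++) {
        if (buff[i] != 0)
            return 0;
    }
    return 1;
}
```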
Related: Where are static variables stored (in C/C++)?