Unknown assembly syntax: .LC0(%rip) [duplicate] - c

This question already has answers here:
what does "mov offset(%rip), %rax" do?
(2 answers)
How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work?
(1 answer)
Why is the address of static variables relative to the Instruction Pointer?
(1 answer)
Closed 1 year ago.
I have been studying assembly language based on gcc -S output, and I found some syntax I have not seen before.
From C code:
#include <stdio.h>
void main() {
printf("%d\n", sizeof(int));
}
I've got this:
.file "test.c"
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "%d\12\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
call __main
movl $4, %edx
leaq .LC0(%rip), %rcx
call printf
nop
addq $32, %rsp
popq %rbp
ret
.seh_endproc
.ident "GCC: (tdm64-1) 5.1.0"
.def printf; .scl 2; .type 32; .endef
Most of this code is clear and understandable, but one line looks completely strange to me. The line I am talking about is:
leaq .LC0(%rip), %rcx
I know that leaq is the instruction for loading an effective address, but the operand syntax is unclear to me: the label of the format string, with the instruction pointer in parentheses as if it were a parameter? What does it do?
Thanks in advance :)
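For readers landing here rather than on the linked duplicates, the arithmetic behind `.LC0(%rip)` can be sketched in C. This is only an illustration: the function name `rip_relative_target` and the addresses used with it are hypothetical, and the one fact it relies on is that a RIP-relative operand is resolved as the address of the next instruction plus a signed 32-bit displacement, which the assembler and linker fill in for the label.

```c
#include <stdint.h>

/*
 * Sketch of how `leaq .LC0(%rip), %rcx` gets its result.
 * insn_addr: address of the lea instruction itself (hypothetical value)
 * insn_len:  length of that instruction in bytes
 * disp32:    signed displacement stored in the instruction for `.LC0`
 */
uint64_t rip_relative_target(uint64_t insn_addr, uint64_t insn_len, int32_t disp32)
{
    /* While the lea executes, RIP already points at the next instruction. */
    uint64_t next_rip = insn_addr + insn_len;

    /* Sign-extend the displacement to 64 bits, then add it. */
    return next_rip + (uint64_t)(int64_t)disp32;
}
```

With a made-up 7-byte lea at 0x401000 and a displacement of 0x2ff9, the function returns 0x404000. Because only the distance between code and data is encoded, the same bytes keep working wherever the image is loaded.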

Related

Why does my empty loop run twice as fast if called as a function, on Intel Skylake CPUs?

I was running some tests to compare C to Java and ran into something interesting. Running my exactly identical benchmark code with optimization level 1 (-O1) in a function called by main, rather than in main itself, resulted in roughly double performance. I'm printing out the size of test_t to verify beyond any doubt that the code is being compiled to x64.
I sent the executables to my friend who's running an i7-7700HQ and got similar results. I'm running an i7-6700.
Here's the slower code:
#include <stdio.h>
#include <time.h>
#include <stdint.h>
int main() {
printf("Size = %I64u\n", sizeof(size_t));
int start = clock();
for(int64_t i = 0; i < 10000000000L; i++) {
}
printf("%ld\n", clock() - start);
return 0;
}
And the faster:
#include <stdio.h>
#include <time.h>
#include <stdint.h>
void test() {
printf("Size = %I64u\n", sizeof(size_t));
int start = clock();
for(int64_t i = 0; i < 10000000000L; i++) {
}
printf("%ld\n", clock() - start);
}
int main() {
test();
return 0;
}
I'll also provide the assembly code for you to dig into. I don't know assembly.
Slower:
.file "dummy.c"
.text
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "Size = %I64u\12\0"
.LC1:
.ascii "%ld\12\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
pushq %rbx
.seh_pushreg %rbx
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
call __main
movl $8, %edx
leaq .LC0(%rip), %rcx
call printf
call clock
movl %eax, %ebx
movabsq $10000000000, %rax
.L2:
subq $1, %rax
jne .L2
call clock
subl %ebx, %eax
movl %eax, %edx
leaq .LC1(%rip), %rcx
call printf
movl $0, %eax
addq $32, %rsp
popq %rbx
ret
.seh_endproc
.ident "GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0"
.def printf; .scl 2; .type 32; .endef
.def clock; .scl 2; .type 32; .endef
Faster:
.file "dummy.c"
.text
.section .rdata,"dr"
.LC0:
.ascii "Size = %I64u\12\0"
.LC1:
.ascii "%ld\12\0"
.text
.globl test
.def test; .scl 2; .type 32; .endef
.seh_proc test
test:
pushq %rbx
.seh_pushreg %rbx
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
movl $8, %edx
leaq .LC0(%rip), %rcx
call printf
call clock
movl %eax, %ebx
movabsq $10000000000, %rax
.L2:
subq $1, %rax
jne .L2
call clock
subl %ebx, %eax
movl %eax, %edx
leaq .LC1(%rip), %rcx
call printf
nop
addq $32, %rsp
popq %rbx
ret
.seh_endproc
.def __main; .scl 2; .type 32; .endef
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
subq $40, %rsp
.seh_stackalloc 40
.seh_endprologue
call __main
call test
movl $0, %eax
addq $40, %rsp
ret
.seh_endproc
.ident "GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0"
.def printf; .scl 2; .type 32; .endef
.def clock; .scl 2; .type 32; .endef
Here's my batch script for compilation:
@echo off
set /p file= File to compile:
del compiled.exe
gcc -Wall -Wextra -std=c17 -O1 -o compiled.exe %file%.c
compiled.exe
PAUSE
And for compilation to assembly:
@echo off
set /p file= File to compile:
del %file%.s
gcc -S -Wall -Wextra -std=c17 -O1 %file%.c
PAUSE
In the slow version, note that the sub rax, 1 \ jne pair goes right across the ..80 address, which is a 32-byte boundary. This is one of the cases described in Intel's document on this issue.
So this op/branch pair is affected by the fix for the JCC erratum (which would cause it to not be cached in the µop cache). I'm not sure that this is the reason, since there are other things at play too, but it is one candidate.
In the fast version, the branch is not "touching" a 32-byte boundary, so it is not affected.
There may be other effects that apply. Because it crosses a 32-byte boundary, the loop in the slow case is spread across 2 chunks in the µop cache. Even without the fix for the JCC erratum, that may cause it to run at 2 cycles per iteration if the loop cannot execute from the Loop Stream Detector (which is disabled on some processors by another fix for another erratum, SKL150). See e.g. this answer about loop performance.
To address the various comments saying they cannot reproduce this: yes, there are various ways that could happen:
Whichever effect was responsible for the slowdown, it is likely caused by the exact placement of the op/branch pair across a 32-byte boundary, which happened by pure accident. Compiling from source is unlikely to reproduce the same circumstances, unless you use the same compiler with the same setup as the original poster.
Even using the same binary, regardless of which of the effects is responsible, the weird effect would only happen on particular processors.
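The "touching a 32-byte boundary" condition described above can be written as a small check. This is a sketch, not Intel's wording: the function name `touches_32byte_boundary` and the example addresses are hypothetical, and it assumes the pair is 6 bytes long (a 4-byte `subq $1, %rax` followed by a 2-byte short `jne`). It treats a pair as affected when its bytes cross a 32-byte boundary or when it ends exactly on one.

```c
#include <stdbool.h>
#include <stdint.h>

#define JCC_BOUNDARY 32u

/*
 * True if the byte range [start, start + len) crosses a 32-byte boundary
 * or ends exactly on one. Assumes len > 0.
 */
bool touches_32byte_boundary(uint64_t start, uint64_t len)
{
    uint64_t end = start + len; /* one past the last byte of the pair */

    /* First and last byte fall in different 32-byte blocks. */
    bool crosses = (start / JCC_BOUNDARY) != ((end - 1) / JCC_BOUNDARY);

    /* The last byte is the final byte of a 32-byte block. */
    bool ends_on = (end % JCC_BOUNDARY) == 0;

    return crosses || ends_on;
}
```

A 6-byte pair starting at ..7c straddles ..80 and is flagged, as is one starting at ..7a, which ends exactly at ..80. The same pair starting at ..60 is not flagged. A few bytes of difference in code placement, such as moving the loop from main into test, is enough to flip the result.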

inline vs static inline c

Here are some simple tests run on an x86_64 machine to show the assembler code generated when using the inline keyword:
TEST 1
static inline void
show_text(void)
{
printf("Hello\n");
}
int main(int argc, char *argv[])
{
show_text();
return 0;
}
And assembler :
gcc -O0 -fno-asynchronous-unwind-tables -S -masm=att main.c && less main.s
.file "main.c"
.text
.section .rodata
.LC0:
.string "Hello"
.text
.type show_text, @function
show_text:
pushq %rbp
movq %rsp, %rbp
leaq .LC0(%rip), %rdi
call puts@PLT
nop
popq %rbp
ret
.size show_text, .-show_text
.globl main
.type main, @function
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
call show_text
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (GNU) 7.3.1 20180312"
.section .note.GNU-stack,"",@progbits
Test 1 result : inline suggestion not taken into account by compiler
Test 2
Same code as test 1, but with -O1 optimization flag
gcc -O1 -fno-asynchronous-unwind-tables -S -masm=att main.c && less main.s
.file "main.c"
.text
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "Hello"
.text
.globl main
.type main, @function
main:
subq $8, %rsp
leaq .LC0(%rip), %rdi
call puts@PLT
movl $0, %eax
addq $8, %rsp
ret
.size main, .-main
.ident "GCC: (GNU) 7.3.1 20180312"
.section .note.GNU-stack,"",@progbits
Test 2 result : no more show_text function defined in assembler
Test 3
show_text not declared as inline, -O1 optimization flag
Test 3 result : no more show_text function defined in assembler, with or without inline : same generated code
Test 4
#include <stdio.h>
static inline void
show_text(void)
{
printf("Hello\n");
printf("Hello\n");
printf("Hello\n");
printf("Hello\n");
printf("Hello\n");
printf("Hello\n");
}
int main(int argc, char *argv[])
{
show_text();
show_text();
return 0;
}
produces :
gcc -O1 -fno-asynchronous-unwind-tables -S -masm=att main.c && less main.s
.file "main.c"
.text
.section .rodata
.LC0:
.string "Hello"
.text
.type show_text, @function
show_text:
pushq %rbp
movq %rsp, %rbp
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
nop
popq %rbp
ret
.size show_text, .-show_text
.globl main
.type main, @function
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
call show_text
call show_text
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (GNU) 7.3.1 20180312"
.section .note.GNU-stack,"",@progbits
Test 4 result : show_text defined in assembler, inline suggestion not taken into account
I understand that the inline keyword does not force inlining. But for the Test 1 results, what can prevent the show_text code from being substituted into main?
Until now, I have declared some small static functions inline in my C source code. But from these results it seems quite useless.
Why should I declare some of my small functions static inline when using a modern compiler (and possibly compiling optimized code)?
It is one of those questionable decisions of the C Language Standards people... use of inline does not guarantee a function to be inlined... the keyword only suggests to the compiler that the function could be inlined.
I've had lengthy exchanges on this topic with the ISO WG; this followed a MISRA guideline that requires all inline functions to be declared at module scope using the static keyword. Their logic is that there may be circumstances where the compiler needs to not inline the function... and equally, there may be cases where that non-inlined function needs to have global scope!
IMHO, if a programmer adds the inline keyword, then the suggestion is that they know what they are doing, and that function should be inline.
As you suggest, in its current form, the inline keyword is effectively pointless, unless a compiler treats it seriously.
In your first test you disable optimizations. Inlining is an optimization method. Do not expect it to happen.
Also, the inline keyword doesn't work nowadays the way it used to. I'd say its only purpose is to let you define functions in headers without linker errors about duplicated symbols (when more than one cpp file uses such a header).
Let your compiler do its work. Just enable optimizations (including LTO) and do not worry about details.
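The header use case mentioned above can be sketched as follows. The helper `clamp_int` and the idea of a header named "clamp.h" are hypothetical, chosen only for illustration. Because the function is static, every source file that includes the header gets its own private copy, so the linker never sees duplicate symbols. Whether a given call is actually inlined is still left to the optimizer.

```c
/*
 * Would normally live in a header, e.g. a hypothetical "clamp.h",
 * included from several .c files.
 */
static inline int clamp_int(int value, int lo, int hi)
{
    /* Small enough that an optimizing build will usually inline it. */
    if (value < lo) {
        return lo;
    }
    if (value > hi) {
        return hi;
    }
    return value;
}
```

Built with -O0, GCC will typically emit this as an ordinary local function and call it, as in Test 1. With optimization enabled the call usually disappears, with or without the inline keyword, as in Tests 2 and 3.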

Address of global variables in x86 assembly

I have C code that declares a global variable char file[MAX]. This variable is used directly in various functions to copy a filename into it. I can compile this C file to assembly code, but I don't know how to find the address of this variable. In x86 stack, how do I find the address of a global variable? Can you give me an example of how a global variable is referenced in assembly code?
EDIT: I don't see a .Data segment in the assembly code.
To store the address of file to register EAX:
AT&T syntax: movl $_file, %eax
intel syntax: mov eax, OFFSET _file
How to examine:
First, write a simple program (test.c).
#define MAX 256
char file[MAX];
int main(void) {
volatile char *address = file;
return 0;
}
Then, compile it to assembly code: gcc -S -O0 -o test.s test.c
.file "test.c"
.comm _file, 256, 5
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl $_file, 12(%esp)
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE0:
.ident "GCC: (GNU) 4.8.1"
Or if you want intel syntax: gcc -S -O0 -masm=intel -o test_intel.s test.c
.file "test.c"
.intel_syntax noprefix
.comm _file, 256, 5
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB0:
.cfi_startproc
push ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
mov ebp, esp
.cfi_def_cfa_register 5
and esp, -16
sub esp, 16
call ___main
mov DWORD PTR [esp+12], OFFSET FLAT:_file
mov eax, 0
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE0:
.ident "GCC: (GNU) 4.8.1"
With a little more experimentation and examination, I got the result.
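The same point can be checked from the C side. In this sketch, `file_address` and `set_file` are hypothetical helpers, not part of the original program. Since `file` has static storage, its address is a link-time constant rather than a stack location, which is why the listings above can use `$_file` as an immediate operand.

```c
#include <string.h>

#define MAX 256

char file[MAX]; /* global: static storage, one fixed address for the whole run */

/* Returns the address that `movl $_file, %eax` would load. */
char *file_address(void)
{
    return file;
}

/* Copies a filename into the global buffer, always NUL-terminated. */
void set_file(const char *name)
{
    strncpy(file, name, MAX - 1);
    file[MAX - 1] = '\0';
}
```

Every function that names `file` refers to the same address, so a string written through `set_file` can be read back through the pointer from `file_address`.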

Why does GCC not add .section into the assembly

If you look at the second line of this program, it just says ".text". When I write assembly programs, I thought that you had to put ".section .text". Why does GCC omit the ".section"? I also noticed that it does include it before declaring rodata below: ".section .rodata".
Also, I'm just wondering what ".type sum, @function" does. I wrote an assembly function this morning without it and it executed fine.
.file "test.c"
.text
.globl sum
.type sum, @function
sum:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movss %xmm0, -4(%rbp)
movss %xmm1, -8(%rbp)
movss -4(%rbp), %xmm0
mulss -8(%rbp), %xmm0
cvttss2si %xmm0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size sum, .-sum
.section .rodata
.LC2:
.string "%d\n"
.text
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movss .LC0(%rip), %xmm1
movss .LC1(%rip), %xmm0
call sum
movl %eax, -4(%rbp)
movl -4(%rbp), %eax
movl %eax, %esi
movl $.LC2, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.section .rodata
.align 4
.LC0:
.long 1092930765
.align 4
.LC1:
.long 1092825907
.ident "GCC: (Ubuntu 4.9.2-10ubuntu13) 4.9.2"
.section .note.GNU-stack,"",@progbits
Collecting up some comments into an answer:
Before arbitrary section names were possible, .text, .data, and .bss were assembler directives. Now, you can write .section .text instead. This should all be documented in the GNU as manual. (linked to latest version).
.type sum, @function
sets some ELF symbol-type stuff. IDK if this matters for dynamic linking, but it doesn't for static linkage. There's a lot of stuff the compiler emits but that you don't actually need for your code to run. This is not a bad thing.
For the other things in gcc asm output, have a look at my answer to GCC Assembly Optimizations - Why are these equivalent?

assembly language - c-language and mnemonics

I wrote a simple program in C, the classic hello world. I wanted to know what it looked like when the compiler translated it to assembly code.
I use MinGW and the command:
gcc -S hellow.c
When I opened this file I expected it would, AT THE LEAST, be somewhat similar to a hello-world program written directly in assembly, that is:
jmp 115
db 'Hello world!$' (db = define bytes)
-a 115
mov ah, 09 (09 for displaying strings ... ah = 'command register')
mov dx, 102 (adress of the string)
int 21
int 20
Instead it look like this:
.file "hellow.c"
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "Hello world!\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB6:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl $LC0, (%esp)
call _puts
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE6:
.def _puts; .scl 2; .type 32; .endef
I know little about assembly language, but I DO recognize the so-called mnemonics like ADD, POP, PUSH, MOV, JMP, INT etc. I could not see many of these in the code generated by the C compiler.
What did I misunderstand?
This aligns the stack pointer to a 16-byte boundary, reserves 16 bytes of stack space, and calls a function __main that probably does all the initial setup that is needed for a C program
andl $-16, %esp
subl $16, %esp
call ___main
This prepares the arguments and calls function _puts. LC0 is a symbol that contains the string to be printed.
movl $LC0, (%esp)
call _puts
This prepares the return value of main and returns
movl $0, %eax
leave
ret
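The `andl $-16, %esp` step quoted above is plain bit masking, and the same arithmetic can be written in C. This is only a sketch: `align_down_16` is a hypothetical name and the stack pointer value used with it is made up. As a 32-bit pattern, -16 is 0xFFFFFFF0, so the AND clears the low four bits and rounds the value down to a multiple of 16.

```c
#include <stdint.h>

/*
 * Same arithmetic as `andl $-16, %esp`:
 * (uint32_t)-16 is 0xFFFFFFF0, so the low four bits are cleared.
 */
uint32_t align_down_16(uint32_t sp)
{
    return sp & (uint32_t)-16;
}
```

For a made-up stack pointer of 0x0028ff3c this gives 0x0028ff30. A value that is already a multiple of 16 is left unchanged.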
Your example code uses Intel syntax, while the standard output from gcc is AT&T syntax. You can change that by using
gcc -S hellow.c -masm=intel
The resulting output should look more familiar.
However, assembly generated by the compiler still looks rather different from what you would write by hand.
The int instruction would be used if you compiled for DOS, but even then those calls would be wrapped in C standard library functions, like puts in this case.
