Address of global variables in x86 assembly

Address of global variables in x86 assembly - c

I have a C code that declares global variable char file[MAX]. This variable is used in various functions directly to copy filename to it. I can compile this c file to assembly code but I don't know how to find the address of this variable? In x86 stack, how do I find the address of a global variable? Can you give me an example how global variable is referenced in assembly code?
EDIT: I don't see a .Data segment in the assembly code.

To store the address of file to register EAX:
AT&T syntax: movl $_file, %eax
intel syntax: mov eax, OFFSET _file
How to examine:
Firstly, write a simple code (test.c).
#define MAX 256
char file[MAX];
int main(void) {
volatile char *address = file;
return 0;
}
Then, compile it to asssembly code: gcc -S -O0 -o test.s test.c
.file "test.c"
.comm _file, 256, 5
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl $_file, 12(%esp)
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE0:
.ident "GCC: (GNU) 4.8.1"
Or if you want intel syntax: gcc -S -O0 -masm=intel -o test_intel.s test.c
.file "test.c"
.intel_syntax noprefix
.comm _file, 256, 5
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB0:
.cfi_startproc
push ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
mov ebp, esp
.cfi_def_cfa_register 5
and esp, -16
sub esp, 16
call ___main
mov DWORD PTR [esp+12], OFFSET FLAT:_file
mov eax, 0
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE0:
.ident "GCC: (GNU) 4.8.1"
With a little more experiments and examination, I got the result.

Related

Does necessary content of same header will be duplicated in produced executable?

I use gcc. We know the cpp expands all macros definitions and include statements and passes the result to the actual compiler to create an executable file. I tested the result of cpp and see that some parts of header which are necessary, are included in output of cpp for each source file.
But I want to know if in a project I have a header that is included in multiple source files, then does the related content to that header will be duplicated multiple times in produced executable? Or there will be multiple shortcuts to them? If there will be shortcuts, I want to know why cpp replaces shortcuts of source files with more code and pass the result to cc1?
For example in a header, I have a function with name kids
int kids(){
return 1;
}
and call it in test.c source file. The cpp puts the definition of kids in its output. But in compiled file (test.s which is the result of cc -S test.c), I only see call kids which is replaced instead of function definition. I call that a shortcut.I want to know what will happen for that function and its calls in executable?
Edit:
The assembly off test.c is:
.file "test.c"
.text
.globl kids
.type kids, #function
kids:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $1, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size kids, .-kids
.section .rodata
.LC0:
.string "C Rocks + Ali!"
.LC1:
.string "%d"
.text
.globl main
.type main, #function
main:
.LFB1:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rdi
call puts#PLT
movl $0, %eax
call kids
movl %eax, %esi
leaq .LC1(%rip), %rdi
movl $0, %eax
call printf#PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0"
.section .note.GNU-stack,"",#progbits
.section .note.gnu.property,"a"
.align 8
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 8
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 8
4:

Error: operand size mismatch for `lea' (seems like syntax error)

I'm trying to add a function S_0x804853E in an assembly file compiled by GCC. And i'm trying to assemble the file to execuable file. The complete assembly file is followed.
.file "simple.c"
.intel_syntax noprefix
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
push ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
mov ebp, esp
.cfi_def_cfa_register 5
sub esp, 16
call __x86.get_pc_thunk.ax
add eax, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_
mov DWORD PTR -4[ebp], 3
mov eax, 0
leave
call S_0x804853E # note that this line is manually added.
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.section .text.__x86.get_pc_thunk.ax,"axG",#progbits,__x86.get_pc_thunk.ax,comdat
.globl __x86.get_pc_thunk.ax
.hidden __x86.get_pc_thunk.ax
.type __x86.get_pc_thunk.ax, #function
__x86.get_pc_thunk.ax:
.LFB1:
.cfi_startproc
mov eax, DWORD PTR [esp]
ret
.cfi_endproc
.LFE1:
.ident "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
.section .note.GNU-stack,"",#progbits
# note that codes below are manually added.
.type S_0x804853E, #function
S_0x804853E:
push ebp
mov esp,ebp
push ebx
sub $0x4,esp
call S_0x80485BB
add $_GLOBAL_OFFSET_TABLE_,eax
sub $0xC,esp
lea S_0x80486B8,edx
push edx
mov eax,ebx
call puts
add $0x10,esp
nop
mov -0x4(ebp),ebx
leave
ret
.type S_0x80485BB, #function
S_0x80485BB:
mov (esp),eax
ret
.section .rodata
S_0x80486B8:
.byte 0x36
.byte 0x00
I'm using commands below to assemble. And Errors followed.
$ gcc -m32 -no-pie -nostartfiles simple.s -o simple
simple.s: Assembler messages:
simple.s:49: Error: operand size mismatch for `lea'
simple.s:55: Error: junk `(ebp)' after expression
I'm not very familiar with assembly. Apologize if the problem can be easily solved by google. But i failed to find any related explanations. Thanks for your help.

The main problem is that i mixed up the grammar of intel and AT&T. The codes generated from the tool are AT&T without operator suffix('b','l','w','q').
Compiling C code to AT&T and making up the operator suffix make sense. edited codes followed.
.file "simple.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
subl $16, %esp
call __x86.get_pc_thunk.ax
addl $_GLOBAL_OFFSET_TABLE_, %eax
movl $3, -4(%ebp)
movl $0, %eax
leave
call S_0x804853E # note that this line is mannally added
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.section .text.__x86.get_pc_thunk.ax,"axG",#progbits,__x86.get_pc_thunk.ax,comdat
.globl __x86.get_pc_thunk.ax
.hidden __x86.get_pc_thunk.ax
.type __x86.get_pc_thunk.ax, #function
__x86.get_pc_thunk.ax:
.LFB1:
.cfi_startproc
movl (%esp), %eax
ret
.cfi_endproc
# note that codes below are mannally added
.type S_0x804853E, #function
S_0x804853E:
pushl %ebp
movl %esp,%ebp
pushl %ebx
subl $0x4,%esp
call S_0x80485BB
addl $_GLOBAL_OFFSET_TABLE_,%eax
subl $0xC,%esp
lea S_0x80486B8,%edx
pushl %edx
movl %eax,%ebx
call puts
addl $0x10,%esp
nop
movl -0x4(%ebp),%ebx
leave
ret
.type S_0x80485BB, #function
S_0x80485BB:
movl (%esp),%eax
ret
.section .rodata
S_0x80486B8:
.byte 0x36
.byte 0x00
Codes can be assembled by gcc without warnings and errors.
-------------------------split line for new edit----------------------
Thanks for help from #Peter Cordes.
It's unnecessary to explictly give all instructions the operand-size suffix. We use suffix only if the operand size of the instuction seems ambiguous without the declaration of size.
EX:neither operand is a register.
movl $4, -4(%ebp)

Why does my empty loop run twice as fast if called as a function, on Intel Skylake CPUs?

I was running some tests to compare C to Java and ran into something interesting. Running my exactly identical benchmark code with optimization level 1 (-O1) in a function called by main, rather than in main itself, resulted in roughly double performance. I'm printing out the size of test_t to verify beyond any doubt that the code is being compiled to x64.
I sent the executables to my friend who's running an i7-7700HQ and got similar results. I'm running an i7-6700.
Here's the slower code:
#include <stdio.h>
#include <time.h>
#include <stdint.h>
int main() {
printf("Size = %I64u\n", sizeof(size_t));
int start = clock();
for(int64_t i = 0; i < 10000000000L; i++) {
}
printf("%ld\n", clock() - start);
return 0;
}
And the faster:
#include <stdio.h>
#include <time.h>
#include <stdint.h>
void test() {
printf("Size = %I64u\n", sizeof(size_t));
int start = clock();
for(int64_t i = 0; i < 10000000000L; i++) {
}
printf("%ld\n", clock() - start);
}
int main() {
test();
return 0;
}
I'll also provide the assembly code for you to dig in to. I don't know assembly.
Slower:
.file "dummy.c"
.text
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "Size = %I64u\12\0"
.LC1:
.ascii "%ld\12\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
pushq %rbx
.seh_pushreg %rbx
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
call __main
movl $8, %edx
leaq .LC0(%rip), %rcx
call printf
call clock
movl %eax, %ebx
movabsq $10000000000, %rax
.L2:
subq $1, %rax
jne .L2
call clock
subl %ebx, %eax
movl %eax, %edx
leaq .LC1(%rip), %rcx
call printf
movl $0, %eax
addq $32, %rsp
popq %rbx
ret
.seh_endproc
.ident "GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0"
.def printf; .scl 2; .type 32; .endef
.def clock; .scl 2; .type 32; .endef
Faster:
.file "dummy.c"
.text
.section .rdata,"dr"
.LC0:
.ascii "Size = %I64u\12\0"
.LC1:
.ascii "%ld\12\0"
.text
.globl test
.def test; .scl 2; .type 32; .endef
.seh_proc test
test:
pushq %rbx
.seh_pushreg %rbx
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
movl $8, %edx
leaq .LC0(%rip), %rcx
call printf
call clock
movl %eax, %ebx
movabsq $10000000000, %rax
.L2:
subq $1, %rax
jne .L2
call clock
subl %ebx, %eax
movl %eax, %edx
leaq .LC1(%rip), %rcx
call printf
nop
addq $32, %rsp
popq %rbx
ret
.seh_endproc
.def __main; .scl 2; .type 32; .endef
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
subq $40, %rsp
.seh_stackalloc 40
.seh_endprologue
call __main
call test
movl $0, %eax
addq $40, %rsp
ret
.seh_endproc
.ident "GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0"
.def printf; .scl 2; .type 32; .endef
.def clock; .scl 2; .type 32; .endef
Here's my batch script for compilation:
#echo off
set /p file= File to compile:
del compiled.exe
gcc -Wall -Wextra -std=c17 -O1 -o compiled.exe %file%.c
compiled.exe
PAUSE
And for compilation to assembly:
#echo off
set /p file= File to compile:
del %file%.s
gcc -S -Wall -Wextra -std=c17 -O1 %file%.c
PAUSE

The slow version:
Note that the sub rax, 1 \ jne pair goes right across the boundary of the ..80 (which is a 32byte boundary). This is one of the cases mentioned in Intels document regarding this issue namely as this diagram:
So this op/branch pair is affected by the fix for the JCC erratum (which would cause it to not be cached in the µop cache). I'm not sure if that is the reason, there are other things at play too, but it's a thing.
In the fast version, the branch is not "touching" a 32byte boundary, so it is not affected.
There may be other effects that apply. Still due to crossing a 32byte boundary, in the slow case the loop is spread across 2 chunks in the µop cache, even without the fix for JCC erratum that may cause it to run at 2 cycles per iteration if the loop cannot execute from the Loop Stream Detector (which is disabled on some processors by an other fix for an other erratum, SKL150). See eg this answer about loop performance.
To address the various comments saying they cannot reproduce this, yes there are various ways that could happen:
Whichever effect was responsible for the slowdown, it is likely caused by the exact placement of the op/branch pair across a 32byte boundary, which happened by pure accident. Compiling from source is unlikely to reproduce the same circumstances, unless you use the same compiler with the same setup as was used by the original poster.
Even using the same binary, regardless of which of the effects is responsible, the weird effect would only happen on particular processors.

Unknown assembly syntax: .LC0(%rip) [duplicate]

This question already has answers here:
what does "mov offset(%rip), %rax" do?
(2 answers)
How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work?
(1 answer)
Why is the address of static variables relative to the Instruction Pointer?
(1 answer)
Closed 1 year ago.
I have been studied assembly language based on gcc -S outputs, and i find some syntax i've not seen before.
From C code:
#include <stdio.h>
void main() {
printf("%d\n", sizeof(int));
}
I've got this:
.file "test.c"
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "%d\12\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
call __main
movl $4, %edx
leaq .LC0(%rip), %rcx
call printf
nop
addq $32, %rsp
popq %rbp
ret
.seh_endproc
.ident "GCC: (tdm64-1) 5.1.0"
.def printf; .scl 2; .type 32; .endef
And even if this code is very clear and understandable, that one line of it is completely strange for me. The line i am talking about is:
leaq .LC0(%rip), %rcx
And even if i know that leaq is instruction for loading effective address, that the operand syntax is unclear for me, i mean label of formatting string with instruction pointer as parameter? What does it do?
Thanks in advance :)

Create 'raw' assembly code with gcc(Intel syntax)

I want to create some assembly code with gcc. When I use gcc -masm=intel -S test.c I get assembly code full of .def and .cfi labels which I cannot assemble. Is there a way to create assembly code without this labels?
E.g.: A simple c code like:
int main() {
return 0;
}
Compiles to:
.file "test.c"
.intel_syntax noprefix
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB0:
.cfi_startproc
push ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
mov ebp, esp
.cfi_def_cfa_register 5
and esp, -16
call ___main
mov eax, 0
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE0:
But what I want is something like:
_main:
push ebp
mov ebp, esp
and esp, -16
call ___main
mov eax, 0
leave
ret
I hope there's a way to do this. Thanks in advanced.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight