I want to create some assembly code with gcc. When I use gcc -masm=intel -S test.c, I get assembly code full of .def and .cfi directives which I cannot assemble. Is there a way to create assembly code without these directives?
For example, a simple C program like:
int main() {
return 0;
}
Compiles to:
.file "test.c"
.intel_syntax noprefix
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB0:
.cfi_startproc
push ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
mov ebp, esp
.cfi_def_cfa_register 5
and esp, -16
call ___main
mov eax, 0
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE0:
But what I want is something like:
_main:
push ebp
mov ebp, esp
and esp, -16
call ___main
mov eax, 0
leave
ret
I hope there's a way to do this. Thanks in advance.
I'm trying to add a function S_0x804853E to an assembly file generated by GCC, and then assemble that file into an executable. The complete assembly file follows.
.file "simple.c"
.intel_syntax noprefix
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
push ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
mov ebp, esp
.cfi_def_cfa_register 5
sub esp, 16
call __x86.get_pc_thunk.ax
add eax, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_
mov DWORD PTR -4[ebp], 3
mov eax, 0
leave
call S_0x804853E # note that this line is manually added.
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.section .text.__x86.get_pc_thunk.ax,"axG",@progbits,__x86.get_pc_thunk.ax,comdat
.globl __x86.get_pc_thunk.ax
.hidden __x86.get_pc_thunk.ax
.type __x86.get_pc_thunk.ax, @function
__x86.get_pc_thunk.ax:
.LFB1:
.cfi_startproc
mov eax, DWORD PTR [esp]
ret
.cfi_endproc
.LFE1:
.ident "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
.section .note.GNU-stack,"",@progbits
# note that codes below are manually added.
.type S_0x804853E, @function
S_0x804853E:
push ebp
mov esp,ebp
push ebx
sub $0x4,esp
call S_0x80485BB
add $_GLOBAL_OFFSET_TABLE_,eax
sub $0xC,esp
lea S_0x80486B8,edx
push edx
mov eax,ebx
call puts
add $0x10,esp
nop
mov -0x4(ebp),ebx
leave
ret
.type S_0x80485BB, @function
S_0x80485BB:
mov (esp),eax
ret
.section .rodata
S_0x80486B8:
.byte 0x36
.byte 0x00
I'm using the command below to assemble it, and I get the errors that follow.
$ gcc -m32 -no-pie -nostartfiles simple.s -o simple
simple.s: Assembler messages:
simple.s:49: Error: operand size mismatch for `lea'
simple.s:55: Error: junk `(ebp)' after expression
I'm not very familiar with assembly. Apologies if this can easily be solved with Google, but I failed to find any related explanation. Thanks for your help.
The main problem is that I mixed up Intel and AT&T syntax. The code generated by the tool is AT&T syntax without the operand-size suffixes ('b', 'w', 'l', 'q').
Compiling the C code to AT&T syntax and adding the operand-size suffixes fixes it. The edited code follows.
.file "simple.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
subl $16, %esp
call __x86.get_pc_thunk.ax
addl $_GLOBAL_OFFSET_TABLE_, %eax
movl $3, -4(%ebp)
movl $0, %eax
leave
call S_0x804853E # note that this line is manually added
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.section .text.__x86.get_pc_thunk.ax,"axG",@progbits,__x86.get_pc_thunk.ax,comdat
.globl __x86.get_pc_thunk.ax
.hidden __x86.get_pc_thunk.ax
.type __x86.get_pc_thunk.ax, @function
__x86.get_pc_thunk.ax:
.LFB1:
.cfi_startproc
movl (%esp), %eax
ret
.cfi_endproc
# note that codes below are manually added
.type S_0x804853E, @function
S_0x804853E:
pushl %ebp
movl %esp,%ebp
pushl %ebx
subl $0x4,%esp
call S_0x80485BB
addl $_GLOBAL_OFFSET_TABLE_,%eax
subl $0xC,%esp
lea S_0x80486B8,%edx
pushl %edx
movl %eax,%ebx
call puts
addl $0x10,%esp
nop
movl -0x4(%ebp),%ebx
leave
ret
.type S_0x80485BB, @function
S_0x80485BB:
movl (%esp),%eax
ret
.section .rodata
S_0x80486B8:
.byte 0x36
.byte 0x00
The code now assembles with gcc without warnings or errors.
-------------------------split line for new edit----------------------
Thanks to @Peter Cordes for the help.
It's unnecessary to explicitly give every instruction an operand-size suffix. A suffix is needed only when the operand size of the instruction would otherwise be ambiguous.
For example, when neither operand is a register:
movl $4, -4(%ebp)
The following code compiles and runs with GCC.
#include <stdio.h>
int arr[10];
int func()
{
printf("In func\n");
return 0;
}
int main()
{
if (&arr[func()])
printf("In main\n");
return 0;
}
Output:
In main
Why is printf("In func\n"); not executed?
There seems to be a subtle issue, intended or not, with some recent gcc builds; I see it with gcc 7.3 on kernel 4.15.8 on Arch Linux. For whatever reason, the call to func() is omitted from the code generated for main(), e.g.
$ gcc -S -masm=intel -o infunc2.asm infunc2.c
The generated assembly is:
$ cat infunc2.asm
.file "infunc2.c"
.intel_syntax noprefix
.text
.comm arr,40,32
.section .rodata
.LC0:
.string "In func"
.text
.globl func
.type func, @function
func:
.LFB0:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
lea rdi, .LC0[rip]
call puts@PLT
mov eax, 0
pop rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size func, .-func
.section .rodata
.LC1:
.string "In main"
.text
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
lea rdi, .LC1[rip]
call puts@PLT
mov eax, 0
pop rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (GNU) 7.3.0"
.section .note.GNU-stack,"",@progbits
Note that the body of func() appears above under the labels func: and .LFB0:. The code for main: never calls func, even though func is present and the "In func" string is emitted at .LC0:. I suspect this is not intended behavior.
For example, even when compiling without optimization (-O0), the function is not called:
$ gcc -g -O0 -o bin/if2 infunc2.c
$ ./bin/if2
In main
Changing the code to store the address of arr[func()] does force func() to be called, e.g.
#include <stdio.h>
int arr[10];
int func()
{
printf ("In func\n");
return 0;
}
int main (void)
{
int *p = &arr[func()];
if (p)
printf("In main\n");
return 0;
}
Then
$ gcc -Wall -Wextra -pedantic -std=gnu11 -Ofast -o bin/infunc infunc.c
$ ./bin/infunc
In func
In main
And the generated assembly supports the different behavior:
$ gcc -S -masm=intel -o infunc.asm infunc.c
$ cat infunc.asm
.file "infunc.c"
.intel_syntax noprefix
.text
.comm arr,40,32
.section .rodata
.LC0:
.string "In func"
.text
.globl func
.type func, @function
func:
.LFB0:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
lea rdi, .LC0[rip]
call puts@PLT
mov eax, 0
pop rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size func, .-func
.section .rodata
.LC1:
.string "In main"
.text
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
sub rsp, 16
mov eax, 0
call func
cdqe
lea rdx, 0[0+rax*4]
lea rax, arr[rip]
add rax, rdx
mov QWORD PTR -8[rbp], rax
cmp QWORD PTR -8[rbp], 0
je .L4
lea rdi, .LC1[rip]
call puts@PLT
.L4:
mov eax, 0
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (GNU) 7.3.0"
.section .note.GNU-stack,"",@progbits
I wish I could provide a logical explanation for this behavior, but I can only document it. It seems we need to talk with the people on the gcc list.
Side effects discarded in address computation inside 'if'
This seems to be a regression in gcc that shows up depending on whether an individual distro applies enough patches to mask it. It is a known gcc bug that is being worked on: Bug 84607.
This is a gcc bug (#84607) and has been fixed in gcc 7.3.1 or later.
The problem is with your compilation. I use gcc to compile. I compiled your file like this:
gcc main.c -o prog
./prog
In func
In main
Seems good to me. If you use a compiler other than gcc, check how to compile with it. Also, I use gcc 7.3.
I have been studying assembly language based on gcc -S output, and I found some syntax I've not seen before.
From C code:
#include <stdio.h>
void main() {
printf("%d\n", sizeof(int));
}
I've got this:
.file "test.c"
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "%d\12\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
call __main
movl $4, %edx
leaq .LC0(%rip), %rcx
call printf
nop
addq $32, %rsp
popq %rbp
ret
.seh_endproc
.ident "GCC: (tdm64-1) 5.1.0"
.def printf; .scl 2; .type 32; .endef
Although most of this code is clear and understandable, one line of it is completely strange to me. The line I am talking about is:
leaq .LC0(%rip), %rcx
I know that leaq is the instruction for loading an effective address, but the operand syntax is unclear to me: the label of the format string with the instruction pointer as a parameter? What does it do?
Thanks in advance :)
I have C code that declares a global variable char file[MAX]. Various functions use this variable directly to copy a filename into it. I can compile this C file to assembly code, but I don't know how to find the address of this variable. In the x86 stack, how do I find the address of a global variable? Can you give me an example of how a global variable is referenced in assembly code?
EDIT: I don't see a .Data segment in the assembly code.
To store the address of file to register EAX:
AT&T syntax: movl $_file, %eax
Intel syntax: mov eax, OFFSET _file
How to examine it:
First, write a simple program (test.c).
#define MAX 256
char file[MAX];
int main(void) {
volatile char *address = file;
return 0;
}
Then, compile it to assembly code: gcc -S -O0 -o test.s test.c
.file "test.c"
.comm _file, 256, 5
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl $_file, 12(%esp)
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE0:
.ident "GCC: (GNU) 4.8.1"
Or, if you want Intel syntax: gcc -S -O0 -masm=intel -o test_intel.s test.c
.file "test.c"
.intel_syntax noprefix
.comm _file, 256, 5
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB0:
.cfi_startproc
push ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
mov ebp, esp
.cfi_def_cfa_register 5
and esp, -16
sub esp, 16
call ___main
mov DWORD PTR [esp+12], OFFSET FLAT:_file
mov eax, 0
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE0:
.ident "GCC: (GNU) 4.8.1"
After a little more experimentation and examination, I got the result.
I wrote a simple program in C, the classic hello world. I wanted to know what it looked like when the compiler translated it to assembly code.
I use MinGW and the command:
gcc -S hellow.c
When I opened the resulting file, I expected it to be, AT THE LEAST, somewhat similar to a hello-world program written directly in assembly, that is:
jmp 115
db 'Hello world!$' (db = define bytes)
-a 115
mov ah, 09 (09 for displaying strings ... ah = 'command register')
mov dx, 102 (address of the string)
int 21
int 20
Instead it looks like this:
.file "hellow.c"
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "Hello world!\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB6:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl $LC0, (%esp)
call _puts
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE6:
.def _puts; .scl 2; .type 32; .endef
I know little about assembly language, but I DO recognize the so-called mnemonics like ADD, POP, PUSH, MOV, JMP, INT, etc. I could not see many of these in the code generated by the C compiler.
What did I misunderstand?
This aligns the stack, reserves space for outgoing arguments, and calls the function ___main, which probably does the initial setup that a C program needs:
andl $-16, %esp
subl $16, %esp
call ___main
This stores the argument on the stack and calls the function _puts. LC0 is the label of the string to be printed.
movl $LC0, (%esp)
call _puts
This sets the return value of main to 0 and returns:
movl $0, %eax
leave
ret
Your example code uses Intel syntax, while the standard output from gcc is AT&T syntax. You can change that by using
gcc -S hellow.c -masm=intel
The resulting output should look more familiar.
However, code generated by a compiler looks rather different from what you would write by hand.
The int instruction would be used if you compiled for DOS, but even then those calls would be wrapped in C standard library functions, such as puts in this case.