I am just starting to learn Intel x86 assembly and compiled this simple "hello world" C program (without the CFI directives, for simplicity):
#include <stdio.h>
int main(int argc, char* argv[]) {
printf("hello world!");
return 0;
}
The following x86 code came out:
.file "helloworld.c"
.intel_syntax noprefix
.section .rodata
.LC0:
.string "hello world!"
.text
.globl main
.type main, @function
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR -4[rbp], edi
mov QWORD PTR -16[rbp], rsi
lea rdi, .LC0[rip]
mov eax, 0
call printf@PLT
mov eax, 0
leave
ret
.size main, .-main
.ident "GCC: (Debian 7.2.0-19) 7.2.0"
.section .note.GNU-stack,"",@progbits
The question: why are 16 bytes reserved on the stack for local variables when they don't seem to be used in any way? The program behaves the same without those lines, so why were they generated?
The following code compiles and runs with GCC.
#include <stdio.h>
int arr[10];
int func()
{
printf("In func\n");
return 0;
}
int main()
{
if (&arr[func()])
printf("In main\n");
return 0;
}
Output:
In main
Why is printf("In func\n"); not executed?
There seems to be a subtle issue, intended or unintended, with the latest gcc (version 7.3, on kernel 4.15.8, Arch Linux). For whatever reason, the call to func() is omitted from the code generated for main(), e.g.:
$ gcc -S -masm=intel -o infunc2.asm infunc2.c
The generated assembly is:
$ cat infunc2.asm
.file "infunc2.c"
.intel_syntax noprefix
.text
.comm arr,40,32
.section .rodata
.LC0:
.string "In func"
.text
.globl func
.type func, @function
func:
.LFB0:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
lea rdi, .LC0[rip]
call puts@PLT
mov eax, 0
pop rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size func, .-func
.section .rodata
.LC1:
.string "In main"
.text
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
lea rdi, .LC1[rip]
call puts@PLT
mov eax, 0
pop rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (GNU) 7.3.0"
.section .note.GNU-stack,"",@progbits
Note that func() is labeled .LFB0: above. The code for main: does not call func (or .LFB0:) at all, despite the function being present and despite the "In func" string being present at .LC0:. I suspect this is not intended behavior.
For example, even with a simple compilation without optimization (-O0), the function is not called, e.g.:
$ gcc -g -O0 -o bin/if2 infunc2.c
$ ./bin/if2
In main
Changing the code to store the address of arr[func()] does force func() to be called, e.g.
#include <stdio.h>
int arr[10];
int func()
{
printf ("In func\n");
return 0;
}
int main (void)
{
int *p = &arr[func()];
if (p)
printf("In main\n");
return 0;
}
Then
$ gcc -Wall -Wextra -pedantic -std=gnu11 -Ofast -o bin/infunc infunc.c
$ ./bin/infunc
In func
In main
And the generated assembly supports the different behavior:
$ gcc -S -masm=intel -o infunc.asm infunc.c
$ cat infunc.asm
.file "infunc.c"
.intel_syntax noprefix
.text
.comm arr,40,32
.section .rodata
.LC0:
.string "In func"
.text
.globl func
.type func, @function
func:
.LFB0:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
lea rdi, .LC0[rip]
call puts@PLT
mov eax, 0
pop rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size func, .-func
.section .rodata
.LC1:
.string "In main"
.text
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
sub rsp, 16
mov eax, 0
call func
cdqe
lea rdx, 0[0+rax*4]
lea rax, arr[rip]
add rax, rdx
mov QWORD PTR -8[rbp], rax
cmp QWORD PTR -8[rbp], 0
je .L4
lea rdi, .LC1[rip]
call puts@PLT
.L4:
mov eax, 0
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (GNU) 7.3.0"
.section .note.GNU-stack,"",@progbits
I wish I could provide some logical explanation for the handling here, but I can only document it. Seems we need to talk with the guys on the gcc list.
Side effects discarded in address computation inside 'if'
This seems to be a regression in gcc that shows up depending on whether an individual distro has applied enough patches to mask it. It is a known gcc bug that is being worked on: Bug 84607.
This is a gcc bug (#84607) and has been fixed in gcc 7.3.1 or later.
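For context on why this counts as a compiler bug rather than permitted behavior: in C, &arr[func()] is evaluated as arr + func(), so the call and its side effects must still happen even though the resulting address is known to be non-null. A minimal sketch of a self-check follows; the names call_count, counted_func and address_test are made up for illustration and do not come from the original post.

```c
static int arr[10];
static int call_count = 0;   /* hypothetical counter, for illustration only */

/* Same shape as func() in the question, but it records each call. */
int counted_func(void)
{
    call_count++;
    return 0;
}

/* Returns 1 if the address is non-null, which is always the case here.
   The point is the side effect: counted_func() must still be called. */
int address_test(void)
{
    if (&arr[counted_func()])
        return 1;
    return 0;
}
```

On a conforming compiler, calling address_test() once returns 1 and leaves call_count at 1. With the affected gcc 7.3.0 build described above, the counter would presumably stay at 0.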
The problem is with your compilation. I use gcc to compile. I compiled your file like this:
gcc main.c -o prog
./prog
In func
In main
Seems good to me. If you use a compiler other than gcc, check how to compile with your compiler. Also, I use gcc 7.3.
I'm attempting to get through a book on X86 that was written using examples from Visual C++ and Visual Studio. I'm trying to convert the examples for use with gcc. After a number of problems, I finally wound up with code that would at least compile, but now I'm getting segfaults. Here's the code:
assembly.s:
.intel_syntax noprefix
.section .text
.globl CalcSum
.type CalcSum, @function
// extern "C" int CalcSum_(int a, int b, int c)
CalcSum:
// Initialize a stack frame pointer
pushq rbp
mov ebp,esp
// Load the argument values
mov eax,[ebp+8]
mov ecx,[ebp+12]
mov edx,[ebp+16]
// Calculate the sum
add eax, ecx
add eax, edx
// Restore the caller's stack frame pointer
popq rbp
ret
test.c:
#include <stdio.h>
extern int CalcSum(int a, int b, int c);
int main() {
int sum = CalcSum(5,6,7);
printf(" result: %d\n",sum);
return 0;
}
I'm using gcc -o execute test.c assembly.s to compile. If I change all the 32-bit registers to 64-bit (i.e. ebp to rbp), it runs but gives completely random output. Could anyone point out what I'm doing wrong here? Thanks!
As hinted in the comments, it's a matter of calling convention. 32-bit C functions follow the cdecl calling convention on both Windows and Linux, with the arguments passed on the stack. On 64-bit Linux you have to use the System V AMD64 ABI, which passes the first integer arguments in rdi, rsi and rdx. The 64-bit Windows calling convention is different again and uses rcx, rdx and r8. Calling functions of the operating system may involve further specifics.
32-bit C (GCC):
.intel_syntax noprefix
.section .text
.globl CalcSum
.type CalcSum, @function
// extern "C" int CalcSum_(int a, int b, int c)
CalcSum: // with underscore in Windows: _CalcSum
// Initialize a stack frame pointer
push ebp
mov ebp,esp
// Load the argument values
mov eax,[ebp+8]
mov ecx,[ebp+12]
mov edx,[ebp+16]
// Calculate the sum
add eax, ecx
add eax, edx
// Restore the caller's stack frame pointer
pop ebp
ret
64-bit Linux (GCC):
.intel_syntax noprefix
.section .text
.globl CalcSum
.type CalcSum, @function
// extern "C" int CalcSum_(int a, int b, int c)
CalcSum:
// Load the argument values
mov rax, rdi
add rax, rsi
add rax, rdx
ret
64-bit Windows (MingW-GCC):
.intel_syntax noprefix
.section .text
.globl CalcSum
// .type CalcSum, @function
// extern "C" int CalcSum_(int a, int b, int c)
CalcSum:
// Load the argument values
mov rax, rcx
add rax, rdx
add rax, r8
ret
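As a cross-check for any of the assembly versions above, the same function can be written in plain C, in which case the compiler picks the argument registers or stack slots for whatever ABI it targets. The sketch below is only an illustration: CalcSumRef and abi_name are hypothetical names, and the predefined macros it tests (_WIN64, __x86_64__, __i386__) are the ones GCC-compatible compilers normally define.

```c
/* Hypothetical C reference for the hand-written CalcSum routines. */
int CalcSumRef(int a, int b, int c)
{
    return a + b + c;
}

/* Reports which of the conventions discussed above this build uses,
   based on macros that GCC-compatible compilers predefine. */
const char *abi_name(void)
{
#if defined(__x86_64__) && defined(_WIN64)
    return "Windows x64: rcx, rdx, r8";
#elif defined(__x86_64__)
    return "System V AMD64: rdi, rsi, rdx";
#elif defined(__i386__)
    return "32-bit cdecl: arguments on the stack";
#else
    return "other target";
#endif
}
```

CalcSumRef(5, 6, 7) returns 18, which is the value the assembly versions should also produce for the test program.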
I just want to be sure that this C code:
while(flag==true)
{
}
foo();
does the same as this:
while(flag==true);
foo();
; alone is a null statement in C.
In your case, {} or ; are syntactically needed, but they do the same: nothing
Related: Use of null statement in C
In addition to the other answers: It's the same thing.
But I prefer this:
while (condition)
{
}
foo();
over this:
while (condition);
foo();
because if you forget the semicolon after the while, your code will compile fine but it won't do what you expect:
while(condition) // ; forgotten here
foo();
will actually be equivalent of:
while(condition)
{
foo();
}
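Both points above can be checked with a small sketch. The function names are made up for illustration: the first two loops are the empty-body forms from the question, and the third shows how the next statement becomes the loop body once the semicolon is missing.

```c
/* Empty body written as a null statement. */
int spin_with_semicolon(int n)
{
    int i = 0;
    while (i++ < n);
    return i;
}

/* Empty body written as an empty block: same behavior. */
int spin_with_braces(int n)
{
    int i = 0;
    while (i++ < n)
    {
    }
    return i;
}

/* Semicolon forgotten: the following statement is now the loop body,
   so it runs once per iteration instead of once after the loop. */
int body_runs_when_semicolon_forgotten(int n)
{
    int i = 0;
    int runs = 0;
    while (i++ < n)
        runs++;
    return runs;
}
```

For n = 3 both empty-body versions return 4, while the third function returns 3 because its statement ran on every iteration.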
Yes, a loop with an empty body is equivalent to just while(<some condition>);
Yes. A ; following a control structure (e.g., while, for, etc.) that can be followed with a block is treated as if it was followed by an empty block.
Yes, because a semicolon placed directly after the while statement denotes an empty body; once the condition becomes false, execution continues with the statement immediately following the loop.
Yes, they are same.
You can generate the assembly for the code and see for yourself that both versions produce the same assembly (using gcc filename.c -S -masm=intel -o outputfilename).
#include<stdio.h>
int foo(void);
int main(){
int flag;
scanf("%d" , &flag);
while(flag==1);
foo();
}
int foo(void){
int x = 2;
return x*x;
}
.LC0:
.ascii "%d\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
push rbp
.seh_pushreg rbp
mov rbp, rsp
.seh_setframe rbp, 0
sub rsp, 48
.seh_stackalloc 48
.seh_endprologue
call __main
lea rax, -4[rbp]
mov rdx, rax
lea rcx, .LC0[rip]
call scanf
nop
.L2:
mov eax, DWORD PTR -4[rbp]
cmp eax, 1
je .L2
call foo
mov eax, 0
add rsp, 48
pop rbp
ret
.seh_endproc
.globl foo
.def foo; .scl 2; .type 32; .endef
.seh_proc foo
foo:
push rbp
.seh_pushreg rbp
mov rbp, rsp
.seh_setframe rbp, 0
sub rsp, 16
.seh_stackalloc 16
.seh_endprologue
mov DWORD PTR -4[rbp], 2
mov eax, DWORD PTR -4[rbp]
imul eax, DWORD PTR -4[rbp]
add rsp, 16
pop rbp
ret
.seh_endproc
.ident "GCC: (x86_64-posix-seh-rev1, Built by MinGW-W64 project) 6.3.0"
.def scanf; .scl 2; .type 32; .endef
And when I changed while(flag == 1); to while(flag==1){}, the generated assembly code is:
.LC0:
.ascii "%d\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
push rbp
.seh_pushreg rbp
mov rbp, rsp
.seh_setframe rbp, 0
sub rsp, 48
.seh_stackalloc 48
.seh_endprologue
call __main
lea rax, -4[rbp]
mov rdx, rax
lea rcx, .LC0[rip]
call scanf
nop
.L2:
mov eax, DWORD PTR -4[rbp]
cmp eax, 1
je .L2
call foo
mov eax, 0
add rsp, 48
pop rbp
ret
.seh_endproc
.globl foo
.def foo; .scl 2; .type 32; .endef
.seh_proc foo
foo:
push rbp
.seh_pushreg rbp
mov rbp, rsp
.seh_setframe rbp, 0
sub rsp, 16
.seh_stackalloc 16
.seh_endprologue
mov DWORD PTR -4[rbp], 2
mov eax, DWORD PTR -4[rbp]
imul eax, DWORD PTR -4[rbp]
add rsp, 16
pop rbp
ret
.seh_endproc
.ident "GCC: (x86_64-posix-seh-rev1, Built by MinGW-W64 project) 6.3.0"
.def scanf; .scl 2; .type 32; .endef
You can see that the relevant portion is the same in both cases.
// The portion below is the same in both cases.
.L2:
mov eax, DWORD PTR -4[rbp]
cmp eax, 1
je .L2
call foo
mov eax, 0
add rsp, 48
pop rbp
ret
.seh_endproc
.globl foo
.def foo; .scl 2; .type 32; .endef
.seh_proc foo
A friend helped me come up with the following code to use inline assembly in GCC on a 64-bit Windows machine:
int main() {
char* str = "Hello World";
int ret;
asm volatile(
"call puts"
: "=a" (ret), "+c" (str)
:
: "rdx", "rdi", "rsi", "r8", "r9", "r10", "r11");
return 0;
}
After compiling with -S -masm=intel (I prefer Intel syntax), I get this assembly code:
.file "hello.c"
.intel_syntax noprefix
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "Hello World\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
push rbp
.seh_pushreg rbp
push rdi
.seh_pushreg rdi
push rsi
.seh_pushreg rsi
mov rbp, rsp
.seh_setframe rbp, 0
sub rsp, 48
.seh_stackalloc 48
.seh_endprologue
call __main
lea rax, .LC0[rip]
mov QWORD PTR -8[rbp], rax
mov rax, QWORD PTR -8[rbp]
mov rcx, rax
/APP
# 7 "hello.c" 1
call puts
# 0 "" 2
/NO_APP
mov DWORD PTR -12[rbp], eax
mov QWORD PTR -8[rbp], rcx
mov eax, 0
add rsp, 48
pop rsi
pop rdi
pop rbp
ret
.seh_endproc
.ident "GCC: (x86_64-posix-seh-rev1, Built by MinGW-W64 project) 4.9.2"
It works, but it sure looks messy with what appears to be superfluous code. Then again, my last experience with assembly was with the 65816 back in the 80s, and it wasn't inline. Anyway, I cleaned up the code, and the following accomplishes the exact same thing, as far as I can tell:
.intel_syntax noprefix
.data:
.ascii "Hello World\0"
.text
.globl main
main:
sub rsp, 48
lea rax, .data[rip]
mov rcx, rax
call puts
mov eax, 0
add rsp, 48
ret
Much simpler. What's all that extra stuff GCC added?
Edit: Not a duplicate because in addition to the structured exception handling, I'm also asking about the callee-saved registers, the call to __main, the explicit size directives, and the APP/NO_APP section.
I am new to C and gcc. I'm trying to follow along with an example in Computer Systems: A Programmer's Perspective. The author says that the following code, when put into a file (code.c)
int accum = 0;
int sum(int x, int y)
{
int t = x + y;
accum += t;
return t;
}
and using the gcc as follows to output an assembly code file
gcc -O2 -S code.c
will produce assembly code as follows
sum:
pushl %ebp
movl %esp,%ebp
movl 12(%ebp),%eax
addl 8(%ebp),%eax
addl %eax,accum
movl %ebp,%esp
popl %ebp
ret
However on my machine (OS: Ubuntu 10.4 x64) I get the following
.file "code.c"
.intel_syntax noprefix
.text
.p2align 4,,15
.globl sum
.type sum, @function
sum:
.LFB0:
.cfi_startproc
lea eax, [rdi+rsi]
add DWORD PTR accum[rip], eax
ret
.cfi_endproc
.LFE0:
.size sum, .-sum
.globl accum
.bss
.align 4
.type accum, @object
.size accum, 4
accum:
.zero 4
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",@progbits
Why am I seeing this difference?
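Because the book is 11 years old and gcc has changed a great deal since it was written. The two listings also target different architectures: the book shows 32-bit x86 code in AT&T syntax, with the arguments read from the stack at 8(%ebp) and 12(%ebp), while your output is x86-64 code in Intel syntax, where the two arguments arrive in the rdi and rsi registers. Compiling with gcc -m32 should give 32-bit output closer to the book's listing, though probably still not identical.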