assembly language - c-language and mnemonics

assembly language - c-language and mnemonics - c

I wrote a simple program in c-language, the classic helloworld. I wanted to know how it looked liked when the compiler translated it to assembly code.
I use MinGW and the command:
gcc -S hellow.c
When I opened this file I expected it would, AT THE LEAST, be somewhat similar to a hello-world program written directly in assembly, that is:
jmp 115
db 'Hello world!$' (db = define bytes)
-a 115
mov ah, 09 (09 for displaying strings ... ah = 'command register')
mov dx, 102 (adress of the string)
int 21
int 20
Instead it look like this:
.file "hellow.c"
.def ___main;
.scl 2;
.type 32;
.endef
.section
.rdata,"dr"
LC0:
.ascii "Hello world!\0"
.text
.globl _main
.def _main;
.scl 2;
.type 32;
.endef
_main:
LFB6:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl $LC0, (%esp)
call _puts
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE6:
.def _puts;
.scl 2;
.type 32;
.endef
I know litte about assembly language, but i DO recognice the so called mnemonics like ADD, POP, PUSH, MOV, JMP, INT etc. Could not see much of these in the code generated by the c-compiler.
What did I missunderstand?

This prepares the arguments to call a function __main that probably does all initial setup that is needed for a C program
andl $-16, %esp
subl $16, %esp
call ___main
This prepares the arguments and calls function _puts. LC0 is a symbol that contains the string to be printed.
movl $LC0, (%esp)
call _puts
This prepares the return value of main and returns
movl $0, %eax
leave
ret

Your example code uses Intel syntax, while the standard output from gcc is AT&T syntax. You can change that by using
gcc -S hellow.c -masm=intel
The resulting output should look more familiar.
However, if the compiler generates the source then it looks rather different, then what you would write by hand.
The int would be used if you compile for DOS, but even so, these calls would be wrapped in C standard functions, like puts in this case.

Related

Why does my empty loop run twice as fast if called as a function, on Intel Skylake CPUs?

I was running some tests to compare C to Java and ran into something interesting. Running my exactly identical benchmark code with optimization level 1 (-O1) in a function called by main, rather than in main itself, resulted in roughly double performance. I'm printing out the size of test_t to verify beyond any doubt that the code is being compiled to x64.
I sent the executables to my friend who's running an i7-7700HQ and got similar results. I'm running an i7-6700.
Here's the slower code:
#include <stdio.h>
#include <time.h>
#include <stdint.h>
int main() {
printf("Size = %I64u\n", sizeof(size_t));
int start = clock();
for(int64_t i = 0; i < 10000000000L; i++) {
}
printf("%ld\n", clock() - start);
return 0;
}
And the faster:
#include <stdio.h>
#include <time.h>
#include <stdint.h>
void test() {
printf("Size = %I64u\n", sizeof(size_t));
int start = clock();
for(int64_t i = 0; i < 10000000000L; i++) {
}
printf("%ld\n", clock() - start);
}
int main() {
test();
return 0;
}
I'll also provide the assembly code for you to dig in to. I don't know assembly.
Slower:
.file "dummy.c"
.text
.def __main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.LC0:
.ascii "Size = %I64u\12\0"
.LC1:
.ascii "%ld\12\0"
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
pushq %rbx
.seh_pushreg %rbx
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
call __main
movl $8, %edx
leaq .LC0(%rip), %rcx
call printf
call clock
movl %eax, %ebx
movabsq $10000000000, %rax
.L2:
subq $1, %rax
jne .L2
call clock
subl %ebx, %eax
movl %eax, %edx
leaq .LC1(%rip), %rcx
call printf
movl $0, %eax
addq $32, %rsp
popq %rbx
ret
.seh_endproc
.ident "GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0"
.def printf; .scl 2; .type 32; .endef
.def clock; .scl 2; .type 32; .endef
Faster:
.file "dummy.c"
.text
.section .rdata,"dr"
.LC0:
.ascii "Size = %I64u\12\0"
.LC1:
.ascii "%ld\12\0"
.text
.globl test
.def test; .scl 2; .type 32; .endef
.seh_proc test
test:
pushq %rbx
.seh_pushreg %rbx
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
movl $8, %edx
leaq .LC0(%rip), %rcx
call printf
call clock
movl %eax, %ebx
movabsq $10000000000, %rax
.L2:
subq $1, %rax
jne .L2
call clock
subl %ebx, %eax
movl %eax, %edx
leaq .LC1(%rip), %rcx
call printf
nop
addq $32, %rsp
popq %rbx
ret
.seh_endproc
.def __main; .scl 2; .type 32; .endef
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
subq $40, %rsp
.seh_stackalloc 40
.seh_endprologue
call __main
call test
movl $0, %eax
addq $40, %rsp
ret
.seh_endproc
.ident "GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0"
.def printf; .scl 2; .type 32; .endef
.def clock; .scl 2; .type 32; .endef
Here's my batch script for compilation:
#echo off
set /p file= File to compile:
del compiled.exe
gcc -Wall -Wextra -std=c17 -O1 -o compiled.exe %file%.c
compiled.exe
PAUSE
And for compilation to assembly:
#echo off
set /p file= File to compile:
del %file%.s
gcc -S -Wall -Wextra -std=c17 -O1 %file%.c
PAUSE

The slow version:
Note that the sub rax, 1 \ jne pair goes right across the boundary of the ..80 (which is a 32byte boundary). This is one of the cases mentioned in Intels document regarding this issue namely as this diagram:
So this op/branch pair is affected by the fix for the JCC erratum (which would cause it to not be cached in the µop cache). I'm not sure if that is the reason, there are other things at play too, but it's a thing.
In the fast version, the branch is not "touching" a 32byte boundary, so it is not affected.
There may be other effects that apply. Still due to crossing a 32byte boundary, in the slow case the loop is spread across 2 chunks in the µop cache, even without the fix for JCC erratum that may cause it to run at 2 cycles per iteration if the loop cannot execute from the Loop Stream Detector (which is disabled on some processors by an other fix for an other erratum, SKL150). See eg this answer about loop performance.
To address the various comments saying they cannot reproduce this, yes there are various ways that could happen:
Whichever effect was responsible for the slowdown, it is likely caused by the exact placement of the op/branch pair across a 32byte boundary, which happened by pure accident. Compiling from source is unlikely to reproduce the same circumstances, unless you use the same compiler with the same setup as was used by the original poster.
Even using the same binary, regardless of which of the effects is responsible, the weird effect would only happen on particular processors.

Why does a simple C "hello world" program not work if gcc -O3 without "volatile"?

I have the following C program
int main() {
char string[] = "Hello, world.\r\n";
__asm__ volatile ("syscall;" :: "a" (1), "D" (0), "S" ((unsigned long) string), "d" (sizeof(string) - 1)); }
which I want to run under Linux with with x86 64 bit. I call the syscall for "write" with 0 as fd argument because this is stdout.
If I compile under gcc with -O3, it does not work. A look into the assembly code
.file "test_for_o3.c"
.text
.section .text.startup,"ax",#progbits
.p2align 4,,15
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
subq $40, %rsp
.cfi_def_cfa_offset 48
xorl %edi, %edi
movl $15, %edx
movq %fs:40, %rax
movq %rax, 24(%rsp)
xorl %eax, %eax
movq %rsp, %rsi
movl $1, %eax
#APP
# 5 "test_for_o3.c" 1
syscall;
# 0 "" 2
#NO_APP
movq 24(%rsp), %rcx
xorq %fs:40, %rcx
jne .L5
xorl %eax, %eax
addq $40, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.L5:
.cfi_restore_state
call __stack_chk_fail#PLT
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0"
.section .note.GNU-stack,"",#progbits
tells us that gcc has simply not put the string data into the assembly code. Instead, if I declare "string" as "volatile", it works fine.
However, the idea of "volatile" is just to use it for variables that can change their values by (from the view of the executing function) unexpected events, isn't it? "volatile" can make code much slower, hence it should be avoided if possible.
As I would suppose, gcc must assume that the content of "string" must not be ignored because the pointer "string" is used as an input parameter in the inline assembly (and gcc has no idea what the inline assembly code will do with it).
If this is "allowed" behaviour of gcc, where can I read more about all the formal constraints I have to be aware of when writing code for -O3?
A second question would be what the "volatile" statement along with the inline assembly directive does exactly. I just got used to mark all inline assembly directives with "volatile" because it had not worked otherwise, in some situations.

How do i get rid of call __x86.get_pc_thunk.ax

I tried to compile and convert a very simple C program to assembly language.
I am using Ubuntu and the OS type is 64 bit.
This is the C Program.
void add();
int main() {
add();
return 0;
}
if i use gcc -S -m32 -fno-asynchronous-unwind-tables -o simple.S simple.c this is how my assembly source code File should look like:
.file "main1.c"
.text
.globl main
.type main, #function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
call add
movl $0, %eax
movl %ebp, %esp
popl %ebp
ret
.size main, .-main
.ident "GCC: (Debian 4.4.5-8) 4.4.5" // this part should say Ubuntu instead of Debian
.section .note.GNU-stack,"",#progbits
but instead it looks like this:
.file "main0.c"
.text
.globl main
.type main, #function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %ecx
call __x86.get_pc_thunk.ax
addl $_GLOBAL_OFFSET_TABLE_, %eax
movl %eax, %ebx
call add#PLT
movl $0, %eax
popl %ecx
popl %ebx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.section
.text.__x86.get_pc_thunk.ax,"axG",#progbits,__x86.get_pc_thunk.ax,comdat
.globl __x86.get_pc_thunk.ax
.hidden __x86.get_pc_thunk.ax
.type __x86.get_pc_thunk.ax, #function
__x86.get_pc_thunk.ax:
movl (%esp), %eax
ret
.ident "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
.section .note.GNU-stack,"",#progbits
At my University they told me to use the Flag -m32 if I am using a 64 bit Linux version. Can somebody tell me what I am doing wrong?
Am I even using the correct Flag?
edit after -fno-pie
.file "main0.c"
.text
.globl main
.type main, #function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $4, %esp
call add
movl $0, %eax
addl $4, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
.section .note.GNU-stack,"",#progbits
it looks better but it's not exactly the same.
for example what does leal mean?

As a general rule, you cannot expect two different compilers to generate the same assembly code for the same input, even if they have the same version number; they could have any number of extra "patches" to their code generation. As long as the observable behavior is the same, anything goes.
You should also know that GCC, in its default -O0 mode, generates intentionally bad code. It's tuned for ease of debugging and speed of compilation, not for either clarity or efficiency of the generated code. It is often easier to understand the code generated by gcc -O1 than the code generated by gcc -O0.
You should also know that the main function often needs to do extra setup and teardown that other functions do not need to do. The instruction leal 4(%esp),%ecx is part of that extra setup. If you only want to understand the machine code corresponding to the code you wrote, and not the nitty details of the ABI, name your test function something other than main.
(As pointed out in the comments, that setup code is not as tightly tuned as it could be, but it doesn't normally matter, because it's only executed once in the lifetime of the program.)
Now, to answer the question that was literally asked, the reason for the appearance of
call __x86.get_pc_thunk.ax
is because your compiler defaults to generating "position-independent" executables. Position-independent means the operating system can load the program's machine code at any address in (virtual) memory and it'll still work. This allows things like address space layout randomization, but to make it work, you have to take special steps to set up a "global pointer" at the beginning of every function that accesses global variables or calls another function (with some exceptions). It's actually easier to explain the code that's generated if you turn optimization on:
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %ecx
This is all just setting up main's stack frame and saving registers that need to be saved. You can ignore it.
call __x86.get_pc_thunk.bx
addl $_GLOBAL_OFFSET_TABLE_, %ebx
The special function __x86.get_pc_thunk.bx loads its return address -- which is the address of the addl instruction that immediately follows -- into the EBX register. Then we add to that address the value of the magic constant _GLOBAL_OFFSET_TABLE_, which, in position-independent code, is the difference between the address of the instruction that uses _GLOBAL_OFFSET_TABLE_ and the address of the global offset table. Thus, EBX now points to the global offset table.
call add#PLT
Now we call add#PLT, which means call add, but jump through the "procedure linkage table" to do it. The PLT takes care of the possibility that add is defined in a shared library rather than the main executable. The code in the PLT uses the global offset table and assumes that you have already set EBX to point to it, before calling an #PLT symbol.  That's why main has to set up EBX even though nothing appears to use it. If you had instead written something like
extern int number;
int main(void) { return number; }
then you would see a direct use of the GOT, something like
call __x86.get_pc_thunk.bx
addl $_GLOBAL_OFFSET_TABLE_, %ebx
movl number#GOT(%ebx), %eax
movl (%eax), %eax
We load up EBX with the address of the GOT, then we can load the address of the global variable number from the GOT, and then we actually dereference the address to get the value of number.
If you compile 64-bit code instead, you'll see something different and much simpler:
movl number(%rip), %eax
Instead of all this mucking around with the GOT, we can just load number from a fixed offset from the program counter. PC-relative addressing was added along with the 64-bit extensions to the x86 architecture. Similarly, your original program, in 64-bit position-independent mode, will just say
call add#PLT
without setting up EBX first. The call still has to go through the PLT, but the PLT uses PC-relative addressing itself and doesn't need any help from its caller.
The only difference between __x86.get_pc_thunk.bx and __x86.get_pc_thunk.ax is which register they store their return address in: EBX for .bx, EAX for .ax. I have also seen GCC generate .cx and .dx variants. It's just a matter of which register it wants to use for the global pointer -- it must be EBX if there are going to be calls through the PLT, but if there aren't any then it can use any register, so it tries to pick one that isn't needed for anything else.
Why does it call a function to get the return address? Older compilers would do this instead:
call 1f
1: pop %ebx
but that screws up return-address prediction, so nowadays the compiler goes to a little extra trouble to make sure every call is paired with a ret.

The extra junk you're seeing is due to your version of GCC special-casing main to compensate for possibly-broken entry point code starting it with a misaligned stack. I'm not sure how to disable this or if it's even possible, but renaming the function to something other than main will suppress it for the sake of your reading.
After renaming to xmain I get:
xmain:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
call add
movl $0, %eax
leave
ret

Stack View when printf is called?

I was just learning about format string vulnerabilities that makes me ask this question
Consider the following simple program:
#include<stdio.h>
void main(int argc, char **argv)
{
char *s="SomeString";
printf(argv[1]);
}
Now clearly, this code is vulnerable to a format String vulnerability. I.e. when the command line argument is %s, then the value SomeString is printed since printf pops the stack once.
What I dont understand is the structure of the stack when printf is called
In my head I imagine the stack to be as follows:
grows from left to right ----->
main() ---> printf()-->
RET to libc_main | address of 's' | current registers| ret ptr to main | ptr to format string|
if this is the case, how does inputting %s to the program, cause the value of s to be popped ?
(OR) If I am totally wrong about the stack structure , please correct me

The stack contents depends a lot on the following:
the CPU
the compiler
the calling conventions (i.e. how parameters are passed in the registers and on the stack)
the code optimizations performed by the compiler
This is what I get by compiling your tiny program with x86 mingw using gcc stk.c -S -o stk.s:
.file "stk.c"
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "SomeString\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB6:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
call ___main
movl $LC0, 28(%esp)
movl 12(%ebp), %eax
addl $4, %eax
movl (%eax), %eax
movl %eax, (%esp)
call _printf
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE6:
.def _printf; .scl 2; .type 32; .endef
And this is what I get using gcc stk.c -S -O2 -o stk.s, that is, with optimizations enabled:
.file "stk.c"
.def ___main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 2,,3
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB7:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl 12(%ebp), %eax
movl 4(%eax), %eax
movl %eax, (%esp)
call _printf
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE7:
.def _printf; .scl 2; .type 32; .endef
As you can see, in the latter case there's no pointer to "SomeString" on the stack. In fact, the string isn't even present in the compiled code.
In this simple code there are no registers saved on the stack because there aren't any variables allocated to registers that need to be preserved across the call to printf().
So, the only things you get on the stack here are the string pointer (optionally), unused space due to stack alignment (andl $-16, %esp + subl $32, %esp align the stack and allocate space for local variables, none here), the printf()'s parameter, the return address for returning from printf() back to main().
In the former case the pointer to "SomeString" and the printf()'s parameter (value of argv[1]) are quite far away from one another:
movl $LC0, 28(%esp) ; address of "SomeString" is at esp+28
movl 12(%ebp), %eax
addl $4, %eax
movl (%eax), %eax
movl %eax, (%esp) ; address of a copy of argv[1] is at esp
call _printf
To make the two addresses stored one right after the other on the stack, if that's what you want, you'd need to play with the code, compilation/optimization options or use a different compiler.
Or you could supply a format string in argv[1] such that printf() would reach it. You could, for example, include a number of fake parameters in the format string.
For example, if I compile this piece of code using gcc stk.c -o stk.exe and run it as stk.exe %u%u%u%u%u%u%s, I'll get the following output from it:
4200532268676042006264200532880015253SomeString
All of this is pretty hacky and it's not trivial to make it work right.

On x86 the stack on a function call could look something like:
: :
+--------------+
: alignment :
+--------------+
12(%ebp) | arg2 |
+--------------+
8(%ebp) | arg1 |
+--------------+
4(%ebp) | ret | -----> return address
+--------------+
(%ebp) | ebp | -----> previous ebp value
+--------------+
-4(%ebp) | local1 | -----> local vars, sometimes they can overflow ;-)
+--------------+
: alignment :
+--------------+
: :
If you used -fomit-frame-pointer ebp would not be saved on the stack. At different optimization levels some variables may disappear (be optimized out), ...
Other ABIs store function arguments on registers, instead of saving them on the stack. Later, before calling another function, live registers may spill into the stack.

Creating a directory in linux assembly language

I am trying to create a small assembly program to create a folder. I looked up the system call for creating a directory on this page. It says that it is identified by 27h. How would I go about implementing the mkdir somename in assembly?
I am aware that the program should move 27 into eax but I am unsure where to go next. I have googled quite a bit and no one seems to have posted anthing about this online.
This is my current code (I don't know in which register to put filename and so on):
section .data
section .text
global _start
mov eax, 27
mov ????????
....
int 80h
Thanks

One way of finding out, is using GCC to translate the following C code:
#include <stdio.h>
#include <sys/stat.h>
int main()
{
if (mkdir("testdir", 0777) != 0)
{
return -1;
}
return 0;
}
to assembly, with: gcc mkdir.c -S
.file "mkdir.c"
.section .rodata
.LC0:
.string "testdir"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
movl $511, 4(%esp)
movl $.LC0, (%esp)
call mkdir ; interesting call
testl %eax, %eax
setne %al
testb %al, %al
je .L2
movl $-1, %eax
jmp .L3
.L2:
movl $0, %eax
.L3:
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 4.5.1 20100924 (Red Hat 4.5.1-4)"
.section .note.GNU-stack,"",#progbits
Anyway, ProgrammingGroundUp page 272 lists important syscalls, including mkdir:
%eax Name %ebx %ecx %edx Notes
------------------------------------------------------------------
39 mkdir NULL terminated Permission Creates the given
directory name directory. Assumes all
directories leading up
to it already exist.

You could also do like the Assembly Howto is suggesting. But indeed, calling mkdir from Libc is more portable. You need to look into asm/unistd.h to get the syscall number.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight