Does GCC optimize assembly source file?

I can use GCC to convert assembly source files into relocatable object files.
gcc -c source.S -o object.o -O2
Is the optimization option effective? Can I expect GCC to optimize my assembly code?

No.
GCC passes your assembly source through the preprocessor and then to the assembler. At no time are any optimisations performed.

If you don't want to hand-optimize your asm, assembly language is the wrong choice of source language for you. Perhaps consider LLVM-IR if you want something asm-like but which is actually input for an optimizing compiler. (And ISA-independent.)
To be fair, there are some binary-to-binary recompilers / optimizers that try to figure out what's implementation detail and what's important logic, and optimize accordingly. (Reading from asm source instead of machine code would also be possible; asm and machine code are easy to convert back and forth and have a nearly 1:1 mapping). But that's not what assemblers do.
An assembler's job is normally just to faithfully translate the asm you write into machine code. Having a tool to do that is necessary for experimenting to find out what actually is faster, without the annoyance of writing actual machine code by hand.
Interestingly, GAS (the GNU assembler) does have some limited optimization options for x86 that aren't enabled by the GCC front-end, even if you run gcc -O2. (You can run gcc -v ... to see how the front-end invokes other programs to do the real work, and with what options.)
Use gcc -Wa,-Os -O3 foo.c bar.S to enable full optimization of your C, and GAS's minor peephole optimizations for your asm. (Or use -Wa,-O2: unfortunately the manual is wrong, and -Os misses some of the optimizations from -O2.) -Wa,... passes ... on the as command line, just like -Wl,... passes linker options through the GCC front-end.
GCC doesn't normally enable as's optimizations because it normally feeds GAS already-optimized asm.
GAS's optimizations are only for single instructions in isolation, and thus only when an instruction can be replaced by another that has exactly the same architectural effect (except for length, so the effect on RIP differs). The micro-architectural effect (performance) can also be different; that's the point of the non-size optimizations.
From the as(1) man page, so note that these are as options, not gcc options.
-O0 | -O | -O1 | -O2 | -Os
Optimize instruction encoding with smaller instruction size. -O
and -O1 encode 64-bit register load instructions with 64-bit
immediate as 32-bit register load instructions with 31-bit or
32-bits immediates, encode 64-bit register clearing instructions
with 32-bit register clearing instructions, encode 256-bit/512-bit
VEX/EVEX vector register clearing instructions with 128-bit VEX
vector register clearing instructions, encode 128-bit/256-bit EVEX
vector register load/store instructions with VEX vector register
load/store instructions, and encode 128-bit/256-bit EVEX packed
integer logical instructions with 128-bit/256-bit VEX packed
integer logical.
-O2 includes -O1 optimization plus encodes 256-bit/512-bit EVEX
vector register clearing instructions with 128-bit EVEX vector
register clearing instructions. In 64-bit mode VEX encoded
instructions with commutative source operands will also have their
source operands swapped if this allows using the 2-byte VEX prefix
form instead of the 3-byte one. Certain forms of AND as well as OR
with the same (register) operand specified twice will also be
changed to TEST.
-Os includes -O2 optimization plus encodes 16-bit, 32-bit and
64-bit register tests with immediate as 8-bit register test with
immediate. -O0 turns off this optimization.
(re: some of those VEX / EVEX operand-size and code-size optimizations: Is vxorps-zeroing on AMD Jaguar/Bulldozer/Zen faster with xmm registers than ymm? and the section near the end of my answer on How to tell the length of an x86 instruction? re: 2 vs. 3-byte VEX prefixes)
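The operand swap that -O2 performs for commutative VEX instructions comes down to which source register lands in the ModRM r/m field: a register numbered 8 to 15 there needs the VEX.B bit, which only the 3-byte prefix has. The C sketch below models just that rule, assuming register-register forms in the 0F opcode map with VEX.W = 0. The function names are made up for illustration, and this is not a description of how GAS implements it.

```c
#include <assert.h>

/* VEX prefix length (2 or 3 bytes) for a reg-reg instruction in the 0F
 * opcode map with VEX.W = 0 (assumption of this sketch).  Only the register
 * in ModRM.r/m matters: numbers 8..15 need VEX.B, which only the 3-byte
 * prefix can encode.  The destination (ModRM.reg) and the first source
 * (VEX.vvvv) can be any of 0..15 in either prefix form. */
static int vex_prefix_len(unsigned rm_reg)
{
    assert(rm_reg < 16);
    return rm_reg >= 8 ? 3 : 2;
}

/* For a commutative "op dst, src1, src2" (Intel operand order), src1 goes in
 * VEX.vvvv and src2 in ModRM.r/m.  Returns the shortest prefix length
 * reachable when the two sources may be swapped. */
static int vex_prefix_len_commutative(unsigned src1, unsigned src2)
{
    int as_written = vex_prefix_len(src2);
    int swapped = vex_prefix_len(src1);
    return swapped < as_written ? swapped : as_written;
}
```

With the register numbers from this answer's test file, vex_prefix_len_commutative(4, 15) models vpxord ymm3, ymm4, ymm15 and yields the 2-byte form, while vex_prefix_len_commutative(14, 15) stays at 3 bytes because either order puts a high register in r/m.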
Unfortunately -O2 and -Os conflict and -Os doesn't actually include everything from -O2. You can't get it to optimize test [re]dx, 1 to test dl,1 (-Os) and optimize or al,al to test al,al (-O2).
But it's still more optimization than NASM does. (NASM's optimization is on by default, except in ancient versions; GAS's is off by default except for picking the shortest encoding without changing the mnemonic or operand names.)
test r/m32, imm8 is not encodable, so the edx version needs an imm32.
or al,al is an obsolete 8080 idiom that's not useful for x86, except sometimes on P6-family to avoid register-read stalls where intentionally re-writing the register is actually better than avoiding lengthening the dep chain.
.intel_syntax noprefix
shufps xmm0, xmm0, 0
vxorps zmm31, zmm31, zmm31
vxorps zmm1, zmm1, zmm1
vxorps ymm15, ymm15, ymm15
vpxord zmm15, zmm15, zmm15
vpxord ymm3, ymm14, ymm15
vpxord ymm3, ymm4, ymm15
vmovd xmm16, [rdi + 256] # can use EVEX scaled disp8
vmovd xmm0, [rdi + 256] # could use EVEX scaled disp8 but doesn't even with a -march enabling AVX512
xor rax, rax
or al,al
cmp dl, 0
test rdx, 1
mov rax, 1
mov rax, -1
mov rax, 0xffffffff80000000
.att_syntax
movabs $-1, %rax
movq $1, %rax
movabs $1, %rax
Assembled with gcc -g -Wa,-msse2avx -Wa,-O2 -Wa,-march=znver2+avx512dq+avx512vl -c foo.s (For some insane reason, as has -march= support for modern AMD CPU names, but for Intel only up to corei7 and some Xeon Phi, not Skylake-avx512 like GCC does. So I had to enable AVX512 manually to test that.)
objdump -dwrC -Mintel -S output (source + disassembly):
0000000000000000 <.text>:
.intel_syntax noprefix
shufps xmm0, xmm0, 0 # -msse2avx just for fun
0: c5 f8 c6 c0 00 vshufps xmm0,xmm0,xmm0,0x0
vxorps zmm31, zmm31, zmm31 # avoids triggering AVX512 frequency limit
5: 62 01 04 00 57 ff vxorps xmm31,xmm31,xmm31
vxorps zmm1, zmm1, zmm1 # shorter, using VEX
b: c5 f0 57 c9 vxorps xmm1,xmm1,xmm1
vxorps ymm15, ymm15, ymm15 # missed optimization, could vxorps xmm15, xmm0, xmm0 for a 2-byte VEX and still be a zeroing idiom
f: c4 41 00 57 ff vxorps xmm15,xmm15,xmm15
vpxord zmm15, zmm15, zmm15 # AVX512 mnemonic optimized to AVX1, same missed opt for source operands.
14: c4 41 01 ef ff vpxor xmm15,xmm15,xmm15
vpxord ymm3, ymm14, ymm15 # no optimization possible
19: c4 c1 0d ef df vpxor ymm3,ymm14,ymm15
vpxord ymm3, ymm4, ymm15 # reversed operands to allow 2-byte VEX
1e: c5 85 ef dc vpxor ymm3,ymm15,ymm4
vmovd xmm16, [rdi + 256] # uses EVEX scaled disp8 because xmm16 requires EVEX anyway
22: 62 e1 7d 08 6e 47 40 vmovd xmm16,DWORD PTR [rdi+0x100]
vmovd xmm0, [rdi + 256] # could use EVEX scaled disp8 but doesn't even with a -march enabling AVX512
29: c5 f9 6e 87 00 01 00 00 vmovd xmm0,DWORD PTR [rdi+0x100]
xor rax, rax # dropped REX prefix
31: 31 c0 xor eax,eax
or al,al
33: 84 c0 test al,al
cmp dl, 0 # optimization to test dl,dl not quite legal: different effect on AF
35: 80 fa 00 cmp dl,0x0
test rdx, 1 # partial optimization: only to 32-bit, not 8-bit
38: f7 c2 01 00 00 00 test edx,0x1
mov rax, 1
3e: b8 01 00 00 00 mov eax,0x1
mov rax, -1 # sign-extension required
43: 48 c7 c0 ff ff ff ff mov rax,0xffffffffffffffff
mov rax, 0xffffffff80000000
4a: 48 c7 c0 00 00 00 80 mov rax,0xffffffff80000000
.att_syntax
movabs $-1, %rax # movabs forces imm64, despite -O2
51: 48 b8 ff ff ff ff ff ff ff ff movabs rax,0xffffffffffffffff
movq $1, %rax # but explicit q operand size doesn't stop opt
5b: b8 01 00 00 00 mov eax,0x1
movabs $1, %rax
60: 48 b8 01 00 00 00 00 00 00 00 movabs rax,0x1
So unfortunately even explicitly enabling AVX512VL and AVX512DQ didn't get GAS to choose a shorter EVEX encoding for vmovd when an EVEX wasn't already necessary. That's perhaps still intentional: you might want some functions to use AVX512, some to avoid it. If you're using ISA-option limits to catch accidental use of ISA extensions, you would have to enable AVX512 for the whole of such a file. It might be surprising to find the assembler using EVEX where you weren't expecting.
You can manually force it with {evex} vmovd xmm0, [rdi + 256]. (Which unfortunately GCC doesn't do when compiling C, where -march=skylake-avx512 definitely does give it free rein to use AVX512 instructions everywhere.)
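The three mov rax, imm results in the listing above follow from which x86-64 encodings can represent a given 64-bit value. The C sketch below computes the shortest encoding length for loading an immediate into a 64-bit register. It is a model of the encoding rules only; the function name is hypothetical and it makes no claim about how GAS decides internally.

```c
#include <assert.h>
#include <stdint.h>

/* Length in bytes of the shortest "mov reg, imm" encoding that leaves the
 * full 64-bit register holding imm.  reg is the register number
 * (0 = rax ... 7 = rdi, 8..15 = r8..r15). */
static int mov_r64_imm_len(unsigned reg, uint64_t imm)
{
    assert(reg < 16);
    if (imm <= UINT32_MAX) {
        /* mov r32, imm32 (B8+r id): writing a 32-bit register zero-extends
         * into the full 64-bit register.  r8d..r15d need a REX prefix. */
        return reg >= 8 ? 6 : 5;
    }
    if (imm >= 0xFFFFFFFF80000000ULL) {
        /* mov r/m64, imm32 (REX.W C7 /0 id): the imm32 is sign-extended,
         * so it covers values whose upper 33 bits are all ones. */
        return 7;
    }
    /* mov r64, imm64 (REX.W B8+r io), spelled movabs in AT&T syntax. */
    return 10;
}
```

For rax this gives 5 bytes for 1, 7 bytes for -1 and for 0xffffffff80000000, and 10 bytes for anything that fits neither 32-bit form, which matches the byte counts in the disassembly above.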

so.s
#define HELLO 0x5
mov $HELLO, %eax
mov $0x5,%eax
mov $0x5,%eax
mov $0x5,%eax
retq
gcc -O2 -c so.s -o so.o
objdump -d so.o
0000000000000000 <.text>:
0: b8 00 00 00 00 mov $0x0,%eax
5: b8 05 00 00 00 mov $0x5,%eax
a: b8 05 00 00 00 mov $0x5,%eax
f: b8 05 00 00 00 mov $0x5,%eax
14: c3 retq
It didn't even pre-process the define.
rename so.s to so.S
gcc -O2 -c so.S -o so.o
objdump -d so.o
0000000000000000 <.text>:
0: b8 05 00 00 00 mov $0x5,%eax
5: b8 05 00 00 00 mov $0x5,%eax
a: b8 05 00 00 00 mov $0x5,%eax
f: b8 05 00 00 00 mov $0x5,%eax
14: c3 retq
It pre-processes the define but no optimization is occurring.
Looking slightly deeper at what is being passed to as:
gcc -O2 -c -save-temps so.s -o so.o
[0][as]
[1][--64]
[2][-o]
[3][so.o]
[4][so.s]
cat so.s
#define HELLO 0x5
mov $HELLO, %eax
mov $0x5,%eax
mov $0x5,%eax
mov $0x5,%eax
retq
And
gcc -O2 -c -save-temps so.S -o so.o
[0][as]
[1][--64]
[2][-o]
[3][so.o]
[4][so.s]
cat so.s
# 1 "so.S"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "so.S"
mov $0x5, %eax
mov $0x5,%eax
mov $0x5,%eax
mov $0x5,%eax
retq
still no optimization.
That should be more than enough to demonstrate it. There are link-time optimizations you can do: you have to build the objects the right way and then tell the linker. But I suspect it doesn't work at the machine-code level; rather it works on a high-level representation and re-generates code.
int main ( void )
{
return(5);
}
gcc -O2 so.c -save-temps -o so.o
cat so.s
.file "so.c"
.section .text.unlikely,"ax",@progbits
.LCOLDB0:
.section .text.startup,"ax",@progbits
.LHOTB0:
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
movl $5, %eax
ret
.cfi_endproc
.LFE0:
.size main, .-main
.section .text.unlikely
.LCOLDE0:
.section .text.startup
.LHOTE0:
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
Using the so.S from above
gcc -flto -O2 so.S -save-temps -o so.o
cat so.s
# 1 "so.S"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "so.S"
mov $0x5, %eax
mov $0x5,%eax
mov $0x5,%eax
mov $0x5,%eax
retq
Using the so.c from above
gcc -flto -O2 so.c -save-temps -o so.o
cat so.s
.file "so.c"
.section .gnu.lto_.profile.3f5dbe2a70110b8,"e",@progbits
.string "x\234ca`d`a`"
.string "\222L\214"
.string ""
.string "o"
.ascii "\016"
.text
.section .gnu.lto_.icf.3f5dbe2a70110b8,"e",@progbits
.string "x\234ca`d"
.string "\001\016\006\004`d\330|\356\347Nv\006"
.ascii "\017\243\003I"
.text
.section .gnu.lto_.jmpfuncs.3f5dbe2a70110b8,"e",@progbits
.string "x\234ca`d"
.string "\001V\006\004"
.string "\213"
.string ""
.string ""
.string "\356"
.ascii "\f"
.text
.section .gnu.lto_.inline.3f5dbe2a70110b8,"e",@progbits
.string "x\234ca`d"
.string "\001\021\006\004"
.string "\21203120\001\231l\013\344\231\300b"
.string "\n\031"
.ascii "\352"
.text
.section .gnu.lto_.pureconst.3f5dbe2a70110b8,"e",@progbits
.string "x\234ca`d`f`"
.string "\222\f"
.string ""
.string "X"
.ascii "\n"
.text
.section .gnu.lto_main.3f5dbe2a70110b8,"e",@progbits
.ascii "x\234\035\216\273\016\001a\020\205\347\314\277\313\026\210\236"
.ascii "B\253\3610^\301\003(<\300\376\330B\024\262\005\211\210r\223-"
.ascii "\334[\3256\n\005\2117\020\n\211NH(\0043&9\2319\231o.\016\201"
.ascii "4f\242\264\250 \202!p\270'jz\fha=\220\317\360\361bkp\b\226c\363"
.ascii "\344\216`\216\330\333nt\316\251\005Jb/Qo\210rl%\216\233\276\327"
.ascii "\r\3211L-\201\247(b\202\242^\230\241L\302\236V\237A6\025([RD"
.ascii ":s\244\364\243E5\261\337o\333&q\336e\242\273H\037y0k6W\264\362"
.ascii "\272`\033\255\337\031\275\315p\261\370\357\026\026\312\310\204"
.ascii "\333\250Wj\364\003\t\210<\r"
.text
.section .gnu.lto_.symbol_nodes.3f5dbe2a70110b8,"e",@progbits
.string "x\234ca`d\020f"
.string "\002&\206z\006\206\t\347\030#\324\256\206#\240\b"
.ascii "'\370\004\002"
.text
.section .gnu.lto_.refs.3f5dbe2a70110b8,"e",@progbits
.string "x\234ca`\004B "
.string ""
.string ""
.string "9"
.ascii "\007"
.text
.section .gnu.lto_.decls.3f5dbe2a70110b8,"e",@progbits
.string "x\234\205PMK\002Q\024\275\347\315h\222\021R-\\\270\020\027\355\222\244\020\367A\355b6A\264\013\261p\221AmZ^\377\200DB\340N\004)\320j~A\bA\021\371\007J!\241e\277#\b\354\276y3\216\320\242\013\367\343\335w\3369\367]\233#\332\372\222V%\357\213O\304\224\344\003\nM\243\\\372k\272g\211/\211\257\210;\377\340\331\302w{\370\025\031\340\035\242\201D\202\022\004xC\350\344\225\306\275\243\024\312\213\024\266\020"
.ascii "\375\263\nJ_\332\300u\317\344I`\001\211O\345\253i\006\302tB\363"
.ascii "\b\360X\303\247Se\005\337h\226\330\260\316\360\032q\177\023A"
.ascii "\224\337\337<\266\027\207\370\2502s\223\331\301T\322[#Q\224\331"
.ascii "\326\373\204\2058\321\302S\203\235+\301\266\270\247\367%\004"
.ascii "\215\376[\335\262\226\241\353\317\361\355v\266+\327|\311\254"
.ascii "\n\341\216;?\265\227x\362Z\337\214\252\234\006\234yl\244\260"
.ascii "\236\022\261\007$%\036\331\0069~\346V4\323d\327\345Q\375U\325"
.ascii "\270\247GS\032\205;\031\342\036Y=\241\224\022\273\030\002\035"
.ascii "\fd`\027\031\232\273(\344\327\362\233\024;.UJg\345\"\331'\207"
.ascii "\345Jlgw/\275\225\313Q\344\3744[\244_\320\267k~"
.text
.section .gnu.lto_.symtab.3f5dbe2a70110b8,"e",@progbits
.string "main"
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string "\260"
.string ""
.string ""
.text
.section .gnu.lto_.opts,"e",@progbits
.string "'-fmath-errno' '-fsigned-zeros' '-ftrapping-math' '-fno-trapv' '-fno-openmp' '-fno-openacc' '-mtune=generic' '-march=x86-64' '-O2' '-flto' '-fstack-protector-strong'"
.text
.comm __gnu_lto_v1,1,1
.comm __gnu_lto_slim,1,1
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
So gcc still does not appear to be doing any optimization: it does not remove these duplicate instructions, which have no functional purpose and are basically dead code. It does show that gcc will pre-process the code if the file extension is .S but not if it is .s (for other extensions such as .asm, experiment or read the docs). These were run on Linux; gcc is gcc and binutils is binutils, but the sensitivity to file-name extensions may vary by host.
The link-time optimization appears to apply to the high-level code, as one would expect, not to the assembly language code. One expects link-time optimization to be based on the middle-end representation, not the back end.
We know that gcc is not an assembler: it just passes assembly on to as, even when that assembly was generated from C. To optimize it, gcc would need an assembly parser and then logic to deal with that language, in order to pick out things to pass on for link-time optimization.
You can read more on link-time optimization and see if there is a way to apply it to assembly... I would assume not, but your entire question is about how to use the tools and how they work.
Assembly language optimization isn't necessarily a thing; that is kind of the point. There are, however, pseudo-instructions for which the assembler may choose an optimized implementation:
ldr r0,=0x12345678
ldr r0,=0x1000
ldr r0,=0xFFFFFF12
00000000 <.text>:
0: e59f0004 ldr r0, [pc, #4] ; c <.text+0xc>
4: e3a00a01 mov r0, #4096 ; 0x1000
8: e3e000ed mvn r0, #237 ; 0xed
c: 12345678 .word 0x12345678
But that is pseudo code so the assembler that supports it is free to do whatever they want. (assemblers define the assembly language (not the target) so by definition they get to do whatever they want). On that note using a compiler as an assembler when the toolchain also has an assembler changes it into yet another assembly language as assembly language is defined by the tool. So when you allow gcc to pre-process the code you are basically using a different assembly language from as. Just like inline assembly for the compiler is yet another assembly language. At least three assembly languages per target for the gnu toolchain.
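The three expansions of ldr r0,=const shown above can be reproduced with a small decision function. The C sketch below assumes the classic A32 rule that a data-processing immediate is an 8-bit value rotated right by an even amount, and it ignores other options a newer assembler might use (such as movw on later ARM architectures). All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    assert(n < 32);
    return n == 0 ? v : (v << n) | (v >> (32 - n));
}

/* True if v is expressible as an A32 data-processing immediate:
 * an 8-bit value rotated right by an even number of bits. */
static int is_arm_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotating left by rot undoes a rotate-right by rot. */
        if (rotl32(v, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}

enum ldr_expansion { EXPAND_MOV, EXPAND_MVN, EXPAND_LITERAL_POOL };

/* Pick an expansion for "ldr rX,=v" under the assumptions stated above. */
static enum ldr_expansion expand_ldr_const(uint32_t v)
{
    if (is_arm_immediate(v))
        return EXPAND_MOV;           /* mov rX, #v */
    if (is_arm_immediate(~v))
        return EXPAND_MVN;           /* mvn rX, #~v */
    return EXPAND_LITERAL_POOL;      /* pc-relative load from a .word */
}
```

Under these assumptions 0x1000 selects mov, 0xFFFFFF12 selects mvn (its complement is 0xED), and 0x12345678 falls back to the literal pool, the same three outcomes as in the disassembly.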

Related

How to define a subroutine/macro in a separate .asm file?

I'm programming in AVR for Atmega32 using AtmelStudio. I was wondering how you can write a subroutine or function in a separate file and then call it within main.asm?
The current issue is that I have a subroutine genArrays inside genArray.asm.
Using .include "genArray.asm" at the start of main.asm causes the program to run genArrays immediately at the start of main.asm, when I don't want it to run until I actually call it using the call instruction (example below).
main.asm:
.include "genArray.asm"
.org 0x0000
start:
...
... ---- ; do some stuff
...
...
call genArrays ---- ; call to genArrays subroutine that is defined in genArray.asm (separate file)
... ---- ; return here and continue with program
... ---- ; do some more stuff
genArray.asm:
genArrays: ---- ; start of subroutine
...
...
...
... -----; do some stuff
...
ret

With gnu assembler/binutils. I am very rusty on my avr, but...
so.s
.globl _start
_start:
nop
nop
rcall fun
nop
nop
here:
rjmp here
fun.s
.globl fun
fun:
nop
nop
ret
build
avr-as so.s -o so.o
avr-as fun.s -o fun.o
avr-ld -Ttext=0 so.o fun.o -o so.elf
avr-objdump -d so.elf
so.elf: file format elf32-avr
Disassembly of section .text:
00000000 <__ctors_end>:
0: 00 00 nop
2: 00 00 nop
4: 03 d0 rcall .+6 ; 0xc <fun>
6: 00 00 nop
...
0000000a <here>:
a: ff cf rjmp .-2 ; 0xa <here>
0000000c <fun>:
c: 00 00 nop
e: 00 00 nop
10: 08 95 ret
Assembly language is specific to the assembler, not the target, so you need to use the language of the assembler you are using; the above is GNU assembler for AVR. Likewise, how you link them is very much tool specific. GNU ld has a myriad of features, and AVR is a PITA to build for (from scratch), so you may want to use an already-built toolchain and linker script. (On Linux I simply apt-get installed a toolchain.)
As utterly horrible as this is:
avr-gcc -nostdlib so.s fun.s -o so.elf
avr-objdump -d so.elf
so.elf: file format elf32-avr
Disassembly of section .text:
00000000 <__ctors_end>:
0: 00 00 nop
2: 00 00 nop
4: 03 d0 rcall .+6 ; 0xc <fun>
6: 00 00 nop
...
0000000a <here>:
a: ff cf rjmp .-2 ; 0xa <here>
0000000c <fun>:
c: 00 00 nop
e: 00 00 nop
10: 08 95 ret
works (thus far).

Try the simple thing: put .include "genArray.asm" at the end of main.asm so its code will come after main in program memory.
That should be good enough for now. You could also take a look at the assembly generated by avr-gcc and see how it defines its functions.

How I can add assembly code in C project?

I would like to compile a simple C project that has some external functions defined in an ASM file. My main file is a C++ file that calls some extern "C" functions that are defined in an assembly file.
When I run the task "g++ build active file", I receive some warnings about the "extern" and some errors about the functions defined in the asm file, saying "reference to my_functions not defined".
My C++ file contains a "extern" like this:
[...]
extern "C" {
// Subroutines in ASM
void posCurScreenP1();
void moveCursorP1();
void openP1();
void getMoveP1();
void movContinuoP1();
void openContinuousP1();
void printChar_C(char c);
int clearscreen_C();
int printMenu_C();
int gotoxy_C(int row_num, int col_num);
char getch_C();
int printBoard_C(int tries);
void continue_C();
}
[...]
and my asm file contains this:
.586
.MODEL FLAT, C
; Functions defined in C
printChar_C PROTO C, value:SDWORD
printInt_C PROTO C, value:SDWORD
clearscreen_C PROTO C
clearArea_C PROTO C, value:SDWORD, value1: SDWORD
printMenu_C PROTO C
gotoxy_C PROTO C, value:SDWORD, value1: SDWORD
getch_C PROTO C
printBoard_C PROTO C, value: DWORD
initialPosition_C PROTO C
.code
[...]
Sure I'm doing some things wrong. Could you help me?
Thanks.

hmm
so.cpp
extern "C" int fun ( void );
int x;
int main()
{
x=fun();
return x;
}
fun.c
int fun ( void )
{
return(5);
}
build
gcc fun.c -O2 -c -o fun.o
g++ -O2 so.cpp fun.o -o so
No errors
00000000004003e0 <main>:
4003e0: 48 83 ec 08 sub $0x8,%rsp
4003e4: e8 17 01 00 00 callq 400500 <fun>
4003e9: 89 05 45 0c 20 00 mov %eax,0x200c45(%rip) # 601034 <x>
4003ef: 48 83 c4 08 add $0x8,%rsp
4003f3: c3 retq
0000000000400500 <fun>:
400500: b8 05 00 00 00 mov $0x5,%eax
400505: c3 retq
okay so
morefun.s
.globl fun
fun:
mov $0x5,%eax
retq
build
as morefun.s -o morefun.o
g++ -O2 so.cpp morefun.o -o so
no errors,
examine
00000000004003e0 <main>:
4003e0: 48 83 ec 08 sub $0x8,%rsp
4003e4: e8 0d 01 00 00 callq 4004f6 <fun>
4003e9: 89 05 45 0c 20 00 mov %eax,0x200c45(%rip) # 601034 <x>
4003ef: 48 83 c4 08 add $0x8,%rsp
4003f3: c3 retq
00000000004004f6 <fun>:
4004f6: b8 05 00 00 00 mov $0x5,%eax
4004fb: c3 retq
Still looks good, no problem adding assembly to a C++ project by making it look like a C function.
Other path
int fun ( void )
{
return(5);
}
gnu and most other sane compilers compile to asm then call the assembler so you can just see how they do it and repeat that
gcc -O2 -S fun.c -o fun.s
as fun.s -o fun.o
cat fun.s
.file "fun.c"
.section .text.unlikely,"ax",@progbits
.LCOLDB0:
.text
.LHOTB0:
.p2align 4,,15
.globl fun
.type fun, @function
fun:
.LFB0:
.cfi_startproc
movl $5, %eax
ret
.cfi_endproc
.LFE0:
.size fun, .-fun
.section .text.unlikely
.LCOLDE0:
.text
.LHOTE0:
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
or use -save-temps.
gcc -O2 -c -save-temps fun.c -o fun.o
to see the asm generated by the compiler
It is generally more painful to use compiler-generated assembly as a starting point as is. Cutting and pasting works, sure, but there is a lot of overhead and there are machine-generated labels that you would want to clean up, compared with starting from scratch. (I prefer to disassemble and work from that rather than use the compiler output directly.)

Is there a GCC version of the NASM ORG instruction?

I'm currently making an OS, and when I tried to add C support, I ran into a bit of a problem... In assembly, each program on my OS starts with ORG 32768 (the NASM compiler preprocessor instruction for offsetting the origin of the code), but I can't seem to find anything on a way to do this using the GCC compiler for C. So, my question is, how would one achieve this (offsetting the code's origin) in C using GCC? (and yes, I have looked it up before asking, even checked GNU's official GCC's C preprocessor documentation)

ORG and .ORG go back to the days when you wrote programs in assembly and didn't necessarily need a linker.
The GNU tools don't support it AFAIK.
start.s
.globl _start
_start:
mov $0xA000,%rsp
callq fun
jmp .
fun.c
unsigned int fun ( void )
{
return(7);
}
fun.ld
MEMORY
{
ram : ORIGIN = 0x8000, LENGTH = 0x2000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.data : { *(.data*) } > ram
.bss : { *(.bss*) } > ram
}
build commands
as start.s -o start.o
gcc -O2 -nostdlib -nostartfiles -ffreestanding -c fun.c -o fun.o
ld -T fun.ld start.o fun.o -o fun
which produces this program:
0000000000008000 <_start>:
8000: 48 c7 c4 00 a0 00 00 mov $0xa000,%rsp
8007: e8 04 00 00 00 callq 8010 <fun>
800c: eb fe jmp 800c <_start+0xc>
800e: 66 90 xchg %ax,%ax
0000000000008010 <fun>:
8010: b8 07 00 00 00 mov $0x7,%eax
8015: c3 retq
I used an entry point of 0x8000 (32768).
If by gcc you meant the GNU tools and you just want to write assembly language, that makes it a bit simpler: you only need the binutils package, not gcc. But you still need the linker, and you use ORIGIN in the very simple linker script above where you would have used .ORG inline in the assembly.
start.s
.globl _start
_start:
mov $0xA000,%rsp
mov $0x7,%eax
add $0x1,%eax
jmp .
same linker script as above
as start.s -o start.o
ld -T fun.ld start.o -o fun
producing
0000000000008000 <_start>:
8000: 48 c7 c4 00 a0 00 00 mov $0xa000,%rsp
8007: b8 07 00 00 00 mov $0x7,%eax
800c: 83 c0 01 add $0x1,%eax
800f: eb fe jmp 800f <_start+0xf>

Why does GCC produce stack preservation instructions when they're not necessary?

I'm compiling the following simple demonstration function:
int add(int a, int b) {
return a + b;
}
Naturally this function would be inlined, but let's assume that it's dynamically linked or not inlined for some other reason. With optimization disabled, the compiler produces the expected code:
00000000 <add>:
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: 8b 45 0c mov eax,DWORD PTR [ebp+0xc]
6: 03 45 08 add eax,DWORD PTR [ebp+0x8]
9: 5d pop ebp
a: c3 ret
Since there are no function calls inside this function, the instructions at 0, 1 and 9 seemingly have no purpose. Since optimization is disabled, this is acceptable.
However, when compiling while optimizing for size with -Os -s, the exact same code is produced. It seems rather wasteful to increase the size of the function by 66% with these options.
Why is the code not optimized to the following?
00000000 <add>:
0: 8b 45 0c mov eax,DWORD PTR [esp+0x8]
3: 03 45 08 add eax,DWORD PTR [esp+0x4]
6: c3 ret
Does the compiler just not consider this worth optimizing or is it related to other details like function alignment?

This is done to preserve the ability of the debugger to step through your code.
If you really want to disable this try -fomit-frame-pointer.
Compiling your above code using -Os -fomit-frame-pointer -S -masm=intel gave this:
.file "frame.c"
.intel_syntax noprefix
.text
.globl _add
.def _add; .scl 2; .type 32; .endef
_add:
mov eax, DWORD PTR [esp+8]
add eax, DWORD PTR [esp+4]
ret
.ident "GCC: (rev0, Built by MinGW-builds project) 4.8.0"

The value of EBP is not known when the function enters. Code could use mov eax,dword ptr [esp+8] and not bother with the BP register, but many debugging tools assume that each local variable is at a fixed offset relative to some register. Even if a compiler could keep track of things that were pushed on the stack and adjust indexing offsets appropriately, debuggers would likely be unable to do so.

Using GCC to produce readable assembly?

I was wondering how to use GCC on my C source file to dump a mnemonic version of the machine code so I could see what my code was being compiled into. You can do this with Java but I haven't been able to find a way with GCC.
I am trying to re-write a C method in assembly and seeing how GCC does it would be a big help.

If you compile with debug symbols (add -g to your GCC command line, even if you're also using -O3 [1]),
you can use objdump -S to produce a more readable disassembly interleaved with C source.
>objdump --help
[...]
-S, --source Intermix source code with disassembly
-l, --line-numbers Include line numbers and filenames in output
objdump -drwC -Mintel is nice:
-r shows symbol names on relocations (so you'd see puts in the call instruction below)
-R shows dynamic-linking relocations / symbol names (useful on shared libraries)
-C demangles C++ symbol names
-w is "wide" mode: it doesn't line-wrap the machine-code bytes
-Mintel: use GAS/binutils MASM-like .intel_syntax noprefix syntax instead of AT&T
-S: interleave source lines with disassembly.
You could put something like alias disas="objdump -drwCS -Mintel" in your ~/.bashrc. If not on x86, or if you like AT&T syntax, omit -Mintel.
Example:
> gcc -g -c test.c
> objdump -d -M intel -S test.o
test.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
#include <stdio.h>
int main(void)
{
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: 83 e4 f0 and esp,0xfffffff0
6: 83 ec 10 sub esp,0x10
puts("test");
9: c7 04 24 00 00 00 00 mov DWORD PTR [esp],0x0
10: e8 fc ff ff ff call 11 <main+0x11>
return 0;
15: b8 00 00 00 00 mov eax,0x0
}
1a: c9 leave
1b: c3 ret
Note that this isn't using -r, so the call rel32=-4 isn't annotated with the puts symbol name, and it looks like a broken call that jumps into the middle of the call instruction in main. Remember that the rel32 displacement in the call encoding is just a placeholder until the linker fills in a real offset (to a PLT stub in this case, unless you statically link libc).
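To see why the unlinked call disassembles as a jump to main+0x11, it helps to redo the arithmetic objdump does: the target is the address of the next instruction plus the signed 32-bit displacement. The C sketch below does that for a 5-byte E8 call; the function name is hypothetical, and it computes in 64 bits, so it does not model the 32-bit wraparound of real i386 code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Target address of a 5-byte x86 "call rel32" (opcode E8) located at addr. */
static uint64_t call_rel32_target(uint64_t addr, const uint8_t insn[5])
{
    assert(insn[0] == 0xE8);

    /* The displacement is stored little-endian after the opcode byte. */
    uint32_t raw = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
                   (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;

    int32_t rel;
    memcpy(&rel, &raw, sizeof rel);   /* reinterpret the bits as signed */

    /* rel32 is relative to the end of the call instruction (addr + 5). */
    return addr + 5 + (uint64_t)(int64_t)rel;
}
```

For the bytes e8 fc ff ff ff at address 0x10 in the example above, the displacement is -4 and the next instruction is at 0x15, giving 0x11, which is exactly the "call 11 <main+0x11>" that objdump prints before the linker has filled in the real offset.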
Footnote 1: Interleaving source can be messy and not very helpful in optimized builds; for that, consider https://godbolt.org/ or other ways of visualizing which instructions go with which source lines. In optimized code there's not always a single source line that accounts for an instruction but the debug info will pick one source line for each asm instruction.

If you give GCC the flag -fverbose-asm, it will
Put extra commentary information in the generated assembly code to make it more readable.
[...] The added comments include:
information on the compiler version and command-line options,
the source code lines associated with the assembly instructions, in the form FILENAME:LINENUMBER:CONTENT OF LINE,
hints on which high-level expressions correspond to the various assembly instruction operands.

Use the -S (note: capital S) switch to GCC, and it will emit the assembly code to a file with a .s extension. For example, the following command:
gcc -O2 -S foo.c
will leave the generated assembly code on the file foo.s.
Ripped straight from http://www.delorie.com/djgpp/v2faq/faq8_20.html (but removing erroneous -c)

Using the -S switch to GCC on x86 based systems produces a dump of AT&T syntax, by default, which can be specified with the -masm=att switch, like so:
gcc -S -masm=att code.c
Whereas if you'd like to produce a dump in Intel syntax, you could use the -masm=intel switch, like so:
gcc -S -masm=intel code.c
(Both produce dumps of code.c into their various syntax, into the file code.s respectively)
To produce similar effects with objdump, use the --disassembler-options=intel or --disassembler-options=att switch. An example (with code dumps to illustrate the differences in syntax):
$ objdump -d --disassembler-options=att code.c
080483c4 <main>:
80483c4: 8d 4c 24 04 lea 0x4(%esp),%ecx
80483c8: 83 e4 f0 and $0xfffffff0,%esp
80483cb: ff 71 fc pushl -0x4(%ecx)
80483ce: 55 push %ebp
80483cf: 89 e5 mov %esp,%ebp
80483d1: 51 push %ecx
80483d2: 83 ec 04 sub $0x4,%esp
80483d5: c7 04 24 b0 84 04 08 movl $0x80484b0,(%esp)
80483dc: e8 13 ff ff ff call 80482f4 <puts@plt>
80483e1: b8 00 00 00 00 mov $0x0,%eax
80483e6: 83 c4 04 add $0x4,%esp
80483e9: 59 pop %ecx
80483ea: 5d pop %ebp
80483eb: 8d 61 fc lea -0x4(%ecx),%esp
80483ee: c3 ret
80483ef: 90 nop
and
$ objdump -d --disassembler-options=intel code.c
080483c4 <main>:
80483c4: 8d 4c 24 04 lea ecx,[esp+0x4]
80483c8: 83 e4 f0 and esp,0xfffffff0
80483cb: ff 71 fc push DWORD PTR [ecx-0x4]
80483ce: 55 push ebp
80483cf: 89 e5 mov ebp,esp
80483d1: 51 push ecx
80483d2: 83 ec 04 sub esp,0x4
80483d5: c7 04 24 b0 84 04 08 mov DWORD PTR [esp],0x80484b0
80483dc: e8 13 ff ff ff call 80482f4 <puts@plt>
80483e1: b8 00 00 00 00 mov eax,0x0
80483e6: 83 c4 04 add esp,0x4
80483e9: 59 pop ecx
80483ea: 5d pop ebp
80483eb: 8d 61 fc lea esp,[ecx-0x4]
80483ee: c3 ret
80483ef: 90 nop

godbolt is a very useful tool. Its list only has C++ compilers, but you can use the -x c flag to get it to treat the code as C. It will then generate an assembly listing for your code side by side, and you can use the Colourise option to generate colored bars that visually indicate which source code maps to the generated assembly. For example, the following code:
#include <stdio.h>
void func()
{
printf( "hello world\n" ) ;
}
using the following command line:
-x c -std=c99 -O3
and Colourise would generate a colour-coded, side-by-side listing of the source and the generated assembly (screenshot not preserved).

Did you try gcc -S -fverbose-asm -O source.c then look into the generated source.s assembler file ?
The generated assembler code goes into source.s (you could override that with -o assembler-filename ); the -fverbose-asm option asks the compiler to emit some assembler comments "explaining" the generated assembler code. The -O option asks the compiler to optimize a bit (it could optimize more with -O2 or -O3).
If you want to understand what gcc is doing try passing -fdump-tree-all but be cautious: you'll get hundreds of dump files.
BTW, GCC is extensible thru plugins or with MELT (a high level domain specific language to extend GCC; which I abandoned in 2017)

You can also use gdb for this, much like objdump.
This excerpt is taken from http://sources.redhat.com/gdb/current/onlinedocs/gdb_9.html#SEC64
Here is an example showing mixed source+assembly for Intel x86:
(gdb) disas /m main
Dump of assembler code for function main:
5 {
0x08048330 <main+0>: push %ebp
0x08048331 <main+1>: mov %esp,%ebp
0x08048333 <main+3>: sub $0x8,%esp
0x08048336 <main+6>: and $0xfffffff0,%esp
0x08048339 <main+9>: sub $0x10,%esp
6 printf ("Hello.\n");
0x0804833c <main+12>: movl $0x8048440,(%esp)
0x08048343 <main+19>: call 0x8048284
7 return 0;
8 }
0x08048348 <main+24>: mov $0x0,%eax
0x0804834d <main+29>: leave
0x0804834e <main+30>: ret
End of assembler dump.

Use the -S (note: capital S) switch to GCC, and it will emit the assembly code to a file with a .s extension. For example, the following command writes the generated assembly to foo.s:
gcc -O2 -S -c foo.c

I haven't tried this with gcc, but with g++ the command below works for me.
-g produces a debug build
-Wa,-adhln is passed to the assembler to produce a listing interleaved with the source code
g++ -g -Wa,-adhln src.cpp

For RISC-V disassembly, these flags are nice:
riscv64-unknown-elf-objdump -d -S -l --visualize-jumps --disassembler-color=color --inlines
-d: disassemble, most basic flag
-S: intermix source. Note: must use -g flag while compiling
-l: line numbers
--visualize-jumps: fancy arrows, not too useful but why not. Sometimes it gets too messy and actually makes reading the source harder. From Peter Cordes's comment: --visualize-jumps=color is also an option, to use different colors for different arrows
--disassembler-color=color: give the disassembly some color
--inlines: print out inlines
Maybe useful:
-M numeric: use numeric register names instead of ABI names, useful if you are doing CPU development and don't know the ABI names by heart
-M no-aliases: don't use pseudoinstructions like li and call
Example:
main.c:
#include <stdio.h>
#include <stdint.h>
static inline void example_inline(const char* str) {
for (int i = 0; str[i] != 0; i++)
putchar(str[i]);
}
int main() {
printf("Hello world");
example_inline("Hello! I am inlined");
return 0;
}
I recommend using -O0 if you want intermixed sources. Intermixed source becomes very messy with -O2.
Command:
riscv64-unknown-elf-gcc main.c -c -O0 -g
riscv64-unknown-elf-objdump -d -S -l --disassembler-color=color --inlines main.o
Disassembly:
main.o: file format elf64-littleriscv
Disassembly of section .text:
0000000000000000 <example_inline>:
example_inline():
/Users/cyao/test/main.c:4
#include <stdio.h>
#include <stdint.h>
static inline void example_inline(const char* str) {
0: 7179 addi sp,sp,-48
2: f406 sd ra,40(sp)
4: f022 sd s0,32(sp)
6: 1800 addi s0,sp,48
8: fca43c23 sd a0,-40(s0)
000000000000000c <.LBB2>:
/Users/cyao/test/main.c:5
for (int i = 0; str[i] != 0; i++)
c: fe042623 sw zero,-20(s0)
10: a01d j 36 <.L2>
0000000000000012 <.L3>:
/Users/cyao/test/main.c:6 (discriminator 3)
putchar(str[i]);
12: fec42783 lw a5,-20(s0)
16: fd843703 ld a4,-40(s0)
1a: 97ba add a5,a5,a4
1c: 0007c783 lbu a5,0(a5)
20: 2781 sext.w a5,a5
22: 853e mv a0,a5
24: 00000097 auipc ra,0x0
28: 000080e7 jalr ra # 24 <.L3+0x12>
/Users/cyao/test/main.c:5 (discriminator 3)
for (int i = 0; str[i] != 0; i++)
2c: fec42783 lw a5,-20(s0)
30: 2785 addiw a5,a5,1
32: fef42623 sw a5,-20(s0)
0000000000000036 <.L2>:
/Users/cyao/test/main.c:5 (discriminator 1)
36: fec42783 lw a5,-20(s0)
3a: fd843703 ld a4,-40(s0)
3e: 97ba add a5,a5,a4
40: 0007c783 lbu a5,0(a5)
44: f7f9 bnez a5,12 <.L3>
0000000000000046 <.LBE2>:
/Users/cyao/test/main.c:7
}
46: 0001 nop
48: 0001 nop
4a: 70a2 ld ra,40(sp)
4c: 7402 ld s0,32(sp)
4e: 6145 addi sp,sp,48
50: 8082 ret
0000000000000052 <main>:
main():
/Users/cyao/test/main.c:9
int main() {
52: 1141 addi sp,sp,-16
54: e406 sd ra,8(sp)
56: e022 sd s0,0(sp)
58: 0800 addi s0,sp,16
/Users/cyao/test/main.c:10
printf("Hello world");
5a: 000007b7 lui a5,0x0
5e: 00078513 mv a0,a5
62: 00000097 auipc ra,0x0
66: 000080e7 jalr ra # 62 <main+0x10>
/Users/cyao/test/main.c:11
example_inline("Hello! I am inlined");
6a: 000007b7 lui a5,0x0
6e: 00078513 mv a0,a5
72: 00000097 auipc ra,0x0
76: 000080e7 jalr ra # 72 <main+0x20>
/Users/cyao/test/main.c:13
return 0;
7a: 4781 li a5,0
/Users/cyao/test/main.c:14
}
7c: 853e mv a0,a5
7e: 60a2 ld ra,8(sp)
80: 6402 ld s0,0(sp)
82: 0141 addi sp,sp,16
84: 8082 ret
PS. There are colors in the disassembled code

Use -Wa,-adhln as an option to gcc or g++ to write a listing to stdout.
-Wa,... passes command-line options to the assembler stage (run by gcc/g++ after C/C++ compilation). It invokes as internally (as.exe on Windows).
Run
>as --help
on the command line to see more help for the assembler tool that gcc uses.
