The macro expansion of __read_mostly:
#define __read_mostly __attribute__((__section__(".data..read_mostly")))
This one is from cache.h
__init:
#define __init __section(.init.text) __cold notrace
from init.h
__exit:
#define __exit __section(.exit.text) __exitused __cold notrace
After searching through the net I have not found any good explanation of what is happening here.
Additional question: I have heard about various "linker magic" employed in kernel development. Any information regarding this would be wonderful.
I have some idea of what these macros do. For example, __init is supposed to indicate that the function's code can be removed after initialization, and __read_mostly indicates that the data is seldom written, which helps minimize cache misses. But I have no idea how they do it. They are GCC extensions, so in theory they could be demonstrated with a small userland C program.
UPDATE 1:
I have tried to test __section__ with an arbitrary section name. The test code:
#include <stdio.h>
#define __read_mostly __attribute__((__section__("MY_DATA")))
struct ro {
char a;
int b;
char * c;
};
struct ro my_ro __read_mostly = {
.a = 'a',
.b = 3,
.c = NULL,
};
int main(int argc, char **argv) {
printf("hello");
printf("my ro %c %d %p \n", my_ro.a, my_ro.b, my_ro.c);
return 0;
}
Now with __read_mostly, the generated assembly code is:
.file "ro.c"
.globl my_ro
.section MY_DATA,"aw",@progbits
.align 16
.type my_ro, @object
.size my_ro, 16
my_ro:
.byte 97
.zero 3
.long 3
.quad 0
.section .rodata
.LC0:
.string "hello"
.LC1:
.string "my ro %c %d %p \n"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
pushq %rbx
subq $24, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movl $.LC0, %eax
movq %rax, %rdi
movl $0, %eax
.cfi_offset 3, -24
call printf
movq my_ro+8(%rip), %rcx
movl my_ro+4(%rip), %edx
movzbl my_ro(%rip), %eax
movsbl %al, %ebx
movl $.LC1, %eax
movl %ebx, %esi
movq %rax, %rdi
movl $0, %eax
call printf
movl $0, %eax
addq $24, %rsp
popq %rbx
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 4.4.6 20110731 (Red Hat 4.4.6-3)"
.section .note.GNU-stack,"",@progbits
Now without the __read_mostly macro, the assembly code remains more or less the same.
This is the diff:
--- rm.S 2012-07-17 16:17:05.795771270 +0600
+++ rw.S 2012-07-17 16:19:08.633895693 +0600
@@ -1,6 +1,6 @@
.file "ro.c"
.globl my_ro
- .section MY_DATA,"aw",@progbits
+ .data
.align 16
.type my_ro, @object
.size my_ro, 16
So essentially only a differently named section is created, nothing fancy.
Even the objdump disassembly does not show any difference.
So my final conclusion about them: it's the linker's job to do something with data sections marked with a special name. I think the Linux kernel uses some kind of custom linker script to achieve these things.
One point about __read_mostly: data placed there can be grouped and managed in a way that reduces cache misses.
Someone on LKML submitted a patch to remove __read_mostly, which spawned a fascinating discussion on its merits and demerits.
Here is the link: https://lkml.org/lkml/2007/12/13/477
I will post further updates on __init and __exit.
UPDATE 2
These macros (__init, __exit and __read_mostly) place their contents into custom-named sections: data in the case of __read_mostly, and text in the case of __init and __exit. These sections are then utilized by the linker. Since the linker's default behaviour is not what we want here, for various reasons, a linker script is employed to achieve the purposes of these macros.
Some background on how a custom linker script can be used to eliminate dead code (code that is linked in but never executed) is helpful here; this issue is of very high importance in embedded scenarios. This document discusses how a linker script can be fine-tuned to remove dead code: elinux.org/images/2/2d/ELC2010-gc-sections_Denys_Vlasenko.pdf
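As a small userland illustration of that technique (a sketch, assuming GCC and GNU ld; the file and function names are made up): every function and object gets its own section, and the linker garbage-collects the unreferenced ones.

/* gc.c - build with:
 *   gcc -ffunction-sections -fdata-sections -Wl,--gc-sections,--print-gc-sections gc.c -o gc
 * -ffunction-sections and -fdata-sections give every function and object its
 * own section (.text.used_fn, .text.unused_fn, ...), and --gc-sections lets
 * the linker discard any section that is never referenced.
 */
#include <stdio.h>

int used_fn(int x)   { return x + 1; }

int unused_fn(int x) { return x * 2; }   /* never called: its section is garbage-collected */

int main(void)
{
    printf("%d\n", used_fn(41));
    return 0;
}

The kernel does the same kind of section-level selection, but with an explicit linker script rather than compiler switches.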
In the case of the kernel, the initial linker script can be found in include/asm-generic/vmlinux.lds.h. This is not the final script; it is a kind of starting point, and it is further modified for different platforms.
A quick look at this file and the portions of interest can be spotted immediately:
#define READ_MOSTLY_DATA(align) \
. = ALIGN(align); \
*(.data..read_mostly) \
. = ALIGN(align);
It seems this is the part that collects the ".data..read_mostly" section.
You can also find the __init and __exit section-related linker commands:
#define INIT_TEXT \
*(.init.text) \
DEV_DISCARD(init.text) \
CPU_DISCARD(init.text) \
MEM_DISCARD(init.text)
#define EXIT_TEXT \
*(.exit.text) \
DEV_DISCARD(exit.text) \
CPU_DISCARD(exit.text) \
MEM_DISCARD(exit.text)
Linking seems like a pretty complex thing to do :)
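As a small userland taste of this kind of linker magic (a sketch, assuming GCC and GNU ld; the section name mytable and the entry struct are made up): when a section name is a valid C identifier, GNU ld automatically defines __start_<name> and __stop_<name> symbols around it, so code can walk everything the linker collected. The kernel's linker script sets up the same begin/end-symbol pattern explicitly for things like its initcall tables.

#include <stdio.h>

struct entry {
    const char *name;
    int value;
};

/* Any translation unit can drop entries into the "mytable" section; the
 * linker concatenates them and brackets the result with the automatically
 * generated __start_mytable / __stop_mytable symbols. */
#define TABLE_ENTRY __attribute__((section("mytable"), used))

static struct entry e1 TABLE_ENTRY = { "one", 1 };
static struct entry e2 TABLE_ENTRY = { "two", 2 };

extern struct entry __start_mytable[];
extern struct entry __stop_mytable[];

int main(void)
{
    for (struct entry *e = __start_mytable; e < __stop_mytable; e++)
        printf("%s = %d\n", e->name, e->value);
    return 0;
}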
GCC attributes are a general mechanism to give instructions to the compiler that are outside the specification of the language itself.
The common facility used by the macros you list is the __section__ attribute, which is described as:
The section attribute specifies that a function lives in a particular section. For example, the declaration:
extern void foobar (void) __attribute__ ((section ("bar")));
puts the function foobar in the bar section.
So what does it mean to put something in a section? An object file is divided into sections: .text for executable machine code, .data for read-write data, .rodata for read-only data, .bss for data initialised to zero, etc. The names and purposes of these sections are a matter of platform convention, and some special sections can only be accessed from C using the __attribute__ ((section)) syntax.
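A tiny sketch of where typical C objects end up, following the usual GCC/ELF conventions (compile with gcc -c and check with objdump -t):

/* sections.c - which section does each object land in? */
int         initialized = 42;      /* .data:   read-write, initialised           */
int         zeroed;                /* .bss (or COMMON): zero-initialised         */
const int   constant = 7;          /* .rodata: read-only data                    */
const char *message = "hello";     /* pointer in a writable data section,
                                      the string itself in .rodata               */

int add(int a, int b)              /* machine code goes in .text                 */
{
    return a + b;
}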
In your example you can guess that .data..read_mostly is a subsection of .data for data that will be mostly read; .init.text is a text (machine code) section that will be run when the program is initialised, etc.
On Linux, deciding what to do with the various sections is the job of the kernel; when userspace requests to exec a program, it will read the program image section-by-section and process them appropriately: .data sections get mapped as read-write pages, .rodata as read-only, .text as execute-only, etc. Presumably .init.text will be executed before the program starts; that could either be done by the kernel or by userspace code placed at the program's entry point (I'm guessing the latter).
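A userspace relative of the "run it before the program starts" idea that you can try with plain GCC is the constructor attribute; a minimal sketch (the function name is made up):

#include <stdio.h>

/* GCC records a pointer to this function in a startup section
 * (.init_array on modern ELF systems); the C runtime walks that
 * section and calls every entry before main() is entered. */
__attribute__((constructor))
static void before_main(void)
{
    puts("runs before main");
}

int main(void)
{
    puts("main");
    return 0;
}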
If you want to see the effect of these attributes, a good test is to run gcc with the -S option to output assembler code, which will contain the section directives. You could then run the assembler with and without the section directives and use objdump or even hex dump the resulting object file to see how it differs.
As far as I know, these macros are used exclusively by the kernel. In theory, they could apply to user space, but I don't believe this is the case. They all group similar variables and code together for different effects.
init/exit
A lot of code is needed to set up the kernel; this happens before any user space is running at all, i.e. before the init task runs. In many cases, this code is never used again, so it would be a waste to consume un-swappable RAM for it after boot. The familiar kernel message Freeing init memory is a result of the init section. Some drivers may be configured as modules. In those cases, they can exit. However, if they are compiled into the kernel, they don't necessarily exit (they may shut down). The exit section groups this type of code/data.
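To make the pairing concrete, this is roughly the skeleton of a trivial loadable module (a sketch; hello_init and hello_exit are made-up names):

#include <linux/init.h>
#include <linux/module.h>

/* __init: this code is placed in .init.text and its memory is freed
 * once initialisation is finished. */
static int __init hello_init(void)
{
        pr_info("hello: loaded\n");
        return 0;
}

/* __exit: only needed if the driver can be unloaded; when the driver is
 * built into the kernel this code can be discarded at link time. */
static void __exit hello_exit(void)
{
        pr_info("hello: unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");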
cold/hot
Each cache line has a fixed size. You can make the most of the cache by putting the same type of data/functions together. The idea is that often-used code can sit side by side: if a cache line holds four instructions, the end of one hot routine should abut the beginning of the next hot routine. Similarly, it is good to keep seldom-used code together, as we hope it rarely enters the cache.
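GCC exposes this grouping directly as function attributes (the kernel's __cold in the __init/__exit definitions quoted above maps onto the same mechanism); a sketch with made-up function names:

#include <stdio.h>

/* cold: rarely executed; GCC optimises it for size and typically places
 * it in a separate section such as .text.unlikely, away from hot code. */
__attribute__((cold))
static void report_error(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
}

/* hot: frequently executed; GCC optimises it more aggressively and may
 * group it with other hot code (e.g. in .text.hot). */
__attribute__((hot))
static long sum(const int *v, long n)
{
    long s = 0;
    for (long i = 0; i < n; i++)
        s += v[i];
    return s;
}

int main(void)
{
    int v[4] = { 1, 2, 3, 4 };
    if (sum(v, 4) != 10)
        report_error("unexpected sum");
    return 0;
}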
read_mostly
The idea here is similar to hot; the difference is that with data we can update the values. When that happens, the entire cache line becomes dirty and must be written back to main RAM. This is needed for multi-CPU consistency and when the cache line is evicted. If nothing differs between the CPU's cached copy and main memory, then nothing needs to happen on an eviction. This spares the RAM bus so that other important traffic can happen.
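In user space the same concern shows up as "false sharing", and the usual trick is to keep frequently written data on a different cache line from read-mostly data. A sketch (assuming a 64-byte cache line; all names are made up):

#include <stdio.h>

/* Read-mostly configuration: every CPU can hold a clean, shared copy of
 * this cache line, and plain reads never force a write-back. */
struct config {
    int log_level;
    int max_conn;
} __attribute__((aligned(64)));        /* 64 = assumed cache-line size */

/* Frequently written counter, deliberately placed on its own cache line so
 * that bumping it does not dirty (and invalidate) the line holding the
 * read-mostly data above. */
struct counter {
    unsigned long hits;
} __attribute__((aligned(64)));

static struct config  cfg = { 1, 1024 };
static struct counter stats;

static void record_hit(void)
{
    __atomic_fetch_add(&stats.hits, 1, __ATOMIC_RELAXED);  /* GCC atomic builtin */
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        record_hit();
    printf("hits=%lu at log level %d\n", stats.hits, cfg.log_level);
    return 0;
}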
These items are strictly for the kernel. Similar tricks could be (and perhaps are) implemented for user space; that would depend on the loader in use, which often differs depending on the libc in use.
Related
When I compile this code using different compilers and inspect the output in a hex editor, I expect to find the string "Nancy" somewhere.
#include <stdio.h>
int main()
{
char temp[6] = "Nancy";
printf("%s", temp);
return 0;
}
The output file for gcc -o main main.c looks like this:
In the output for g++ -o main main.c, I can't seem to find "Nancy" anywhere.
Compiling the same code in Visual Studio (MSVC 1929), I see the full string in a hex editor:
Why do I get some random bytes in the middle of the string in (1)?
There is no single rule about how a compiler stores data in the output files it produces.
Data can be stored in a “constant” section.
Data can be built into the “immediate” operands of instructions, in which data is encoded in various fields of the bits that encode an instruction.
Data can be computed from other data by instructions generated by the compiler.
I suspect the case where you see “Nanc” in one place and “y” in another is the compiler using a load instruction (may be written with “mov”) that loads the bytes forming “Nanc” as an immediate operand and another load instruction that loads the bytes forming “y” with a trailing null character, along with other instructions to store the loaded data on the stack and pass its address to printf.
You have not provided enough information to diagnose the g++ case: You did not name the compiler or its version number or provide any part of the generated output.
I reproduced it using gcc 9.3.0 (Linux Mint 20.2) on an x86-64 (Intel) system.
Result of hexdump -C:
Note the byte sequence is the same.
So I use gcc -S -c:
.file "teststr.c"
.text
.section .rodata
.LC0:
.string "%s"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
movl $1668178254, -14(%rbp) # NOTE THIS PART HERE
movw $121, -10(%rbp) # AND HERE
leaq -14(%rbp), %rax
movq %rax, %rsi
leaq .LC0(%rip), %rdi
movl $0, %eax
call printf@PLT
movl $0, %eax
movq -8(%rbp), %rdx
xorq %fs:40, %rdx
je .L3
call __stack_chk_fail@PLT
.L3:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
.section .note.GNU-stack,"",@progbits
.section .note.gnu.property,"a"
.align 8
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 8
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 8
4:
The highlighted value 1668178254 is hex 636E614E or "cnaN" (which, due to the endian reversal as x86 is a little-endian system, becomes "Nanc") in ASCII encoding, and 121 is hex 79, or "y".
So, since it's a short string, the compiler uses two move instructions with immediate operands instead of copying it from a string stored elsewhere in the file, and the intervening "garbage" is (I believe) the encoding of the following movw instruction. It is likely a way to optimize the initialization versus looping byte-by-byte through memory, even though no optimization flag was "officially" given to the compiler; that's the thing, the compiler can do what it wants in this regard. Microsoft's compiler, then, seems to be more "pedantic" in how it compiles, because it apparently forgoes that optimization in favor of storing the string contiguously.
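If you want to double-check that arithmetic, here is a tiny sketch that reassembles the two immediates in memory the same way the movl/movw pair does (assuming a little-endian machine with 4-byte int and 2-byte short):

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int   imm32 = 1668178254;  /* 0x636E614E, the movl immediate */
    unsigned short imm16 = 121;         /* 0x0079, the movw immediate     */
    char buf[7];

    /* Little-endian stores the low byte first, so the bytes land in
     * memory as 4E 61 6E 63 ("Nanc") followed by 79 00 ("y" + NUL). */
    memcpy(buf, &imm32, 4);
    memcpy(buf + 4, &imm16, 2);
    buf[6] = '\0';
    printf("%s\n", buf);                /* prints "Nancy" */
    return 0;
}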
Generally a compiled program is split into different types of "section". The assembler file will use directives to switch between them.
Code (".text")
Static read-only data (".section .rodata")
Initialised global or static variables (".data")
Uninitialised (or zero-initialized) global or static variables (".bss")
String literals in C can be used in two different ways.
As a pointer to constant data.
As an initialiser for an array.
If a string literal is used as a pointer then it is likely the compiler will place the string data in the read only data section.
If a string literal is used to initialise a global/static array then it is likely the compiler will place the array in the initialised data section (or the read-only data section if the array is declared as const).
However in your case the array you are initialising is an automatic local variable. So it can't be pre-initialised before program start. The compiler must include code to initialise it each time your function runs.
The compiler might choose to do that by storing the string in a read-only data location and then using a copy routine (either inlined or a call) to copy it to the local array. (In that case there will be a contiguous copy of the whole thing, otherwise there won't be.) It may choose to simply generate instructions to set the elements of the array one by one. It may choose to generate instructions that set several array elements at the same time (e.g. 4 bytes and then 2 bytes, including the terminating '\0').
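A side-by-side sketch of those cases (section names assume the usual GCC/ELF conventions):

#include <stdio.h>

const char *as_pointer = "Nancy";  /* the string object lives in .rodata;
                                      only the pointer is writable data    */
char global_array[6]  = "Nancy";   /* the array itself lives in .data,
                                      pre-initialised by the loader        */

int main(void)
{
    char local_array[6] = "Nancy"; /* automatic storage: the compiler must
                                      emit code to fill it in at run time,
                                      e.g. a pair of immediate stores or a
                                      copy from .rodata                     */
    printf("%s %s %s\n", as_pointer, global_array, local_array);
    return 0;
}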
P.S. I've noticed some people posting https://godbolt.org/ links on other answers to this question. The Compiler Explorer is a useful tool, but be aware that it hides the section-switching directives from the assembler output by default.
I have this short hello world program:
#include <stdio.h>
static const char* msg = "Hello world";
int main(){
printf("%s\n", msg);
return 0;
}
I compiled it into the following assembly code with gcc:
.file "hello_world.c"
.section .rodata
.LC0:
.string "Hello world"
.data
.align 4
.type msg, @object
.size msg, 4
msg:
.long .LC0
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
movl msg, %eax
movl %eax, (%esp)
call puts
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
.section .note.GNU-stack,"",@progbits
My question is: are all parts of this code essential if I were to write this program in assembly (instead of writing it in C and then compiling to assembly)? I understand the assembly instructions but there are certain pieces I don't understand. For instance, I don't know what .cfi* is, and I'm wondering if I would need to include this to write this program in assembly.
The absolute bare minimum that will work on the platform this appears to be is:
.globl main
main:
pushl $.LC0
call puts
addl $4, %esp
xorl %eax, %eax
ret
.LC0:
.string "Hello world"
But this breaks a number of ABI requirements. The minimum for an ABI-compliant program is
.globl main
.type main, @function
main:
subl $24, %esp
pushl $.LC0
call puts
xorl %eax, %eax
addl $28, %esp
ret
.size main, .-main
.section .rodata
.LC0:
.string "Hello world"
Everything else in your object file is either the compiler not optimizing the code down as tightly as possible, or optional annotations to be written to the object file.
The .cfi_* directives, in particular, are optional annotations. They are necessary if and only if the function might be on the call stack when a C++ exception is thrown, but they are useful in any program from which you might want to extract a stack trace. If you are going to write nontrivial code by hand in assembly language, it will probably be worth learning how to write them. Unfortunately, they are very poorly documented; I am not currently finding anything that I think is worth linking to.
The line
.section .note.GNU-stack,"",#progbits
is also important to know about if you are writing assembly language by hand; it is another optional annotation, but a valuable one, because what it means is "nothing in this object file requires the stack to be executable." If all the object files in a program have this annotation, the kernel won't make the stack executable, which improves security a little bit.
(To indicate that you do need the stack to be executable, you put "x" instead of "". GCC may do this if you use its "nested function" extension. (Don't do that.))
It is probably worth mentioning that in the "AT&T" assembly syntax used (by default) by GCC and GNU binutils, there are three kinds of lines: A line
with a single token on it, ending in a colon, is a label. (I don't remember the rules for what characters can appear in labels.) A line whose first token begins with a dot, and does not end in a colon, is some kind of directive to the assembler. Anything else is an assembly instruction.
related: How to remove "noise" from GCC/clang assembly output? The .cfi directives are not directly useful to you, and the program would work without them. (It's stack-unwind info needed for exception handling and backtraces, so -fomit-frame-pointer can be enabled by default. And yes, gcc emits this even for C.)
As far as the number of asm source lines needed to produce a valid Hello World program goes, obviously we want to use libc functions to do most of the work for us.
@Zwol's answer has the shortest implementation of your original C code.
Here's what you could do by hand, if you don't care about the exit status of your program, just that it prints your string.
# Hand-optimized asm, not compiler output
.globl main # necessary for the linker to see this symbol
main:
# main gets two args: argv and argc, so we know we can modify 8 bytes above our return address.
movl $.LC0, 4(%esp) # replace our first arg with the string
jmp puts # tail-call puts.
# you would normally put the string in .rodata, not leave it in .text where the linker will mix it with other functions.
.section .rodata
.LC0:
.asciz "Hello world" # asciz zero-terminates
The equivalent C (you just asked for the shortest Hello World, not one that had identical semantics):
int main(int argc, char **argv) {
return puts("Hello world");
}
Its exit status is implementation-defined but it definitely prints. puts(3) returns "a non-negative number", which could be outside the 0..255 range, so we can't say anything about the program's exit status being 0 / non-zero on Linux (where the process's exit status is the low 8 bits of the integer passed to the exit_group() system call, in this case by the CRT startup code that called main()).
Using JMP to implement the tail-call is a standard practice, and commonly used when a function doesn't need to do anything after another function returns. puts() will eventually return to the function that called main(), just like if puts() had returned to main() and then main() had returned. main()'s caller still has to deal with the args it put on the stack for main(), because they're still there (but modified, and we're allowed to do that).
gcc and clang don't generate code that modifies arg-passing space on the stack. It is perfectly safe and ABI-compliant, though: functions "own" their args on the stack, even if they were const. If you call a function, you can't assume that the args you put on the stack are still there. To make another call with the same or similar args, you need to store them all again.
Also note that this calls puts() with the same stack alignment that we had on entry to main(), so again we're ABI-compliant in preserving the 16B alignment required by the modern version of the x86-32 aka i386 System V ABI (used by Linux).
.string zero-terminates strings, same as .asciz, but I had to look it up to check. I'd recommend just using .ascii or .asciz to make sure you're clear on whether your data has a terminating byte or not. (You don't need one if you use it with explicit-length functions like write())
In the x86-64 System V ABI (and Windows), args are passed in registers. This makes tail-call optimization a lot easier, because you can rearrange args or pass more args (as long as you don't run out of registers). This makes compilers willing to do it in practice. (Because as I said, they currently don't like to generate code that modifies the incoming arg space on the stack, even though the ABI is clear that they're allowed to, and compiler generated functions do assume that callees clobber their stack args.)
clang or gcc -O3 will do this optimization for x86-64, as you can see on the Godbolt compiler explorer:
#include <stdio.h>
int main() { return puts("Hello World"); }
# clang -O3 output
main: # #main
movl $.L.str, %edi
jmp puts # TAILCALL
# Godbolt strips out comment-only lines and directives; there's actually a .section .rodata before this
.L.str:
.asciz "Hello World"
Static data addresses always fit in the low 31 bits of the address space, and executables don't need position-independent code; otherwise the mov would be lea .LC0(%rip), %rdi. (You'll get this from gcc if it was configured with --enable-default-pie to make position-independent executables.)
How to load address of function or label into register in GNU Assembler
Hello World using 32-bit x86 Linux int 0x80 system calls directly, no libc
See Hello, world in assembly language with Linux system calls? My answer there was originally written for SO Docs, then moved here as a place to put it when SO Docs closed down. It didn't really belong here so I moved it to another question.
related: A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. The smallest binary file you can run that just makes an exit() system call. That is about minimizing the binary size, not the source size or even just the number of instructions that actually run.
For example: In the following code, how and where is the number '10' used for the comparison stored?
#include<stdio.h>
#include<conio.h>
int main()
{
int x = 5;
if (x > 10)
printf("X is greater than 10");
else if (x < 10)
printf("X is lesser than 10");
else
printf("x = 10");
getch();
return 0;
}
Pardon me for not giving enough details. If, instead of initializing 'x' directly with '5', we scan it in from the user, we know how memory is allocated for 'x'. But how is memory allocated for the literal number '10', which is not stored in any variable?
In your particular code, x is initialized to 5 and is never changed. An optimizing compiler is able to constant fold and propagate that information. So it probably would generate the equivalent of
int main() {
printf("X is lesser than 10");
getch();
return 0;
}
notice that the compiler would also have done dead code elimination.
So both constants 5 and 10 would have disappeared.
BTW, <conio.h> and getch are not in standard C99 or C11. My Linux system doesn't have them.
In general (and depending upon the target processor's instruction set and the ABI) small constants are often embedded in some single machine code instruction (as an immediate operand), as Kilian answered. Some large constants (e.g. floating point numbers, literal strings, most const global or static arrays and aggregates) might get inserted and compiled as read only data in the code segment (then the constant inside machine register-load instructions would be an address or some offset relative to PC for PIC); see also this. Some architectures (e.g. SPARC, RISC-V, ARM, and other RISC) are able to load a wide constant in a register by two consecutive instructions (loading the constant in two parts), and this impacts the relocation format for the linker (e.g. in object files and executables, often in ELF).
I suggest asking your compiler to emit assembler code and having a glance at it. If using GCC (e.g. on Linux, or with Cygwin or MinGW), try compiling with gcc -Wall -O -fverbose-asm -S; on my Debian/Linux system, if I replace getch by getchar in your code, I get:
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "X is lesser than 10"
.text
.globl main
.type main, @function
main:
.LFB11:
.cfi_startproc
subq $8, %rsp #,
.cfi_def_cfa_offset 16
movl $.LC0, %edi #,
movl $0, %eax #,
call printf #
movq stdin(%rip), %rdi # stdin,
call _IO_getc #
movl $0, %eax #,
addq $8, %rsp #,
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE11:
.size main, .-main
.ident "GCC: (Debian 4.9.2-10) 4.9.2"
.section .note.GNU-stack,"",@progbits
If you are using a 64-bit Windows system, your architecture is very likely to be x86-64. There are tons of documentation describing the ISA (see answers to this) and the x86 calling conventions (and also the Linux x86-64 ABI; you'll find the equivalent document for Windows).
BTW, you should not really care how such constants are implemented. The semantics of your code should not change, whatever the compiler chooses to do to implement them. So leave the optimizations (and such low-level choices) to the compiler (i.e. your implementation of C).
The constant 10 is probably stored as an immediate constant in the opcode stream. Issuing a CMP AX,10, with the constant included in the opcode, is usually both smaller and faster than a CMP AX, [BX], where the comparison value must be loaded from memory.
If the constant is too large to fit into the opcode, the alternative is to store it in memory like a static variable, but if the instruction set allows embedded constants, a good compiler should use it - after all, that addressing mode was presumably added because it has advantages over the others.
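A quick way to see that in practice (a sketch; the exact instruction depends on the target, but on x86-64 GCC typically emits a cmpl with an immediate operand):

/* cmp_imm.c - compile with: gcc -O2 -S cmp_imm.c and look at cmp_imm.s.
 * The literal 10 shows up as an immediate, e.g. "cmpl $10, %edi": it is
 * encoded inside the instruction itself, not stored in a data section. */
int greater_than_ten(int x)
{
    return x > 10;
}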
My friend and I got a computer architecture project and we don't really know how to approach it. I hope you can at least point us in the right direction so we know what to look for. As our professor isn't really good at explaining what we need to do and the subject is rather vague, we'll start from the beginning.
Our task is to somehow "edit" GCC to treat some operations differently. For example, when you add two char arguments in a .c program it uses addb. We need to change it to, e.g., 16-bit registers (addl), without using unnecessary parameters during compilation (just regular gcc p.c -o p). Why, or whether it will work, doesn't really matter at this point.
We'd like to know how we could change something inside GCC and where we could even start looking, as I can't find any information about similar tasks besides making plugins/extensions. Is there anything we could read about something like this, or anything we could use?
In C, 'char' variables are normally added together as integers, so the C compiler will already use addl, except when it can see that it makes no difference to the result to use a smaller or faster form.
For example this C code
unsigned char a, b, c;
int i;
void func1(void) { a = b + c; }
void func2(void) { i = b + c; }
Gives this assembler for GCC.
.file "xq.c"
.text
.p2align 4,,15
.globl func1
.type func1, @function
func1:
movzbl c, %eax
addb b, %al
movb %al, a
ret
.size func1, .-func1
.p2align 4,,15
.globl func2
.type func2, @function
func2:
movzbl b, %edx
movzbl c, %eax
addl %edx, %eax
movl %eax, i
ret
.size func2, .-func2
.comm i,4,4
.comm c,1,4
.comm b,1,4
.comm a,1,4
.ident "GCC: (Debian 4.7.2-5) 4.7.2"
.section .note.GNU-stack,"",@progbits
Note that the first function uses addb but the second uses addl because the high bits of the result will be discarded in the first function when the result is stored.
This version of GCC is generating i686 code, so the integers are 32-bit (addl). Depending on exactly what you want, you may need to make the result a short, or actually get a compiler version that outputs 16-bit 8086 code.
I'm interested in learning more x86/x86_64 assembly. Alas, I am on a Mac. No problem, right?
$ gcc --version
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build
5658) (LLVM build 2336.11.00)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I wrote a simple "Hello World" in C to get a base-line on what sort of code I'll have to write. I did a little x86 back in college, and have looked up numerous tutorials, but none of them look like the freakish output I'm seeing here:
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
subq $32, %rsp
Ltmp2:
movl %edi, %eax
movl %eax, -4(%rbp)
movq %rsi, -16(%rbp)
leaq L_.str(%rip), %rax
movq %rax, %rdi
callq _puts
movl $0, -24(%rbp)
movl -24(%rbp), %eax
movl %eax, -20(%rbp)
movl -20(%rbp), %eax
addq $32, %rsp
popq %rbp
ret
Leh_func_end1:
.section __TEXT,__cstring,cstring_literals
L_.str:
.asciz "Hello, World!"
.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support
EH_frame0:
Lsection_eh_frame:
Leh_frame_common:
Lset0 = Leh_frame_common_end-Leh_frame_common_begin
.long Lset0
Leh_frame_common_begin:
.long 0
.byte 1
.asciz "zR"
.byte 1
.byte 120
.byte 16
.byte 1
.byte 16
.byte 12
.byte 7
.byte 8
.byte 144
.byte 1
.align 3
Leh_frame_common_end:
.globl _main.eh
_main.eh:
Lset1 = Leh_frame_end1-Leh_frame_begin1
.long Lset1
Leh_frame_begin1:
Lset2 = Leh_frame_begin1-Leh_frame_common
.long Lset2
Ltmp3:
.quad Leh_func_begin1-Ltmp3
Lset3 = Leh_func_end1-Leh_func_begin1
.quad Lset3
.byte 0
.byte 4
Lset4 = Ltmp0-Leh_func_begin1
.long Lset4
.byte 14
.byte 16
.byte 134
.byte 2
.byte 4
Lset5 = Ltmp1-Ltmp0
.long Lset5
.byte 13
.byte 6
.align 3
Leh_frame_end1:
.subsections_via_symbols
Now...maybe things have changed a bit, but this isn't exactly friendly, even for assembly code. I'm having a hard time wrapping my head around this...Would someone help break down what is going on in this code and why it is all needed?
Many, many thanks in advance.
Since the question is really about those odd labels and data and not really about the code itself, I'm only going to shed some light on them.
If an instruction of the program causes an execution error (such as division by 0 or access to an inaccessible memory region or an attempt to execute a privileged instruction), it results in an exception (not a C++ kind of exception, rather an interrupt kind of it) and forces the CPU to execute the appropriate exception handler in the OS kernel. If we were to totally disallow these exceptions, the story would be very short, the OS would simply terminate the program.
However, there are advantages to letting programs handle their own exceptions, and so the primary exception handler in the OS kernel reflects some of the exceptions back into the program for handling. For example, a program could attempt to recover from the exception, or it could save a meaningful crash report before terminating.
In either case, it is useful to know the following:
the function, where the exception has occurred, not just the offending instruction in it
the function that called that function, the function that called that one and so on
and possibly (mainly for debugging):
the line of the source code file, from which this instruction was generated
the lines where these function calls were made
the function parameters
Why do we need to know the call tree?
Well, if the program registers its own exception handlers, it usually does it something like the C++ try and catch blocks:
fxn()
{
try
{
// do something potentially harmful
}
catch()
{
// catch and handle attempts to do something harmful
}
catch()
{
// catch and handle attempts to do something harmful
}
}
If neither of those catches catches, the exception propagates to the caller of fxn and potentially to the caller of the caller of fxn, until there's a catch that catches the exception or until the default exception handler is reached, which simply terminates the program.
So, you need to know the code regions that each try covers and you need to know how to get to the next closest try (in the caller of fxn, for example) if the immediate try/catch doesn't catch the exception and it has to bubble up.
The ranges for try and locations of catch blocks are easy to encode in a special section of the executable and they are easy to work with (just do a binary search for the offending instruction addresses in those ranges). But figuring out the next outer try block is harder because you may need to find out the return address from the function, where the exception occurred.
And you may not always rely on rbp+8 pointing to the return address on the stack, because the compiler may optimize the code in such a way that rbp is no longer involved in accessing function parameters and local variables. You can access them through rsp+something as well and save a register and a few instructions, but given the fact that different functions allocate different numbers of bytes on the stack for the locals and the parameters passed to other functions and adjust rsp differently, just the value of rsp isn't enough to find out the return address and the calling function. rsp can be an arbitrary number of bytes away from where the return address is on the stack.
For such scenarios the compiler includes additional information about functions and their stack usage in a dedicated section of the executable. The exception-handling code examines this information and properly unwinds the stack when exceptions have to propagate to the calling functions and their try/catch blocks.
So, the data following _main.eh contains that additional information. Note that it explicitly encodes the beginning and the size of main() by referring to Leh_func_begin1 and Leh_func_end1-Leh_func_begin1. This piece of info allows the exception-handling code to identify main()'s instructions as main()'s.
It also appears that main() isn't very unique and some of its stack/exception info is the same as in other functions and it makes sense to share it between them. And so there's a reference to Leh_frame_common.
I can't comment further on the structure of _main.eh and the exact meaning of those constants like 144 and 13, as I don't know the format of this data. But generally one doesn't need to know these details unless one is a compiler or debugger developer.
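The same unwind information is also what lets an ordinary program print its own call stack: backtrace() and backtrace_symbols() from <execinfo.h> walk the stack using it. A minimal sketch (available in glibc and on macOS; on Linux, linking with -rdynamic makes the symbol names show up):

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

static void show_trace(void)
{
    void *frames[16];
    int n = backtrace(frames, 16);                /* collect return addresses     */
    char **names = backtrace_symbols(frames, n);  /* resolve them to symbol names */

    for (int i = 0; i < n; i++)
        printf("%s\n", names[i]);
    free(names);                                  /* one free for the whole array */
}

static void inner(void) { show_trace(); }
static void outer(void) { inner(); }

int main(void)
{
    outer();
    return 0;
}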
I hope this gives you an idea of what those labels and constants are for.
OK, let's give it a try.
// First section of code, declaring the main function, which has to be aligned on a 32-bit boundary.
UPDATE:
My explanation of the .align directive may be wrong. See gas documentation below.
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main:
Store the previous base pointer and allocate stack space for local variables.
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
subq $32, %rsp
Ltmp2:
Save main's incoming arguments to the stack, load the address of the string into %rdi and call puts().
movl %edi, %eax
movl %eax, -4(%rbp)
movq %rsi, -16(%rbp)
leaq L_.str(%rip), %rax
movq %rax, %rdi
callq _puts
Put return value on stack, free local memory, restore base pointer and return.
movl $0, -24(%rbp)
movl -24(%rbp), %eax
movl %eax, -20(%rbp)
movl -20(%rbp), %eax
addq $32, %rsp
popq %rbp
ret
Leh_func_end1:
Next section, still in the __TEXT segment, containing the string to print.
.section __TEXT,__cstring,cstring_literals
L_.str:
.asciz "Hello, World!"
The rest is unknown to me; it could be data used by the C startup code and/or debugging info.
.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support
...
UPDATE:
Documentation on the .align directive from:
http://sourceware.org/binutils/docs-2.23.1/as/Align.html#Align
"The way the required alignment is specified varies from system to system. For the arc, hppa, i386 using ELF, i860, iq2000, m68k, or32, s390, sparc, tic4x, tic80 and xtensa, the first expression is the alignment request in bytes. For example `.align 8' advances the location counter until it is a multiple of 8. If the location counter is already a multiple of 8, no change is needed. For the tic54x, the first expression is the alignment request in words.
For other systems, including ppc, i386 using a.out format, arm and strongarm, it is the number of low-order zero bits the location counter must have after advancement. For example `.align 3' advances the location counter until it is a multiple of 8. If the location counter is already a multiple of 8, no change is needed.
This inconsistency is due to the different behaviors of the various native assemblers for these systems which GAS must emulate. GAS also provides .balign and .p2align directives, described later, which have a consistent behavior across all architectures (but are specific to GAS)."
//jk
You can find the answers to pretty much any questions you've got related to the directives here and here.
For example:
.section __TEXT,__text,regular,pure_instructions
Declares a section named __TEXT,__text with the default section type and specifies that this section will contain only machine code (i.e. no data).
.globl _main
Makes the _main label (symbol) global, so that it will be visible to the linker.
.align 4, 0x90
Aligns the location counter to the next 2^4 (==16) byte boundary. The space in between will be filled with the value 0x90 (==NOP).
As for the code itself, it's obviously doing a lot of redundant intermediary loads and stores. Try compiling with optimizations enabled as one of the commentators suggested and you should find that the resulting code will make more sense.