How Does C Compiled to ASM Know Where to Branch to for External Functions?

How Does C Compiled to ASM Know Where to Branch to for External Functions? - c

How does C compiled to ARM ASM know where to branch to for external functions?
For example, here is a simple C program:
#include <stdio.h>
int main() {
printf("Hello World!");
return 0;
}
and its corresponding ARM ASM program:
.arch armv6
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 6
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "main.c"
.text
.section .rodata
.align 2
.LC0:
.ascii "Hello World!\000"
.text
.align 2
.global main
.arch armv6
.syntax unified
.arm
.fpu vfp
.type main, %function
main:
# args = 0, pretend = 0, frame = 0
# frame_needed = 1, uses_anonymous_args = 0
push {fp, lr}
add fp, sp, #4
ldr r0, .L3
bl printf
mov r3, #0
mov r0, r3
pop {fp, pc}
.L4:
.align 2
.L3:
.word .LC0
.size main, .-main
.ident "GCC: (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110"
.section .note.GNU-stack,"",%progbits
I dont see a "printf" tag anywhere so i am assuming that it links outside of the program. but how does it know where to search? it wouldnt look everywhere, because there might be duplicate tags, but there are also libraries that are placed (in the computers perspective) at random, though i also dont see anywhere where it defines a library location.
so where does it link, for more than just the standard C library?
and how can i compile it to not rely on those external dependencies?
or know where the libraries are so i know which files i can delete?
i am currently operating linux on a raspberry pi 400

My computer has an x86_64 processor, but the principle is the same. I'm using gcc 9.3.0.
I copied your code into a file called main.c and compiled it to assembly with gcc -S main.c. It produced the file main.s with the following contents:
.file "main.c"
.text
.section .rodata
.LC0:
.string "Hello World!"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rdi
movl $0, %eax
call printf#PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
.section .note.GNU-stack,"",#progbits
.section .note.gnu.property,"a"
.align 8
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 8
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 8
4:
There are a lot of assembler directives here that can make it confusing to read, so I assembled it into an object file (gcc -c main.s) and then ran objdump -d main.o to disassemble it. Here is the output of the disassembly:
main.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # f <main+0xf>
f: b8 00 00 00 00 mov $0x0,%eax
14: e8 00 00 00 00 callq 19 <main+0x19>
19: b8 00 00 00 00 mov $0x0,%eax
1e: 5d pop %rbp
1f: c3 retq
The first three instructions here are boilerplate, so we'll ignore them. The first interesting instruction is
lea 0x0(%rip),%rdi
This is meant to load the address of the "Hello World!" string into register %rdi. Confusingly, it appears to simply be copying %rip into %rdi.
The next instruction puts a 0 into register %eax. I actually don't know why this is, but it's not really relevant to this discussion.
Then comes the actual call to printf:
callq 19 <main+0x19>
Once again, this uses an address that doesn't seem correct. You may notice that address 0x19 actually points to the next instruction.
The next 3 instructions basically perform the final return 0.
To really answer your question we need to look at more than just assembly code. At this point I would recommend taking some time to research the format of ELF files. I would consider that topic to be beyond the scope of this answer, but it will help you understand what I'm about to show you.
I first want to point out that in both your assembly and mine, the "Hello World!" string is preceded by this directive:
.section .rodata
whereas the main function is preceded by
.text
which is shorthand for
.section .text
These directives instruct the assembler on how to arrange the code and data in the object file. You can see this by printing the section headers of the object file:
$ readelf -S main.o
There are 14 section headers, starting at offset 0x318:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000020 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000258
0000000000000030 0000000000000018 I 11 1 8
[ 3] .data PROGBITS 0000000000000000 00000060
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00000060
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .rodata PROGBITS 0000000000000000 00000060
000000000000000d 0000000000000000 A 0 0 1
[ 6] .comment PROGBITS 0000000000000000 0000006d
000000000000002b 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 00000098
0000000000000000 0000000000000000 0 0 1
[ 8] .note.gnu.propert NOTE 0000000000000000 00000098
0000000000000020 0000000000000000 A 0 0 8
[ 9] .eh_frame PROGBITS 0000000000000000 000000b8
0000000000000038 0000000000000000 A 0 0 8
[10] .rela.eh_frame RELA 0000000000000000 00000288
0000000000000018 0000000000000018 I 11 9 8
[11] .symtab SYMTAB 0000000000000000 000000f0
0000000000000138 0000000000000018 12 10 8
[12] .strtab STRTAB 0000000000000000 00000228
000000000000002a 0000000000000000 0 0 1
[13] .shstrtab STRTAB 0000000000000000 000002a0
0000000000000074 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
If you can figure out how to read this output, you will see that the .text section is 0x20 bytes in size (which matches the above disassembly output), and the .rodata section is 0xd (13) bytes in size (i.e. strlen("Hello World!") plus a null byte). The answer your question, however, is in the relocation data:
$ readelf -r main.o
Relocation section '.rela.text' at offset 0x258 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000b 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 4
000000000015 000c00000004 R_X86_64_PLT32 0000000000000000 printf - 4
Relocation section '.rela.eh_frame' at offset 0x288 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000000020 000200000002 R_X86_64_PC32 0000000000000000 .text + 0
This output is also very confusing to read if you don't know what it means. The first thing to understand is that the relocation sections tell the linker about places in the code that depend on symbols, either in other sections of the same file, or, more frequently, symbols that are defined in other files. The .rela.text section, for example, contains relocation information about the .text section. When this object file is linked into the final executable, the linker will overwrite part of the .text section with the missing addresses.
So, looking at the first entry under .rela.text, we see an offset of 0xb. Looking at the disassembly, we can see that offset 0xb references the fourth byte of the lea instruction's 7-byte encoding. The type, R_X86_64_PC32, tells us that that instruction is expecting a 32-bit PC-relative address, so we can expect the linker to overwrite the next 4 bytes (currently all 0). The rightmost column tells us, in human readable format, that this address needs to be populated with the address of the .rodata section minus 4 (with PC-relative addressing you have to remember that the PC will be pointing at the next instruction). It leaves out the fact, implicit for relocation type R_X86_64_PC32, that it will then subtract from that the final address of byte 0xb in the .text section, which will make that a valid PC-relative pointer to the "Hello World!" string data.
Similarly, the second entry tells the linker to copy the address of printf (minus 4) to offset 0x15 in the .text section, which would be the last 4 bytes of the callq instruction encoding. In this case, the type is R_X86_64_PLT32, which tells us that it's pointing to an entry in the procedure linkage table (PLT). A PLT is used for dynamic linking so that shared object libraries (in this case libc.so) can be loaded into physical memory once and shared by many running executables.
As a note, that might answer some of your specific questions, your compiler automatically links all the runtime libraries needed to execute a program. This includes any standard library functions, which would be part of libc.so. The only way to run without "external dependencies" would be to run on a bare-metal system (i.e. one without an operating system). Any operating system you use will have to do some amount of work to get your program to the start of main().

Related

Will an executable access shared-libraries' global variable via GOT?

I was learning dynamic linking recently and gave it a try:
dynamic.c
int global_variable = 10;
int XOR(int a) {
return global_variable;
}
test.c
#include <stdio.h>
extern int global_variable;
extern int XOR(int);
int main() {
global_variable = 3;
printf("%d\n", XOR(0x10));
}
The compiling commands are:
clang -shared -fPIC -o dynamic.so dynamic.c
clang -o test test.c dynamic.so
I was expecting that in executable test the main function will access global_variable via GOT. However, on the contrary, the global_variable is placed in test's data section and XOR in dynamic.so access the global_variable indirectly.
Could anyone tell me why the compiler didn't ask the test to access global_variable via GOT, but asked the shared object file to do so?

Part of the point of a shared library is that one copy gets loaded into memory, and multiple processes can access that one copy. But every program has its own copy of each of the library's variables. If they were accessed relative to the library's GOT then those would instead be shared among the processes using the library, just like the functions are.
There are other possibilities, but it is clean and consistent for each executable to provide for itself all the variables it needs. That requires the library functions to access all of its variables with static storage duration (not just external ones) indirectly, relative to the program. This is ordinary dynamic linking, just going the opposite direction from what you usually think of.

Turns out my clang produced PIC by default so it messed with results.
I will leave updated answer here, and the original can be read below it.
After digging a bit more into the topic i have noticed that compilation of test.c does not generate a .got section by itself. You can check it by compiling the executable into an object file and omitting the linking step for now (-c option):
clang -c -o test.o test.c
If you inspect the sections of resulting object file with readelf -S you will notice that there is no .got in there:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000035 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000210
0000000000000060 0000000000000018 I 11 1 8
[ 3] .data PROGBITS 0000000000000000 00000075
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00000075
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .rodata PROGBITS 0000000000000000 00000075
0000000000000004 0000000000000000 A 0 0 1
[ 6] .comment PROGBITS 0000000000000000 00000079
0000000000000013 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 0000008c
0000000000000000 0000000000000000 0 0 1
[ 8] .note.gnu.pr[...] NOTE 0000000000000000 00000090
0000000000000030 0000000000000000 A 0 0 8
[ 9] .eh_frame PROGBITS 0000000000000000 000000c0
0000000000000038 0000000000000000 A 0 0 8
[10] .rela.eh_frame RELA 0000000000000000 00000270
0000000000000018 0000000000000018 I 11 9 8
[11] .symtab SYMTAB 0000000000000000 000000f8
00000000000000d8 0000000000000018 12 4 8
[12] .strtab STRTAB 0000000000000000 000001d0
000000000000003e 0000000000000000 0 0 1
[13] .shstrtab STRTAB 0000000000000000 00000288
0000000000000074 0000000000000000 0 0 1
This means that the entirety of .got section present in the test executable actually comes from dynamic.so, as it is PIC and uses GOT.
Would it be possible to compile dynamic.so as non-PIC as well? Turns out it apparently used to be 10 years ago (the article compiles examples to 32-bits, they dont have to work on 64 bits!). Linked article describes how a non-PIC shared library was relocated at load time - basically, every time an address that needed to be relocated after loading was present in machine code, it was instead set to zeroes and a relocation of a certain type was set in the library. During loading of the library the loader filled the zeros with actual runtime address of data/code that was needed. It is important to note that it cannot be applied in your though as 64-bit shared libraries cannot be made out of non-PIC (Source).
If you compile dynamic.so as a shared 32-bit library instead and do not use the -fPIC option (you usually need special repositories enabled to compile 32-bit code and have 32-bit libc installed):
gcc -m32 dynamic.c -shared -o dynamic.so
You will notice that:
// readelf -s dynamic.so
(... lots of output)
27: 00004010 4 OBJECT GLOBAL DEFAULT 19 global_variable
// readelf -S dynamic.so
(... lots of output)
[17] .got PROGBITS 00003ff0 002ff0 000010 04 WA 0 0 4
[18] .got.plt PROGBITS 00004000 003000 00000c 04 WA 0 0 4
[19] .data PROGBITS 0000400c 00300c 000008 00 WA 0 0 4
[20] .bss NOBITS 00004014 003014 000004 00 WA 0 0 1
global_variable is at offset 0x4010 which is inside .data section. Also, while .got is present (at offset 0x3ff0), it only contains relocations coming from other sources than your code:
// readelf -r
Offset Info Type Sym.Value Sym. Name
00003f28 00000008 R_386_RELATIVE
00003f2c 00000008 R_386_RELATIVE
0000400c 00000008 R_386_RELATIVE
00003ff0 00000106 R_386_GLOB_DAT 00000000 _ITM_deregisterTM[...]
00003ff4 00000206 R_386_GLOB_DAT 00000000 __cxa_finalize#GLIBC_2.1.3
00003ff8 00000306 R_386_GLOB_DAT 00000000 __gmon_start__
00003ffc 00000406 R_386_GLOB_DAT 00000000 _ITM_registerTMCl[...]
This article introduces GOT as part of introduction on PIC, and i have found that to be the case in plenty of places, which would suggest that indeed GOT is only used by PIC code although i am not 100% sure of it and i recommend researching the topic more.
What does this mean for you? A section in the first article i linked called "Extra credit #2" contains an explanation for a similar scenario. Although it is 10 years old, uses 32-bit code and the shared library is non-PIC it shares some similarities with your situation and might explain the problem you presented in your question.
Also keep in mind that (although similar) -fPIE and -fPIC are two separate options with slightly different effects and that if your executable during inspection is not loaded at 0x400000 then it probably is compiled as PIE without your knowledge which might also have impact on results. In the end it all boils down to what data is to be shared between processes, what data/code can be loaded at arbitrary address, what has to be loaded at fixed address etc. Hope this helps.
Also two other answers on Stack Overflow which seem relevant to me: here and here. Both the answers and comments.
Original answer:
I tried reproducing your problem with exactly the same code and compilation commands as the ones you provided, but it seems like both main and XOR use the GOT to access the global_variable. I will answer by providing example output of commands that i used to inspect the data flow. If your outputs differ from mine, it means there is some other difference between our environments (i mean a big difference, if only addresses/values are different then its ok). Best way to find that difference is for you to provide commands you originally used as well as their output.
First step is to check what address is accessed whenever a write or read to global_variable happens. For that we can use objdump -D -j .text test command to disassemble the code and look at the main function:
0000000000001150 <main>:
1150: 55 push %rbp
1151: 48 89 e5 mov %rsp,%rbp
1154: 48 8b 05 8d 2e 00 00 mov 0x2e8d(%rip),%rax # 3fe8 <global_variable>
115b: c7 00 03 00 00 00 movl $0x3,(%rax)
1161: bf 10 00 00 00 mov $0x10,%edi
1166: e8 d5 fe ff ff call 1040 <XOR#plt>
116b: 89 c6 mov %eax,%esi
116d: 48 8d 3d 90 0e 00 00 lea 0xe90(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
1174: b0 00 mov $0x0,%al
1176: e8 b5 fe ff ff call 1030 <printf#plt>
117b: 31 c0 xor %eax,%eax
117d: 5d pop %rbp
117e: c3 ret
117f: 90 nop
Numbers in the first column are not absolute addresses - instead they are offsets relative to the base address at which the executable will be loaded. For the sake of explanation i will refer to them as "offsets".
The assembly at offset 0x115b and 0x1161 comes directly from the line global_variable = 3; in your code. To confirm that, you could compile the program with -g for debug symbols and invoke objdump with -S. This will display source code above corresponding assembly.
We will focus on what these two instructions are doing. First instruction is a mov of 8 bytes from a location in memory to the rax register. The location in memory is given as relative to the current rip value, offset by a constant 0x2e8d. Objdump already calculated the value for us, and it is equal to 0x3fe8. So this will take 8 bytes present in memory at the 0x3fe8 offset and store them in the rax register.
Next instruction is again a mov, the suffix l tells us that data size is 4 bytes this time. It stores a 4 byte integer with value equal to 0x3 in the location pointed to by the current value of rax (not in the rax itself! brackets around a register such as those in (%rax) signify that the location in the instruction is not the register itself, but rather where its contents are pointing to!).
To summarize, we read a pointer to a 4 byte variable from a certain location at offset 0x3fe8 and later store an immediate value of 0x3 at the location specified by said pointer. Now the question is: where does that offset of 0x3fe8 come from?
It actually comes from GOT. To show the contents of the .got section we can use the objdump -s -j .got test command. -s means we want to focus on actual raw contents of the section, without any disassembling. The output in my case is:
test: file format elf64-x86-64
Contents of section .got:
3fd0 00000000 00000000 00000000 00000000 ................
3fe0 00000000 00000000 00000000 00000000 ................
3ff0 00000000 00000000 00000000 00000000 ................
The whole section is obviously set to zero, as GOT is populated with data after loading the program into memory, but what is important is the address range. We can see that .got starts at 0x3fd0 offset and ends at 0x3ff0. This means it also includes the 0x3fe8 offset - which means the location of global_variable is indeed stored in GOT.
Another way of finding this information is to use readelf -S test to show sections of the executable file and scroll down to the .got section:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
(...lots of sections...)
[22] .got PROGBITS 0000000000003fd0 00002fd0
0000000000000030 0000000000000008 WA 0 0 8
Looking at the Address and Size columns, we can see that the section is loaded at offset 0x3fd0 in memory and its size is 0x30 - which corresponds to what objdump displayed. Note that in readelf ouput "Offset" is actually the offset into the file form which the program is loaded - not the offset in memory that we are interested in.
by issuing the same commands on the dynamic.so library we get similar results:
00000000000010f0 <XOR>:
10f0: 55 push %rbp
10f1: 48 89 e5 mov %rsp,%rbp
10f4: 89 7d fc mov %edi,-0x4(%rbp)
10f7: 48 8b 05 ea 2e 00 00 mov 0x2eea(%rip),%rax # 3fe8 <global_variable##Base-0x38>
10fe: 8b 00 mov (%rax),%eax
1100: 5d pop %rbp
1101: c3 ret
So we see that both main and XOR use GOT to find the location of global_variable.
As for the location of global_variable we need to run the program to populate GOT. For that we can use GDB. We can run our program in GDB by invoking it this way:
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:." gdb ./test
LD_LIBRARY_PATH environment variable tells linker where to look for shared objects, so we extend it to include the current directory "." so that it may find dynamic.so.
After the GDB loads our code, we may invoke break main to set up a breakpoint at main and run to run the program. The program execution should pause at the beginning of the main function, giving us a view into our executable after it was fully loaded into memory, with GOT populated.
Running disassemble main in this state will show us the actual absolute offsets into memory:
Dump of assembler code for function main:
0x0000555555555150 <+0>: push %rbp
0x0000555555555151 <+1>: mov %rsp,%rbp
=> 0x0000555555555154 <+4>: mov 0x2e8d(%rip),%rax # 0x555555557fe8
0x000055555555515b <+11>: movl $0x3,(%rax)
0x0000555555555161 <+17>: mov $0x10,%edi
0x0000555555555166 <+22>: call 0x555555555040 <XOR#plt>
0x000055555555516b <+27>: mov %eax,%esi
0x000055555555516d <+29>: lea 0xe90(%rip),%rdi # 0x555555556004
0x0000555555555174 <+36>: mov $0x0,%al
0x0000555555555176 <+38>: call 0x555555555030 <printf#plt>
0x000055555555517b <+43>: xor %eax,%eax
0x000055555555517d <+45>: pop %rbp
0x000055555555517e <+46>: ret
End of assembler dump.
(gdb)
Our 0x3fe8 offset has turned into an absolute address of equal to 0x555555557fe8. We may again check that this location comes from the .got section by issuing maintenance info sections inside GDB, which will list a long list of sections and their memory mappings. For me .got is placed in this address range:
[21] 0x555555557fd0->0x555555558000 at 0x00002fd0: .got ALLOC LOAD DATA HAS_CONTENTS
Which contains 0x555555557fe8.
To finally inspect the address of global_variable itself we may examine the contents of that memory by issuing x/xag 0x555555557fe8. Arguments xag of the x command deal with the size, format and type of data being inspected - for explanation invoke help x in GDB. On my machine the command returns:
0x555555557fe8: 0x7ffff7fc4020 <global_variable>
On your machine it may only display the address and the data, without the "<global_variable>" helper, which probably comes from an extension i have installed called pwndbg. It is ok, because the value at that address is all we need. We now know that the global_variable is located in memory under the address 0x7ffff7fc4020. Now we may issue info proc mappings in GDB to find out what address range does this address belong to. My output is pretty long, but among all the ranges listed there is one of interest to us:
0x7ffff7fc4000 0x7ffff7fc5000 0x1000 0x3000 /home/user/test_got/dynamic.so
The address is inside of that memory area, and GDB tells us that it comes from the dynamic.so library.
In case any of the outputs of said commands are different for you (change in a value is ok - i mean a fundamental difference like addresses not belonging to certain address ranges etc.) please provide more information about what exactly did you do to come to the conclusion that global_variable is stored in the .data section - what commands did you invoke and what outputs they produced.

Keep rodata located with the function that created it

I'm trying to make .rodata section location stay with its associated function memory location. I'm using the GNU compiler/linker, bare metal, plain-jane c, with an STM32L4A6 micro-controller.
I have a custom board using an STM32L4A6 controller with 1Meg of Flash divided into 512 - 2K pages. Each page can be individually erased and programmed from a function running in RAM. I'd like to take advantage of this fine-grained flash organization to create an embedded firmware application that can be updated on-the-fly by modifying or adding individual functions in the code. My scheme is to dedicate a separate page of flash for each function that might ever need to be changed or created. It's extremely wasteful of flash but I'll never use more than ~10% of it so I can afford to to be wasteful. I've made some progress on this and can now make significant changes to the operation of my application by uploading very small bits of binary code. These "Patches" often do not even require a system reboot.
The problem I'm having is that when a function contains any sort of constant data, such as a literal string, it winds up in the .rodata section. I need the rodata for a given function to stay in the same area as the function that created it. Does anyone know how I might be able to force the .rodata that is created in a function to stay attached to that same function in flash? Like maybe the .rodata from that function could be positioned immediately following the function itself? Maybe I need to use -ffunction-sections or something like that? I've been through the various linker manuals but still can't figure how to do this. Below is the start of my linker script. I don't know how to include function .rodata in the individual page sections.
Example function:
#define P018 __attribute__((long_call, section(".txt018")))
P018 int Function18(int A, int B){int C = A*B; return C;}
A better example that shows my problem would be the following:
#define P152 __attribute__((long_call, section(".txt152")))
P152 void TestFunc(int A){printf("%d Squared Is: %d\r\n",A,A*A);}
In this case, the binary equivalent of "%d Squared Is: %d\r\n" can be found in .rodata with all of the other literal strings in my program. I would prefer it to be located in section .txt152 .
Linker Script snippet (Mostly generated from a simple console program.)
MEMORY
{
p000 (rx) : ORIGIN = 0x08000000, LENGTH = 0x8000
p016 (rx) : ORIGIN = 0x08008000, LENGTH = 0x800
p017 (rx) : ORIGIN = 0x08008800, LENGTH = 0x800
p018 (rx) : ORIGIN = 0x08009000, LENGTH = 0x800
.
.
.
p509 (rx) : ORIGIN = 0x080fe800, LENGTH = 0x800
p510 (rx) : ORIGIN = 0x080ff000, LENGTH = 0x800
p511 (rx) : ORIGIN = 0x080ff800, LENGTH = 0x800
ram (rwx) : ORIGIN = 0x20000000, LENGTH = 256K
ram2 (rw) : ORIGIN = 0x10000000, LENGTH = 64K
}
SECTIONS
{
.vectors :
{
KEEP(*(.isr_vector .isr_vector.*))
} > p000
.txt016 : { *(.txt016) } > p016 /* first usable 2k page following 32k p000 */
.txt017 : { *(.txt017) } > p017
.txt018 : { *(.txt018) } > p018
.
.
.
.txt509 : { *(.txt509) } > p509
.txt510 : { *(.txt510) } > p510
.txt511 : { *(.txt511) } > p511
.text :
{
*(.text .text.* .gnu.linkonce.t.*)
*(.glue_7t) *(.glue_7)
*(.rodata .rodata* .gnu.linkonce.r.*)
} > p000
.
.
.
In case anyone's interested, here's my RAM code for doing the erase/program operation
__attribute__((long_call, section(".data")))
void CopyPatch
(
unsigned short Page,
unsigned int NumberOfBytesToFlash,
unsigned char *PatchBuf
)
{
unsigned int i;
unsigned long long int *Flash;
__ASM volatile ("cpsid i" : : : "memory"); //disable interrupts
Flash = (unsigned long long int *)(FLASH_BASE + Page*2048); //set flash memory pointer to Page address
GPIOE->BSRR = GPIO_BSRR_BS_1; //make PE1(LED) high
FLASH->KEYR = 0x45670123; //unlock the flash
FLASH->KEYR = 0xCDEF89AB; //unlock the flash
while(FLASH->SR & FLASH_SR_BSY){} //wait while flash memory operation is in progress
FLASH->CR = FLASH_CR_PER | (Page << 3); //set Page erase bit and the Page to erase
FLASH->CR |= FLASH_CR_STRT; //start erase of Page
while(FLASH->SR & FLASH_SR_BSY){} //wait while Flash memory operation is in progress
FLASH->CR = FLASH_CR_PG; //set flash programming bit
for(i=0;i<(NumberOfBytesToFlash/8+1);i++)
{
Flash[i] = ((unsigned long long int *)PatchBuf)[i]; //copy RAM to FLASH, 8 bytes at a time
while(FLASH->SR & FLASH_SR_BSY){} //wait while flash memory operation is in progress
}
FLASH->CR = FLASH_CR_LOCK; //lock the flash
GPIOE->BSRR = GPIO_BSRR_BR_1; //make PE1(LED) low
__ASM volatile ("cpsie i" : : : "memory"); //enable interrupts
}

Okay ... Sorry for the delay, but I had to think about this a bit ...
I'm not sure you can do this [completely] with a linker script alone. It might be possible, but I think there's an easier/surer way [with a bit of extra prep]
A method I've used before is to compile with -S to get a .s file. Change/mangle that. And, then, compile the modified .s
Note that you may have some trouble with a global like:
int B;
This will go to a .comm section in the asm source. This may not be ideal.
For initialized data:
int B = 23;
You may want to add a section attribute to force it to a special section. Otherwise, it will end up in a .data section
So, I might avoid .comm and/or .bss sections in favor of always using initialized data. That's because .comm has the same issue as .rodata (i.e. it ends up as one big blob).
Anyway, below is a step by step process.
I put the section name macros in a common file (e.g.) sctname.h:
#define _SCTJOIN(_pre,_sct) _pre #_sct
#define _TXTSCT(_sct) __attribute__((section(_SCTJOIN(".txt",_sct))))
#define _DATSCT(_sct) __attribute__((section(_SCTJOIN(".dat",_sct))))
#ifdef SCTNO
#define TXTSCT _TXTSCT(SCTNO)
#define DATSCT _DATSCT(SCTNO)
#endif
Here's a slightly modified version of your .c file (e.g. module.c):
#include <stdio.h>
#ifndef SCTNO
#define SCTNO 152
#endif
#include "sctname.h"
int B DATSCT = 23;
TXTSCT void
TestFunc(int A)
{
printf("%d Squared Is: %d\r\n", A, A * A * B);
}
To create the .s file, we do:
cc -S -Wall -Werror -O2 module.c
The actual section name/number can be specified on the command line:
cc -S -Wall -Werror -O2 -DSCTNO=152 module.c
This gives us a module.s:
.file "module.c"
.text
.section .rodata.str1.1,"aMS",#progbits,1
.LC0:
.string "%d Squared Is: %d\r\n"
.section .txt152,"ax",#progbits
.p2align 4,,15
.globl TestFunc
.type TestFunc, #function
TestFunc:
.LFB11:
.cfi_startproc
movl %edi, %edx
movl %edi, %esi
xorl %eax, %eax
imull %edi, %edx
movl $.LC0, %edi
imull B(%rip), %edx
jmp printf
.cfi_endproc
.LFE11:
.size TestFunc, .-TestFunc
.globl B
.section .dat152,"aw"
.align 4
.type B, #object
.size B, 4
B:
.long 23
.ident "GCC: (GNU) 8.3.1 20190223 (Red Hat 8.3.1-2)"
.section .note.GNU-stack,"",#progbits
Now, we have to read in the .s and modify it. I've created a perl script that does this (e.g. rofix):
#!/usr/bin/perl
master(#ARGV);
exit(0);
sub master
{
my(#argv) = #_;
$root = shift(#argv);
$root =~ s/[.][^.]+$//;
$sfile = "$root.s";
$ofile = "$root.TMP";
open($xfsrc,"<$sfile") or
die("rofix: unable to open '$sfile' -- $!\n");
open($xfdst,">$ofile") or
die("rofix: unable to open '$sfile' -- $!\n");
$txtpre = "^[.]txt";
$datpre = "^[.]dat";
# find the text and data sections
seek($xfsrc,0,0);
while ($bf = <$xfsrc>) {
chomp($bf);
if ($bf =~ /^\s*[.]section\s(\S+)/) {
$sctcur = $1;
sctget($txtpre);
sctget($datpre);
}
}
# modify the data sections
seek($xfsrc,0,0);
while ($bf = <$xfsrc>) {
chomp($bf);
if ($bf =~ /^\s*[.]section\s(\S+)/) {
$sctcur = $1;
sctfix();
print($xfdst $bf,"\n");
next;
}
print($xfdst $bf,"\n");
}
close($xfsrc);
close($xfdst);
system("diff -u $sfile $ofile");
rename($ofile,$sfile) or
die("rofix: unable to rename '$ofile' to '$sfile' -- $!\n");
}
sub sctget
{
my($pre) = #_;
my($sctname,#sct);
{
last unless (defined($pre));
#sct = split(",",$sctcur);
$sctname = shift(#sct);
last unless ($sctname =~ /$pre/);
printf("sctget: FOUND %s\n",$sctname);
$sct_lookup{$pre} = $sctname;
}
}
sub sctfix
{
my($sctname,#sct);
my($sctnew);
{
last unless ($sctcur =~ /^[.]rodata/);
$sctnew = $sct_lookup{$txtpre};
last unless (defined($sctnew));
#sct = split(",",$sctcur);
$sctname = shift(#sct);
$sctname .= $sctnew;
unshift(#sct,$sctname);
$sctname = join(",",#sct);
$bf = sprintf("\t.section\t%s",$sctname);
}
}
The difference between the old and new module.s is:
sctget: FOUND .txt152
sctget: FOUND .dat152
--- module.s 2020-04-20 19:02:23.777302484 -0400
+++ module.TMP 2020-04-20 19:06:33.631926065 -0400
## -1,6 +1,6 ##
.file "module.c"
.text
- .section .rodata.str1.1,"aMS",#progbits,1
+ .section .rodata.txt152,"aMS",#progbits,1
.LC0:
.string "%d Squared Is: %d\r\n"
.section .txt152,"ax",#progbits
So, now, create the .o with:
cc -c module.s
For a makefile, it might be something like [with some wildcards]:
module.o: module.c
cc -S -Wall -Werror -O2 module.c
./rofix module.s
cc -c module.s
Now, you can add appropriate placements in your linker script for [your original section] .txt152 and the new .rodata.txt152.
And, the initialized data section .dat152
Note that the actual naming conventions are arbitrary. If you want to change them, just modify rofix [and the linker script] appropriately
Here's the readelf -a output for module.o:
Note that there's also a .rela.txt152 section!?!?
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 808 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 15
Section header string table index: 14
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000000 0000000000000000 AX 0 0 1
[ 2] .data PROGBITS 0000000000000000 00000040
0000000000000000 0000000000000000 WA 0 0 1
[ 3] .bss NOBITS 0000000000000000 00000040
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .rodata.txt152 PROGBITS 0000000000000000 00000040
0000000000000014 0000000000000001 AMS 0 0 1
[ 5] .txt152 PROGBITS 0000000000000000 00000060
000000000000001a 0000000000000000 AX 0 0 16
[ 6] .rela.txt152 RELA 0000000000000000 00000250
0000000000000048 0000000000000018 I 12 5 8
[ 7] .dat152 PROGBITS 0000000000000000 0000007c
0000000000000004 0000000000000000 WA 0 0 4
[ 8] .comment PROGBITS 0000000000000000 00000080
000000000000002d 0000000000000001 MS 0 0 1
[ 9] .note.GNU-stack PROGBITS 0000000000000000 000000ad
0000000000000000 0000000000000000 0 0 1
[10] .eh_frame PROGBITS 0000000000000000 000000b0
0000000000000030 0000000000000000 A 0 0 8
[11] .rela.eh_frame RELA 0000000000000000 00000298
0000000000000018 0000000000000018 I 12 10 8
[12] .symtab SYMTAB 0000000000000000 000000e0
0000000000000150 0000000000000018 13 11 8
[13] .strtab STRTAB 0000000000000000 00000230
000000000000001c 0000000000000000 0 0 1
[14] .shstrtab STRTAB 0000000000000000 000002b0
0000000000000078 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
There is no dynamic section in this file.
Relocation section '.rela.txt152' at offset 0x250 contains 3 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000a 00050000000a R_X86_64_32 0000000000000000 .rodata.txt152 + 0
000000000011 000c00000002 R_X86_64_PC32 0000000000000000 B - 4
000000000016 000d00000004 R_X86_64_PLT32 0000000000000000 printf - 4
Relocation section '.rela.eh_frame' at offset 0x298 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000000020 000600000002 R_X86_64_PC32 0000000000000000 .txt152 + 0
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
Symbol table '.symtab' contains 14 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS module.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
4: 0000000000000000 0 SECTION LOCAL DEFAULT 3
5: 0000000000000000 0 SECTION LOCAL DEFAULT 4
6: 0000000000000000 0 SECTION LOCAL DEFAULT 5
7: 0000000000000000 0 SECTION LOCAL DEFAULT 7
8: 0000000000000000 0 SECTION LOCAL DEFAULT 9
9: 0000000000000000 0 SECTION LOCAL DEFAULT 10
10: 0000000000000000 0 SECTION LOCAL DEFAULT 8
11: 0000000000000000 26 FUNC GLOBAL DEFAULT 5 TestFunc
12: 0000000000000000 4 OBJECT GLOBAL DEFAULT 7 B
13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf
No version information found in this file.

`.comm` directive in GAS not showing in `.o` file?

This is a purely pedagogical question.
I have the following C code, in a file called comm.c:
#include <stdio.h>
int a;
int main(){
printf("%d", a);
return 0;
}
The code prints "0\n".
Compiling with gcc -S, I get the following assembly code:
.file "comm.c"
.text
.comm a,4,4
.section .rodata
.LC0:
.string "%d"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl a(%rip), %eax
movl %eax, %esi
leaq .LC0(%rip), %rdi
movl $0, %eax
call printf#PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 9.1.0"
.section .note.GNU-stack,"",#progbits
I am confused as to what .comm a,4,4 is doing. According to 7.96 of
the GNU as manual, the .text directive, it assembles what follows into
the end of the .text section. Thus, I would think that the beginning
of the .text section contains four bytes allocated to storing the
contents of a. This appears to not be the case, because if we
disassemble the .o file, we find:
comm.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <main+0xa>
a: 89 c6 mov %eax,%esi
c: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 13 <main+0x13>
13: b8 00 00 00 00 mov $0x0,%eax
18: e8 00 00 00 00 callq 1d <main+0x1d>
1d: b8 00 00 00 00 mov $0x0,%eax
22: 5d pop %rbp
23: c3 retq
Why aren't there four extra bytes at the beginning of =text=, as is
promised by the .text GAS directive? Of course, that would be stupid, to put data in the text segment.
So I guess my question is: what is .comm doing? Why is it placed under a .text directive?

.comm does not allocate in the .text section, but in the .bss section.
From https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_7.html#SEC76 and https://sourceware.org/binutils/docs/as/Comm.html#Comm:
If ld does not see a definition for the symbol--just one or more common symbols--then it will allocate length bytes of uninitialized memory.
It is the linker's job to allocate and map the .comm symbols.
You can see this when you link the program and read the symbols table:
gcc comm.o -o comm
readelf comm -s
The relevant symbols:
Num: Value Size Type Bind Vis Ndx Name
24: 0000000000004030 0 SECTION LOCAL DEFAULT 24
31: 0000000000004030 1 OBJECT LOCAL DEFAULT 24 completed.7392
57: 0000000000004038 0 NOTYPE GLOBAL DEFAULT 24 _end
59: 0000000000004034 4 OBJECT GLOBAL DEFAULT 24 a
60: 0000000000004030 0 NOTYPE GLOBAL DEFAULT 24 __bss_start
__bss_start(0000000000004030) is the start of the .bss section, and _end(0000000000004038) is the end of the executable(and in this case also the end of the .bss section).
As the 4 bytes of a are at addresses 0000000000004034-0000000000004037, a is obviously in the .bss section.
And it does show in the .o, just not where you were looking for.
You can read the symbols in the .o file the same way and something like this will show up:
$ readelf comm.o -s
Symbol table '.symtab' contains 13 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS comm.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 8
8: 0000000000000000 0 SECTION LOCAL DEFAULT 6
9: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM a
10: 0000000000000000 36 FUNC GLOBAL DEFAULT 1 main
11: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
12: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf

Builtins in Clang not so builtin?

If I have the following in strlen.c:
int call_strlen(char *s) {
return __builtin_strlen(s);
}
And then compile it with both gcc and clang like this:
gcc -c -o strlen-gcc.o strlen.c
clang -c -o strlen-clang.o strlen.c
I am surprised to see that strlen-clang.o contains a reference to "strlen", whereas gcc has expectedly inlined the function and has no such reference. (see objdumps below). Is this a bug in clang? I have tested it in several versions of the clang compiler, including 3.8.
Edit: the reason this is important for me is that I'm linking with -nostdlib, and the clang-compiled version gives me a link error that strlen is not found.
Clang
#> objdump -d strlen-clang.o
strlen-clang.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <call_strlen>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
c: 48 8b 7d f8 mov -0x8(%rbp),%rdi
10: e8 00 00 00 00 callq 15 <call_strlen+0x15>
15: 89 c1 mov %eax,%ecx
17: 89 c8 mov %ecx,%eax
19: 48 83 c4 10 add $0x10,%rsp
1d: 5d pop %rbp
1e: c3 retq
#> objdump -t strlen-clang.o
strlen-clang.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 strlen.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 g F .text 000000000000001f call_strlen
0000000000000000 *UND* 0000000000000000 strlen
GCC
#> objdump -d strlen-gcc.o
strlen-gcc.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <call_strlen>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 89 7d f8 mov %rdi,-0x8(%rbp)
8: 48 8b 45 f8 mov -0x8(%rbp),%rax
c: 48 c7 c1 ff ff ff ff mov $0xffffffffffffffff,%rcx
13: 48 89 c2 mov %rax,%rdx
16: b8 00 00 00 00 mov $0x0,%eax
1b: 48 89 d7 mov %rdx,%rdi
1e: f2 ae repnz scas %es:(%rdi),%al
20: 48 89 c8 mov %rcx,%rax
23: 48 f7 d0 not %rax
26: 48 83 e8 01 sub $0x1,%rax
2a: 5d pop %rbp
2b: c3 retq
#> objdump -t strlen-gcc.o
strlen-gcc.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 strlen.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 g F .text 000000000000002c call_strlen

Just to get optimisation out of the way:
With clang -O0:
t.o:
(__TEXT,__text) section
_call_strlen:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp, %rbp
0000000000000004 subq $0x10, %rsp
0000000000000008 movq %rdi, -0x8(%rbp)
000000000000000c movq -0x8(%rbp), %rdi
0000000000000010 callq _strlen
0000000000000015 movl %eax, %ecx
0000000000000017 movl %ecx, %eax
0000000000000019 addq $0x10, %rsp
000000000000001d popq %rbp
000000000000001e retq
With clang -O3
t.o:
(__TEXT,__text) section
_call_strlen:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp, %rbp
0000000000000004 popq %rbp
0000000000000005 jmp _strlen
Now, onto the problem:
The clang documentation claims that clang support all GCC-supported builtins.
However, the GCC documentation seems to treat builtin functions and the names of their library equivalents as synonyms:
Both forms have the same type (including prototype), the same address (when their address is taken), and the same meaning as the C library functions [...].
Also it does not guarantee a builtin function with a library equivalent (as is the case with strlen) to indeed get optimised:
Many of these functions are only optimized in certain cases; if they are not optimized in a particular case, a call to the library function is emitted.
Further, the clang internals manual mentions __builtin_strlen only once:
__builtin_strlen and strlen: These are constant folded as integer constant expressions if the argument is a string literal.
Other than that they seem to make no promises.
Since in your case the argument to __builtin_strlen is not a string literal, and since the GCC documentation allows for calls to builtin functions to be converted to library function calls, clang's behaviour seems perfectly valid.
A "patch for review" on the clang developers mailing list also says:
[...] It will still fall back to runtime use of library strlen, if
compile-time evaluation is not possible/required [...].
That was in 2012, but the text indicates that at least back then, only compile-time evaluation was supported.
Now, I see two options:
If you only need to compile the program yourself and then use and/or distribute it, I suggest you simply use gcc.
If you need others to be able to compile your code under both gcc and clang, I suggest adding a C library as a dependency for static linking.
I strongly advise against rolling your own implementations of standard library functions, even for seemingly simple cases (if you disagree, try writing up your own strlen implementation, then compare it to the glibc one).

Neither GCC nor Clang promises to inline this builtin. You quoted some GCC documentation seeming to make such a promise:
...GCC built-in functions are always expanded inline...
but this is a sentence fragment pulled out of context. The complete sentence is
With the exception of built-ins that have library equivalents such as the standard C library functions discussed below, or that expand to library calls, GCC built-in functions are always expanded inline and thus do not have corresponding entry points and their address cannot be obtained.
__builtin_strlen has the library equivalent strlen, so this sentence makes no promises about whether it gets inlined.

gcc: segmentation fault when compiling with nostdlib

I'm experimenting with entry points and got a segfault.
prog.c:
int main() {
return 0;
}
compiling and linking with:
gcc -Wall prog.c -nostdlib -c -o prog.o
ld prog.o -e main -o prog.out
objdump:
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000000b 00000000004000b0 00000000004000b0 000000b0 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .eh_frame 00000038 00000000004000c0 00000000004000c0 000000c0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .comment 0000001c 0000000000000000 0000000000000000 000000f8 2**0
CONTENTS, READONLY
Disassembly of section .text:
00000000004000b0 <main>:
4000b0: 55 push %rbp
4000b1: 48 89 e5 mov %rsp,%rbp
4000b4: b8 00 00 00 00 mov $0x0,%eax
4000b9: 5d pop %rbp
4000ba: c3 retq
Valgrind output:
Access not within mapped region at address 0x0

retq takes the return address from the top of the stack and executes from there... problem is that according to the way Linux executes binaries, the number of arguments is on the stack and execution is transferred to address 0x1 (if no arguments are given)
set some dummy arguments with gdb (set args x y z)
you can compile and link with debug info (-g) and then use gdb
set a breakpoint on the retq instruction (br *0x4000ba) and run the program
execute the final instruction and observe the SIGSEGV address correspond to the number of arguments + 1
the program should exit with a system call, not retq
see
http://eli.thegreenplace.net/2012/08/13/how-statically-linked-programs-run-on-linux/
for some useful background info

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight