Why Executable and Linkable Format(ELF) File contains set of sections? - c

These-days I'm referring File Handling System Calls in Linux.
Furthermore I understood ELF which is Executable and Linkable Format , contains set of sections.
Those are .bss , .data , .rodata , .text , .comment , and unknown
I referred Wikipedia and this Website to study
So I have below questions
why ELF file uses set of sections?
what is the task of each above section ?
what is the feasibility of this using set of sections ?

A good reference for the ELF file format is the Object Files chapter of the System V ABI. In particular, special sections describes the uses of most of the sections you're likely to encounter.
why ELF file uses set of sections?
An object file contains lots of different classes of data, and it makes sense to group similar data into sections, especially since some sections' contents can be read directly into a process's image when the OS execs the ELF file.
.bss contains uninitialized data, such as int a; declared at global level in a C program. Actually, it contains nothing except the size that needs to be allocated when the ELF file is loaded into a process, because all variables in bss are initialized to 0.
.data contains initialized data, such as int a = 1000; declared at global level in a C program.
.rodata contains read-only data, such as character string literals and global level variables declared as const in C. When the OS execs the ELF file, it will load this section into an area of memory that is read-only.
.text contains executable instructions. When the OS execs the ELF file, it will load this section into an area of memory that is read-only. Sometimes .text and .rodata wind up being loaded into the same area of a process's memory.
.comment typically contains the name and version of the compiler(s) used to generate the file.
Not all of the sections described in the documentation may be present in all ELF files; in particular, running the strip command on the ELF file will remove the .symtab and .debug sections.

Related

gcc: how to produce ELF where file size equals mem size for all LOAD segments without custom linker script?

I have to produce an ELF binary with gcc from a Hello World-program written in C, where the mem size equals the file size in all LOAD-segments of the ELF file. My experience says me, that I can prevent this if I move .bss into .data in a custom linker script. But in my case, I want to achieve this without a custom linker script.
Is there a way I can force all LOAD-segments to have the same file size as mem size with an option for GCC?
Background: I'm working on enabling Linux binaries on a custom OS. The ELF-Loader so far is pretty basic and testing/developing will be much simpler, if I just can map the ELF as it is (as long as all LOAD-segments are page-aligned)..
For completeness, I provide the solution that includes a dedicated linker script. The relevant excerpt is the following:
.data ALIGN(4K) :
{
*(.data .data.*)
/* Putting .bss into the .data segment simplifies loading an ELF file especially in kernel
scenarios. Some basic ELF loaders in OS dev space require MEMSIZE==FILESIZE for each
LOAD segment. The zeroed memory will land "as is" in the ELF and increase its size.
I'm not sure why but "*(COMMON)" must be specified as well so that the .bss section
actually lands in .data. But the GNU ld doc also does it like this:
https://sourceware.org/binutils/docs/ld/Input-Section-Common.html */
*(COMMON)
*(.bss .bss.*)
} : rw
It is important that the output section is not called ".bss" and that
the section contains more than just ".bss". Otherwise, the "FILESIZE != MEMSIZE" optimization is done where the ELF loader needs to provide zeroed memory.

What is the memory attribute 'p' in my linker file?

In GCC, the MEMORY command describes the location and size of blocks of memory in the target.
The command must be used this way.
MEMORY
{
name [(attr)] : ORIGIN = origin, LENGTH = len
...
}
Now, I have a linker file used by the linker (a GCC based linker for Infineon Tricore microcontrollers, tricore-ld) defining a RAM memory section this way:
MEMORY
{
ram (w!xp): org = 0x70000000, len = 32k
...
}
Could you explain what 'p' means in (w!xp)? What does 'p' mean in general?
Not a standard linker script, not unusual for a custom micro-controller target of course. Perhaps forked a long time ago. It however can be easily reverse-engineered, GCC has always used the ELF format for object files. Google "elf section attributes", out pops this hit, pretty helpful here.
So you got alloc, exec, write, progbits. Aha, p == progbits. So (w!xp) surely should be interpreted as "section is writable, not executable, initial data is stored in the executable image".
Nothing very special, that's the traditional .data section in a C program. Compare to .bss, not p.
Info added by OP:
From this presentation on the UNIX ELF Format:
PROGBITS: This holds program contents including code, data, and debugger information.
NOBITS: Like PROGBITS. However, it occupies no space.
SYMTAB and DYNSYM: These hold symbol table.
STRTAB: This is a string table, like the one used in a.out.
REL and RELA: These hold relocation information.
DYNAMIC and HASH: This holds information related to dynamic linking.

How to specify .bss section on a elf target?

I want to define a bss section for my elf. But when i google for it only below flags would be supported for ELF targets using .section directive.
a section is allocatable
e section is excluded from executable and shared library.
w section is writable
x section is executable
M section is mergeable
S section contains zero terminated strings
G section is a member of a section group
T section is used for thread-local-storage
? section is a member of the previously-current section's group, if any
Can anyone help me to specify bss section or else any alternative option.

Difference between Program header and Section Header in ELF

Q1 What is the difference between Program header and Section Header in ELF?
Q1.1 What is the difference between segment and a section?
I believe pheaders point to sections only.
Q2. What is the difference between File Header and Program Header?
As per GNU ld linker script, Using Id: The GNU Linker:
PHDRS
{
name type [ FILEHDR ] [ PHDRS ] [ AT ( address ) ]
[ FLAGS ( flags ) ] ;
}
You may use the FILEHDR and PHDRS keywords appear after the program header type to further
describe the contents of the segment. The FILEHDR keyword means that the segment should include
the ELF file header. The PHDRS keyword means that the segment should include the ELF program
headers themselves.
This is a bit confusing.
The Executable & Linkable Format wikipage has a nice picture explaining ELF, and the difference between its program header & sections header. See also elf(5)
The [initial] program header is defining segments (in the address space of a process running that ELF executable) projected in virtual memory (the executable point of view) at execve(2) time. The [final] sections header is defining sections (the linkable point of view, for ld(1) etc...). Each section belongs to a segment (and may, or not, be visible -i.e. mapped into memory- at execution time). The ELF file header tells where program header table & section header table are.
Use also objdump(1) and readelf(1) to explore several ELF files (executables, shared objects, linkable objects) existing on your Linux system.
Levine's Linkers & Loaders book has a chapter explaining that in details.
And Drepper's paper How to Write Shared Libraries also has some good explanation.
Q1 What is the difference between the Program header and the Section Header in ELF?
A program header describes a segment or other information that the system needs to prepare the program for execution.
A section is an interface that can represent a lot of things. Look here for details (search for Elf64_Shdr)
A section header is inside a segment.
Q1.1 What is the difference between a segment and a section?
A segment consists of one or more sections, though this fact is transparent to the program header.
Q2. What is the difference between File Header and Program Header?
The ELF file header. This appears at the start of every ELF file (see /usr/include/elf.h). It also has the number of program headers existing in this file.
The ELF file always starts with the ELF file header. And it references the program headers. You need at least one program header.

Linker scripts: strategies for debugging?

I'm trying to debug a linker problem that I have, when writing a kernel.
The issue is that I have a variable SCAN_CODE_MAPPING that I'm not able to use -- it appears to be empty or something. I can fix this by changing the way I link my program, but I don't know why.
When I look inside the generated binary file using objdump, the data for the variable is definitely there, so there's just something broken with the reference to it.
Here's a gist with both of the linker scripts and the part of the symbol table that's different between the two files.
What confuses me is that both of the symbol tables have all the same symbols, they're all the same length, and they appear to contain the right data. The only difference that I can see is that they're not in the same order.
So far I've tried
inspecting the SCAN_CODE_MAPPING memory location to make sure it has the data I expect and hasn't been zeroed out
checking that all the symbols are the same
checking that all the symbol contents are the same length
looking at .data.rel.ro.local to make sure it has the address of the data
One possible clue is this warning:
warning: uninitialized space declared in non-BSS section `.text': zeroing
which I get in both the broken and the correct case.
What should I try next?
The problem here turned out to be that I was writing an OS, and only 12k of it was being loaded instead of the whole thing. So the linker script was actually working fine.
The main tools I used to understand binaries were:
nm
objdump
readelf
You can get a ton more information using "readelf".
In particular, take a look at the program headers:
readelf -l program
Your BSS section is quite different than the standard one, which probably causing the warning. Here's what the default looks like on my system:
.bss :
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
/* Align here to ensure that the .bss section occupies space up to
_end. Align after .bss to ensure correct alignment even if the
.bss section disappears because there are no input sections.
FIXME: Why do we need it? When there is no .bss section, we don't
pad the .data section. */
. = ALIGN(. != 0 ? 64 / 8 : 1);
}
If an input section doesn't match anything in your linker script, the linker still has to place it somewhere. Make sure you're covering all the input sections.
Note that there is a difference between sections and segments. Sections are used by the linker, but the only thing the program loader looks at are the segments. The text segment includes the text section, but it also includes other sections. Sections that go into the same segment must be adjacent. So order does matter.
The rodata section usually goes after the text section. These are both read-only during execution and will show up once in your program headers as a LOAD entry with read & execute permissions. That LOAD entry is the text segment.
The bss section usually goes after the data section. These are both writable during execution and will show up once in your program headers as a LOAD entry with read & write permissions. That LOAD entry is the data segment.
If you change the order, it affects how the linker generates the program headers. The program headers, rather than the section headers, are used when loading your program prior to executing it. Make sure to check the program headers when using a custom linker script.
If you can give more specifics about what your actual symptoms are, then it'll be easier to help.

Resources