How to specify a .bss section on an ELF target? - c

I want to define a bss section for my ELF. But when I searched for it, I found that only the flags below are supported for ELF targets by the .section directive:
a section is allocatable
e section is excluded from executable and shared library.
w section is writable
x section is executable
M section is mergeable
S section contains zero terminated strings
G section is a member of a section group
T section is used for thread-local-storage
? section is a member of the previously-current section's group, if any
Can anyone help me specify a bss section, or suggest an alternative?
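A possible direction, offered as a sketch rather than a definitive answer: in GNU as the flag letters listed above are only one argument of .section, and the section type is a separate argument, so a bss-like section can be declared as .section .mybss,"aw",@nobits (on targets where @ starts a comment, such as ARM, the spelling is %nobits). From C, a section attribute can do something similar. The example below assumes GCC or Clang on an ELF target and relies on the usual behaviour that a section whose name is .bss or starts with .bss. is emitted as NOBITS. The names .mybss, .bss.scratch, scratch and scratch_is_zeroed are hypothetical.

```c
#include <stddef.h>

/* Hypothetical zero-initialized buffer placed in a named section.
   Assumption: the toolchain treats section names starting with
   ".bss." as NOBITS, so the buffer takes no space in the file. */
static unsigned char scratch[4096] __attribute__((section(".bss.scratch")));

/* Returns 1 if every byte of the buffer is zero, which is what the
   loader (or startup code) is expected to guarantee for bss data. */
int scratch_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof scratch; i++) {
        if (scratch[i] != 0)
            return 0;
    }
    return 1;
}
```

Running readelf -S on the resulting object file should show whether .bss.scratch really came out as NOBITS with your toolchain.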

Related

gcc: how to produce ELF where file size equals mem size for all LOAD segments without custom linker script?

I have to produce an ELF binary with gcc from a Hello World program written in C, where the memory size equals the file size in all LOAD segments of the ELF file. In my experience, I can achieve this by moving .bss into .data in a custom linker script. But in my case, I want to achieve this without a custom linker script.
Is there a way I can force all LOAD-segments to have the same file size as mem size with an option for GCC?
Background: I'm working on enabling Linux binaries on a custom OS. The ELF loader is pretty basic so far, and testing and development will be much simpler if I can just map the ELF as it is (as long as all LOAD segments are page-aligned).
For completeness, I provide the solution that includes a dedicated linker script. The relevant excerpt is the following:
.data ALIGN(4K) :
{
*(.data .data.*)
/* Putting .bss into the .data segment simplifies loading an ELF file especially in kernel
scenarios. Some basic ELF loaders in OS dev space require MEMSIZE==FILESIZE for each
LOAD segment. The zeroed memory will land "as is" in the ELF and increase its size.
I'm not sure why but "*(COMMON)" must be specified as well so that the .bss section
actually lands in .data. But the GNU ld doc also does it like this:
https://sourceware.org/binutils/docs/ld/Input-Section-Common.html */
*(COMMON)
*(.bss .bss.*)
} : rw
It is important that the output section is not called ".bss" and that the section contains more than just ".bss". Otherwise, the "FILESIZE != MEMSIZE" optimization is applied, and the ELF loader needs to provide the zeroed memory itself.
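Whichever way the binary is produced, the result can be checked programmatically. The following is a minimal sketch, assuming a Linux host that provides <elf.h> and a 64-bit ELF file in the host's byte order. The function name count_load_mismatches is hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Counts PT_LOAD segments whose file size differs from their memory size.
   Returns -1 if the file cannot be read or is not a 64-bit ELF.
   Assumption: the file uses the same byte order as the host. */
int count_load_mismatches(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    Elf64_Ehdr eh;
    int mismatches = -1;

    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_phentsize == sizeof(Elf64_Phdr)) {
        mismatches = 0;
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                mismatches = -1;   /* truncated or unreadable header table */
                break;
            }
            if (ph.p_type == PT_LOAD && ph.p_filesz != ph.p_memsz)
                mismatches++;
        }
    }

    fclose(f);
    return mismatches;
}
```

A return value of 0 means every LOAD segment has FILESIZE == MEMSIZE. The same information is visible by hand in the FileSiz and MemSiz columns of readelf -l.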

ELF second load segment address of .data + .bss

In this case, is it right that the addresses are:
.data starts at 0x08048054 and runs up to 0x08048054+0x0000e
.bss starts at 0x08048054+0x0000e and runs up to 0x08048054+0x00016
or am I missing something? Please clarify it for me.
EDIT
I used this command to get the information as in the image:
readelf -l filename
Ok, so where do I begin... Yes both .data and .bss are in that region in memory. The problem is that there is no way to figure out what order they are in.
We can assume that the default order is followed and make an educated guess but I don't like that.
Through the lengthy comment thread under the question, you mentioned something interesting that wasn't evident in your question:
"The executable isn't dynamically linked, as the file command says: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped. In this case, there's no linker script, is there?" – The Mask
In this case the library contains the symbol table with all of the symbol offsets. This table includes section information. It will be processed by the linker when you build your application. At that point it is your linker script that controls the order in which the .data and .bss sections are output.
If it is the default linker script, look it up. If it is custom, you should have access to it and can read it. If unsure elaborate here and we'll try and help :)
I myself have asked a question that is unrelated but offers example code of a linker script and some C code. In that linker script the .bss section came after the .data section.
You are looking at the program header information, whereas the section headers are probably what you need. There may be many sections contained within a program header and you cannot precisely infer the sizes and alignment requirements of the various sections.
To see the section headers, use:
readelf -S
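What the program header alone does tell you is which part of the segment is backed by the file and which part is zero-filled by the loader. Here is a small sketch of that arithmetic, assuming the readelf -l output in the question showed VirtAddr 0x08048054, FileSiz 0xe and MemSiz 0x16. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper type: the two address ranges of one PT_LOAD segment. */
struct load_ranges {
    uint32_t file_start, file_end;   /* [start, end) copied from the file   */
    uint32_t zero_start, zero_end;   /* [start, end) zero-filled at load time */
};

/* Splits a LOAD segment into its file-backed and zero-filled parts. */
struct load_ranges split_load_segment(uint32_t vaddr, uint32_t filesz,
                                      uint32_t memsz)
{
    struct load_ranges r;
    r.file_start = vaddr;
    r.file_end   = vaddr + filesz;
    r.zero_start = r.file_end;
    r.zero_end   = vaddr + memsz;
    return r;
}
```

With the assumed values this gives 0x08048054 to 0x08048062 for the file-backed part and 0x08048062 to 0x0804806a for the zero-filled part. That usually corresponds to .data followed by .bss, but alignment padding and additional sections mean the section headers remain the reliable source.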

Why Executable and Linkable Format(ELF) File contains set of sections?

These days I'm studying file-handling system calls in Linux.
I also learned that ELF, the Executable and Linkable Format, contains a set of sections.
Those are .bss, .data, .rodata, .text, .comment, and others unknown to me.
I referred to Wikipedia and this website to study.
So I have the questions below:
Why does an ELF file use a set of sections?
What is the task of each of the sections above?
What is the feasibility of using a set of sections?
A good reference for the ELF file format is the Object Files chapter of the System V ABI. In particular, special sections describes the uses of most of the sections you're likely to encounter.
Why does an ELF file use a set of sections?
An object file contains lots of different classes of data, and it makes sense to group similar data into sections, especially since some sections' contents can be read directly into a process's image when the OS execs the ELF file.
.bss contains uninitialized data, such as int a; declared at global level in a C program. Actually, it contains nothing except the size that needs to be allocated when the ELF file is loaded into a process, because all variables in bss are initialized to 0.
.data contains initialized data, such as int a = 1000; declared at global level in a C program.
.rodata contains read-only data, such as character string literals and global level variables declared as const in C. When the OS execs the ELF file, it will load this section into an area of memory that is read-only.
.text contains executable instructions. When the OS execs the ELF file, it will load this section into an area of memory that is read-only. Sometimes .text and .rodata wind up being loaded into the same area of a process's memory.
.comment typically contains the name and version of the compiler(s) used to generate the file.
Not all of the sections described in the documentation may be present in all ELF files; in particular, running the strip command on the ELF file will remove the .symtab and .debug sections.
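To make the mapping concrete, here is a small sketch of a C file whose objects would typically end up in the sections described above. The placement comments describe common GCC/Clang defaults on ELF targets and are an assumption, not a guarantee. All names are hypothetical.

```c
/* Typical section placement with GCC or Clang on an ELF target.
   The placements noted here are common defaults, not guarantees. */
int uninit_counter;               /* usually .bss (COMMON with -fcommon) */
int init_value = 1000;            /* usually .data   */
const int limit = 42;             /* usually .rodata */
const char greeting[] = "hello";  /* usually .rodata */

/* The machine code of this function goes to .text. */
int sum_values(void)
{
    return uninit_counter + init_value + limit;
}
```

Compiling this with gcc -c and inspecting the object with readelf -S or objdump -t shows where each symbol actually landed for your toolchain.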

Predefined ELF Code Sections

What are the predefined code sections that can be referenced in an ELF linker command file? In addition to any others that may be available, I am specifically wondering about these:
.text
.rodata
.sdata
.sbss
.bss
.data
Finding documentation has proven most difficult. If anyone can also tell me what the acronym ELF stands for in this context, that would be a plus. Thanks.
Not sure what you mean about not finding documentation. Wikipedia has a large collection of links about the Executable and Linkable Format. One of the links there describes the ELF sections you are interested in (plus lots of other stuff). Another link here describes additional ELF special sections (.sbss/.sdata).

How to get pointers and sizes of variables from the compiler - from outside the compiled code?

I'd like the compiler to output a file containing the pointers to all global variables in the source code it is compiling, and also the sizes of them.
Is this possible? Is there a way to do it in any C compiler?
Something like a map file? That will show where the globals and statics are allocated, but not what they point at. Most compilers (linkers) will output one automatically or with a simple statement. Just search for map file in your documentation.
This information is available in the symbol table of the binary, though it might not mean what you expect it to.
The compiler takes one or more source files, compiles the code to object code, and generates an object file (.o on Unix, .obj on Windows). All variables and functions referenced in the source file are mentioned in the symbol table. Variables and functions that are defined in the source file have specific addresses and sizes, while symbols not defined in the source file are marked as undefined and must be linked later. All symbols are listed relative to a particular section. Common sections are ".text" for executable code, ".bss" for variables that are initialized to zero when the program starts, and ".data" for variables initialized with non-zero values.
The linker takes one or more object files, combines the sections (putting all of code and data from each object file into one big section for code and data), and writes an output file. This output file may be an executable, or it may be a shared library. An executable on disk still doesn't have a pointer for each variable; it still stores the offset from the beginning of the section to the variable.
When an executable is run, the operating system's loader reads the executable, finds each section, and allocates memory for that section. (It may also set up different permissions on each section -- the ".text" section is often marked as read-only, and (on processors that support it) data sections are sometimes marked as non-executable.) Only then does a variable get a pointer -- when the code needs to access a particular variable, it adds the address of the beginning of the section to the offset from the beginning of the section to get the pointer.
You can use various tools to investigate each binary's symbol table. The GNU toolchain's objdump (used on Linux) is one such tool.
For a simple C hello-world program:
#include <stdio.h>
const char message[] = "Hello world!\n";
int main(int argc, char ** argv) {
printf(message);
return 0;
}
I compile (but don't link) it on my Linux box:
$ gcc -c hello.c -o hello.o
Now I can look at the symbol table:
$ objdump -t hello.o
hello.o: file format elf32-i386
SYMBOL TABLE:
00000000 l df *ABS* 00000000 hello.c
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 l d .rodata 00000000 .rodata
00000000 l d .note.GNU-stack 00000000 .note.GNU-stack
00000000 l d .comment 00000000 .comment
00000000 g O .rodata 0000000e message
00000000 g F .text 0000002b main
00000000 *UND* 00000000 puts
The first column is the address of each symbol, relative to the beginning of the section. Each symbol has various flags, and some of the symbols are used as hints to the rest of the toolchain and the debugger. (If I built with debugging symbols, I'd see many entries devoted to them as well.) My simple program has only one variable:
00000000 g O .rodata 0000000e message
The fifth column tells me the symbol message is size 0xe -- 14 bytes.
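The size can be cross-checked from inside the program as well. A minimal sketch, where message_size is a hypothetical helper and not part of the original program:

```c
#include <stddef.h>

const char message[] = "Hello world!\n";

/* Size of the array in bytes, including the terminating NUL:
   13 visible characters (with the newline) plus 1. */
size_t message_size(void)
{
    return sizeof message;
}
```

This returns 14, matching the 0000000e reported by objdump.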
While no compiler is required to output this data, most linkers can dump out this information. For example, Microsoft's linker mapfile contains all the public symbols in an executable/dll as well as their address relative to the section (read only, read write, code, zero initialized, etc.) they are put in. Sizes can be derived from that, although it's mainly an approximation.
You can also probably figure out a way to inspect the debugging symbols generated for the executable, as that's exactly what a debugger has to do anyway.
Normally you'd get this from the linker, not the compiler -- the linker is what assigns addresses to things. Most linkers can produce a map file that will contain the addresses of global variables and functions (as well as any other symbols in the executable it creates). It'll be up to you to sort out which are which. All of them I've seen include something to tell you, but the exact format varies with the linker involved.
