Adding section to GNU linker script - c

Hi, I am trying to define a custom section in my linker script in the following way:
.version_section(__custom_data__) :
{
KEEP (*version_info.o (.rodata* .data* .sdata*))
}
I am compiling a C file that contains a structure, and I want that structure to always be stored in this version_section.
version_info ver_info __attribute__ ((section(".version_section"))) = {7, 10, 2013, 17, 17, "some_type", "some_sw_version", "some_version"} ;
Up to this point everything works fine. However, the generated section has the flags "AW", and I need the flags to be "A".
So I am using an assembler file that defines this section with the "A" flag, like this:
.section .version_section,"a", #progbits
.align 8
.globl __custom_data__
.type __custom_data__, #function
__custom_data__:
.word 0
.size __custom_data__, .-__custom_data__
.space (0x1024-0x4), 0
But readelf still shows the default flags for version_section, i.e. WA:
[11] .version_section PROGBITS 00011088 004088 001044 00 WA 0 0 8
What am I doing wrong here?

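It appears that "W" meant writable in the readelf output, as I suspected. Because ver_info was not const, GCC emitted .version_section with the "aw" flags, and the linker ORs together the flags of all input sections that make up an output section, so the assembler file's "a" could not remove the W. Adding the const qualifier to the definition of ver_info moved it to the desired segment in memory.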
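A minimal C sketch of the const fix, assuming GCC on an ELF target. The question only shows the initializer, so the version_info field names and array sizes below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout: only the initializer appears in the question,
   so these field names and sizes are assumptions. */
typedef struct {
    uint8_t  day;
    uint8_t  month;
    uint16_t year;
    uint8_t  hour;
    uint8_t  minute;
    char     type[16];
    char     sw_version[32];
    char     version[32];
} version_info;

/* const lets GCC emit .version_section with flags "a" instead of "aw".
   The linker ORs the flags of every input section merged into one output
   section, so a single writable contributor would make the result WA. */
const version_info ver_info __attribute__((section(".version_section"))) =
    {7, 10, 2013, 17, 17, "some_type", "some_sw_version", "some_version"};
```

Running readelf -S on the resulting object should list .version_section with flag A only. The struct holds char arrays rather than pointers, so it needs no relocations that could push it back into a writable section.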

Related

How can I access the size of a symbol as set by .size directive in C

I have set the size of a symbol in assembly using the .size directive of the GNU assembler. How do I access this size in C?
void my_func(void){}
asm(
"_my_stub:\n\t"
"call my_func\n\t"
".size _my_stub, .-_my_stub"
);
extern char* _my_stub;
int main(){
/* print size of _my_stub here */
}
Here is the relevant objdump output:
0000000000000007 <_my_stub>:
_my_stub():
7: e8 00 00 00 00 callq c <main>
Here is the relevant portion of the readelf output:
Symbol table '.symtab' contains 14 entries:
Num: Value Size Type Bind Vis Ndx Name
5: 0000000000000007 5 NOTYPE LOCAL DEFAULT 1 _my_stub
I can see from the objdump and symbol table that the size of _my_stub is 5. How can I get this value in C?
I don't know of a way to access the size attribute from within gas. As an alternative, how about replacing .size with
hhh:
.long .-_my_stub # 32-bit integer constant, regardless of the size of a C long
and
extern const uint32_t gfoo asm("hhh");
// asm("asm_name") sidesteps the _name vs. name issue across OSes
I can't try it here, but I expect you should be able to print it with printf("%" PRIu32 "\n", gfoo); (PRIu32 comes from <inttypes.h>; %ld does not match a uint32_t). I tried using .equ so this would be a constant rather than allocating memory for it, but I never got it to work right.
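A self-contained sketch of the .long suggestion, assuming x86/x86-64 GNU as on an ELF target (where the call encodes to 5 bytes, matching the objdump above). The names my_stub_end and my_stub_len are hypothetical; my_stub_len plays the role of hhh. It uses the GCC spelling __asm__, which also compiles under strict -std=c11, and PRIu32 to print the uint32_t.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

void my_func(void) {}

/* Assumes x86/x86-64 AT&T syntax on ELF. .pushsection/.popsection leave
   GCC's own current section untouched. */
__asm__(
    ".pushsection .text\n"
    "my_stub:\n\t"
    "call my_func\n"
    "my_stub_end:\n\t"
    ".popsection\n\t"
    ".pushsection .rodata\n\t"
    ".balign 4\n"
    "my_stub_len:\n\t"
    ".long my_stub_end - my_stub\n\t" /* 32-bit constant computed by gas */
    ".popsection\n");

/* The asm label binds the C name to the assembler symbol, sidestepping
   any leading-underscore naming differences. */
extern const uint32_t stub_len __asm__("my_stub_len");

void print_stub_len(void) {
    printf("%" PRIu32 "\n", stub_len);
}
```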
This does leave the question as to the purpose of the .size attribute. Why set it if you can't read it? I'm not an expert, but I've got a theory:
According to the docs, for COFF outputs, .size must be within a .def/.endef. Looking at .def, we see that it's used to "Begin defining debugging information for a symbol name".
While ELF doesn't have the same nesting requirement, it seems plausible to assume that debugging is the intent there too. If this is only intended to be used by debuggers, it (kinda) makes sense that there's no way to access it from within the assembler.
I guess you just want to get the size of a subset of a code segment or data segment. Here is an assembly example (GAS, AT&T syntax) you can refer to:
target_start:
// Put your code segment or data segment here
// ...
target_end = .
// Use .global to export the label
.global target_size
target_size = target_end - target_start
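In a C/C++ source file, declare the label as extern char target_size[]; and use its address, e.g. (size_t)target_size. The value of target_size is the size itself, so declaring it as extern long target_size and reading it would load from that address instead.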
Note: this example hasn't been tested.
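A variant of the same idea that avoids an absolute symbol altogether: export both boundary labels and subtract their addresses in C, the same pattern objcopy uses for _binary_*_start/_end. The six data bytes and the target_size() function are placeholders, and .pushsection assumes an ELF target.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder block standing in for "your code segment or data segment".
   Both labels are ordinary section-relative symbols. */
__asm__(
    ".pushsection .rodata\n"
    ".globl target_start\n"
    "target_start:\n\t"
    ".byte 1, 2, 3, 4, 5, 6\n"
    ".globl target_end\n"
    "target_end:\n"
    ".popsection\n");

extern const unsigned char target_start[];
extern const unsigned char target_end[];

size_t target_size(void) {
    /* Subtract as integers: to C these are two unrelated objects. */
    return (size_t)((uintptr_t)target_end - (uintptr_t)target_start);
}
```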

How to load library defined symbols to a specified location?

The test is on Ubuntu 12.04, 32-bit, with gcc 4.6.3.
Basically I am doing some binary manipulation work on ELF binaries, and what I have to do now is assemble an assembly program and guarantee that the libc symbols are loaded at addresses I choose.
Let me elaborate with a simple example.
Suppose the original code uses the libc symbol stdout@GLIBC_2.0.
#include <stdio.h>
int main() {
FILE* fout = stdout;
fprintf( fout, "hello\n" );
}
When I compile it and check the symbol address using these commands:
gcc main.c
readelf -s a.out | grep stdout
I got this:
0804a020 4 OBJECT GLOBAL DEFAULT 25 stdout@GLIBC_2.0 (2)
0804a020 4 OBJECT GLOBAL DEFAULT 25 stdout@@GLIBC_2.0
and the .bss section is like this:
readelf -S a.out | grep bss
[25] .bss NOBITS 0804a020 001014 00000c 00 WA 0 0 32
Now what I am trying to do is load the stdout symbol at a predefined address, so I did this:
echo "stdout = 0x804a024;" > symbolfile
gcc -Wl,--just-symbols=symbolfile main.c
Then when I check the .bss section and symbol stdout, I got this:
[25] .bss NOBITS 0804a014 001014 000008 00 WA 0 0 4
4: 0804a024 0 NOTYPE GLOBAL DEFAULT ABS stdout
49: 0804a024 0 NOTYPE GLOBAL DEFAULT ABS stdout
It seems that I didn't manage to load the symbol stdout@@GLIBC_2.0, just a weird stdout. (I tried writing stdout@@GLIBC_2.0 in symbolfile, but then it fails to compile.)
Since that failed, the start address of the .bss section has also changed, which leaves the address of stdout outside any section. At runtime the program throws a segmentation fault when loading from 0x804a024.
Could anyone help me on how to successfully load the library symbol at a predefined address? Thanks!

NULL terminator on string included via AS's incbin directive

I have some large string resources located in files that I include in my executable. I include them in the executable using the following. The *.S allows GCC to invoke as to produce the object file without any special processing.
;; ca_conf.S
.section .rodata
;; OpenSSL's CA configuration
.global ca_conf
.type ca_conf, #object
.align 8
ca_conf:
ca_conf_start:
.incbin "res/openssl-ca.cnf"
ca_conf_end:
.byte 0
;; The string's size (if needed)
.global ca_conf_size
.type ca_conf_size, #object
.align 4
ca_conf_size:
.int ca_conf_end - ca_conf_start
I add a .byte 0 after including the string to ensure the string is NUL-terminated. That allows me to use ca_conf as a C const char*, or {ca_conf, ca_conf_size} as a C++ string.
Will the assembler or linker rearrange things such that the NUL terminator could become separated from the string it terminates? Or will the assembler and linker always keep them together?
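Yes, they will be kept together. The assembler emits the .incbin bytes and the .byte 0 back to back in the same section (.rodata), and the linker places each input section as a single unit; it never splits or reorders a section's contents.
One other point: the .align 4 can insert up to 3 padding bytes after the terminator, before ca_conf_size. Those bytes are not counted in ca_conf_size, because ca_conf_end is defined before both the .byte 0 and the padding, so the size is exactly the length of the included file.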
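A self-contained sketch of the same layout, with two substitutions so it builds on any ELF target without the resource file: .ascii stands in for .incbin "res/openssl-ca.cnf", and .balign replaces .align, whose operand is a byte count on some targets and a power of two on others. The 7-byte content and the ca_conf_terminated() helper are placeholders.

```c
#include <stdint.h>
#include <string.h>

__asm__(
    ".pushsection .rodata\n"
    ".globl ca_conf\n"
    ".balign 8\n"
    "ca_conf:\n"
    "ca_conf_start:\n\t"
    ".ascii \"[ ca ]\\n\"\n"  /* stand-in for .incbin */
    "ca_conf_end:\n\t"
    ".byte 0\n"               /* terminator, not counted in ca_conf_size */
    ".globl ca_conf_size\n"
    ".balign 4\n"             /* any padding lands here, after ca_conf_end */
    "ca_conf_size:\n\t"
    ".int ca_conf_end - ca_conf_start\n"
    ".popsection\n");

extern const char ca_conf[];
extern const uint32_t ca_conf_size;

/* Nonzero when the NUL sits exactly ca_conf_size bytes after the start. */
int ca_conf_terminated(void) {
    return strlen(ca_conf) == ca_conf_size;
}
```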

Local and static variables in C (cont'd)

Building on my last question, I'm trying to figure out how the .local and .comm directives work exactly, and in particular how they affect linkage and storage duration in C.
So I've run the following experiment:
static int value;
which produces the following assembly code (using gcc):
.local value
.comm value,4,4
Initializing it to zero (static int value = 0;) yields the same assembly code (using gcc):
.local value
.comm value,4,4
This sounds logical, because in both cases I would expect the variable to be stored in the bss segment. Moreover, after investigating with ld --verbose, it looks like all .comm variables are indeed placed in the bss segment:
.bss :
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
/* ... */
}
However, when I initialize my variable to a value other than zero, the compiler defines the variable in the data segment, as I expected, but produces the following output:
.data
.align 4
.type value, #object
.size value, 4
value:
.long 1
Besides the different segments (bss and data, respectively), which thanks to your earlier help I now understand, my variable is declared with .local and .comm in the first example but not in the second. Could anyone explain the difference between the two outputs?
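The .local directive marks a symbol as local (not visible outside the object file) and creates it if it doesn't already exist. It's needed for the zero-initialized static, because .comm on its own declares a global common symbol whose storage the linker allocates; preceded by .local, it instead defines local, aligned common data in the object's .bss. For the 1-initialized variant, the label (value:) defines the symbol, and since there is no .globl it is local by default.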
Using .local and .comm is essentially a bit of a hack (or at least a shorthand); the alternative would be to place the symbol into .bss explicitly:
.bss
.align 4
.type value, #object
.size value, 4
value:
.zero 4
The Linux kernel zeroes the memory it hands to a process, for security reasons, and the ELF loader maps .bss as zero-filled memory. So the compiler already knows that this memory will be filled with zeros and optimizes accordingly: if a variable is initialized to 0, there's no need to reserve space for it in the executable file (the .data section takes up space in the ELF file, whereas .bss records only its length, and its initial contents are assumed to be zeros).
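A minimal sketch for observing this: compile it with gcc -c and compare the data and bss columns printed by size. The buffer names and the 64 KiB size are arbitrary, and the exact directives GCC emits (.local/.comm or an explicit .bss entry) vary with version and flags.

```c
#include <stddef.h>

/* Zero-initialized static: goes to .bss, so the ELF file records
   only its 64 KiB size, not its contents. */
static unsigned char zero_buf[64 * 1024];

/* Non-zero initializer: goes to .data, so all 64 KiB are stored in the
   file even though only the first byte is non-zero. */
static unsigned char data_buf[64 * 1024] = {1};

/* Counts non-zero bytes; for zero_buf this relies on the loader having
   zero-filled .bss. */
size_t count_nonzero(const unsigned char *p, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (p[i] != 0);
    return count;
}
```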

What does KEEP mean in a linker script?

The LD manual does not explain what the KEEP command does. Below is a snippet from a third-party linker script that features KEEP. What does the KEEP command do in ld?
SECTIONS
{
.text :
{
. = ALIGN(4);
_text = .;
PROVIDE(stext = .);
KEEP(*(.isr_vector))
KEEP(*(.init))
*(.text .text.*)
*(.rodata .rodata.*)
*(.gnu.linkonce.t.*)
*(.glue_7)
*(.glue_7t)
*(.gcc_except_table)
*(.gnu.linkonce.r.*)
. = ALIGN(4);
_etext = .;
_sidata = _etext;
PROVIDE(etext = .);
_fini = . ;
*(.fini)
} >flash
AFAIK, KEEP makes LD keep the symbols in the section even if they are not referenced, i.e. it exempts the section from garbage collection with --gc-sections.
It is usually used for sections that have some special meaning in the binary startup process, more or less to mark the roots of the dependency tree.
Dependency tree:
If you eliminate unused code, you analyze the code and mark all reachable sections (code+global variables + constants).
So you pick a section, mark it as "used" and see what other (unused) section it references, then you mark those section as "used", and check what they reference etc.
The sections that are not marked "used" are then redundant and can be eliminated.
Since a section can reference multiple other sections (e.g. one procedure calling three different other ones), if you drew the result you would get a tree.
Roots:
The above principle, however, leaves us with a problem: what is the "first" section that is always used? The first node (root) of the tree, so to speak? This is what KEEP() does: it tells the linker which sections (if present) are the first ones to look at. As a consequence, these are always linked in.
Typically these are sections that are called from the program loader to perform tasks related to dynamic linking (can be optional, and OS/fileformat dependent), and the entry point of the program.
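The mark phase described above can be sketched as a depth-first walk from the roots. This is a toy model, not ld's implementation: struct section, gc_sections and the MAX_REFS limit are hypothetical, and the refs array stands in for the relocations the real linker follows. Because references can form cycles, the used flag doubles as a visited marker.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 4

/* Hypothetical model of one input section. */
struct section {
    const char *name;
    bool keep;              /* root: wrapped in KEEP() or holds the entry point */
    size_t nrefs;
    size_t refs[MAX_REFS];  /* indices of the sections it references */
    bool used;              /* set by the mark phase */
};

static void mark(struct section *secs, size_t i) {
    if (secs[i].used)
        return;             /* already visited; also stops on cycles */
    secs[i].used = true;
    for (size_t r = 0; r < secs[i].nrefs; r++)
        mark(secs, secs[i].refs[r]);
}

/* Marks everything reachable from the roots and returns how many
   sections survive; unmarked sections would be discarded. */
size_t gc_sections(struct section *secs, size_t n) {
    for (size_t i = 0; i < n; i++)
        secs[i].used = false;
    for (size_t i = 0; i < n; i++)
        if (secs[i].keep)
            mark(secs, i);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        kept += secs[i].used ? 1 : 0;
    return kept;
}
```

A section that references others but is itself unreferenced (and not a root) is still dropped, which is exactly the case KEEP exists to prevent.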
Minimal Linux IA-32 example that illustrates its usage
main.S
.section .text
.global _start
_start:
/* Dummy access so that after will be referenced and kept. */
mov after, %eax
/*mov keep, %eax*/
/* Exit system call. */
mov $1, %eax
/* Take the exit status 4 bytes after before. */
mov $4, %ebx
mov before(%ebx), %ebx
int $0x80
.section .before
before: .long 0
/* TODO why is the `"a"` required? */
.section .keep, "a"
keep: .long 1
.section .after
after: .long 2
link.ld
ENTRY(_start)
SECTIONS
{
. = 0x400000;
.text :
{
*(.text)
*(.before)
KEEP(*(.keep));
*(.keep)
*(.after)
}
}
Compile and run:
as --32 -o main.o main.S
ld --gc-sections -m elf_i386 -o main.out -T link.ld main.o
./main.out
echo $?
Output:
1
If we comment out the KEEP line the output is:
2
If we either:
add a dummy mov keep, %eax
remove --gc-sections
The output goes back to 1.
Tested on Ubuntu 14.04, Binutils 2.25.
Explanation
Nothing references the symbol keep, and therefore nothing references its containing section .keep.
Therefore if garbage collection is enabled and we don't use KEEP to make an exception, that section will not be put in the executable.
Since we are adding 4 to the address of before, if the .keep section is not present, the exit status will be 2, the value stored in the following .after section.
TODO: nothing happens if we remove the "a" (which makes the section allocatable) from .keep. I don't understand why that is so: that section will be put inside the .text output section, which, because of its magic name, will be allocatable.
Force the linker to keep some specific sections
SECTIONS
{
....
....
*(.rodata .rodata.*)
KEEP(*(SORT(.scattered_array*)));
}
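A C-side sketch of filling such a section, with one swap: the section here is named scattered_array (no leading dot, no SORT), so GNU ld's automatic __start_scattered_array/__stop_scattered_array symbols delimit it without a custom linker script. struct init_entry, REGISTER_ENTRY, entry_count and priority_sum are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical record type for the collected entries. */
struct init_entry {
    const char *name;
    int priority;
};

/* Places one record in the "scattered_array" section. "used" stops the
   compiler from dropping it; pinning the alignment to the type's natural
   alignment keeps GCC from over-aligning and padding between records. */
#define REGISTER_ENTRY(ident, prio)                                   \
    static const struct init_entry ident __attribute__((              \
        used, section("scattered_array"),                             \
        aligned(__alignof__(struct init_entry)))) = {#ident, prio}

REGISTER_ENTRY(uart_init, 10);
REGISTER_ENTRY(timer_init, 20);

/* Defined by GNU ld because the section name is a valid C identifier. */
extern const struct init_entry __start_scattered_array[];
extern const struct init_entry __stop_scattered_array[];

size_t entry_count(void) {
    return (size_t)(__stop_scattered_array - __start_scattered_array);
}

int priority_sum(void) {
    int sum = 0;
    for (const struct init_entry *e = __start_scattered_array;
         e < __stop_scattered_array; e++)
        sum += e->priority;
    return sum;
}
```

With a custom script like the one above, you would instead define your own start/end symbols immediately before and after the KEEP(*(SORT(...))) line, and SORT would order the entries by input section name.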

Resources