Keep rodata located with the function that created it

Keep rodata located with the function that created it - c

I'm trying to make .rodata section location stay with its associated function memory location. I'm using the GNU compiler/linker, bare metal, plain-jane c, with an STM32L4A6 micro-controller.
I have a custom board using an STM32L4A6 controller with 1Meg of Flash divided into 512 - 2K pages. Each page can be individually erased and programmed from a function running in RAM. I'd like to take advantage of this fine-grained flash organization to create an embedded firmware application that can be updated on-the-fly by modifying or adding individual functions in the code. My scheme is to dedicate a separate page of flash for each function that might ever need to be changed or created. It's extremely wasteful of flash but I'll never use more than ~10% of it so I can afford to to be wasteful. I've made some progress on this and can now make significant changes to the operation of my application by uploading very small bits of binary code. These "Patches" often do not even require a system reboot.
The problem I'm having is that when a function contains any sort of constant data, such as a literal string, it winds up in the .rodata section. I need the rodata for a given function to stay in the same area as the function that created it. Does anyone know how I might be able to force the .rodata that is created in a function to stay attached to that same function in flash? Like maybe the .rodata from that function could be positioned immediately following the function itself? Maybe I need to use -ffunction-sections or something like that? I've been through the various linker manuals but still can't figure how to do this. Below is the start of my linker script. I don't know how to include function .rodata in the individual page sections.
Example function:
#define P018 __attribute__((long_call, section(".txt018")))
P018 int Function18(int A, int B){int C = A*B; return C;}
A better example that shows my problem would be the following:
#define P152 __attribute__((long_call, section(".txt152")))
P152 void TestFunc(int A){printf("%d Squared Is: %d\r\n",A,A*A);}
In this case, the binary equivalent of "%d Squared Is: %d\r\n" can be found in .rodata with all of the other literal strings in my program. I would prefer it to be located in section .txt152 .
Linker Script snippet (Mostly generated from a simple console program.)
MEMORY
{
p000 (rx) : ORIGIN = 0x08000000, LENGTH = 0x8000
p016 (rx) : ORIGIN = 0x08008000, LENGTH = 0x800
p017 (rx) : ORIGIN = 0x08008800, LENGTH = 0x800
p018 (rx) : ORIGIN = 0x08009000, LENGTH = 0x800
.
.
.
p509 (rx) : ORIGIN = 0x080fe800, LENGTH = 0x800
p510 (rx) : ORIGIN = 0x080ff000, LENGTH = 0x800
p511 (rx) : ORIGIN = 0x080ff800, LENGTH = 0x800
ram (rwx) : ORIGIN = 0x20000000, LENGTH = 256K
ram2 (rw) : ORIGIN = 0x10000000, LENGTH = 64K
}
SECTIONS
{
.vectors :
{
KEEP(*(.isr_vector .isr_vector.*))
} > p000
.txt016 : { *(.txt016) } > p016 /* first usable 2k page following 32k p000 */
.txt017 : { *(.txt017) } > p017
.txt018 : { *(.txt018) } > p018
.
.
.
.txt509 : { *(.txt509) } > p509
.txt510 : { *(.txt510) } > p510
.txt511 : { *(.txt511) } > p511
.text :
{
*(.text .text.* .gnu.linkonce.t.*)
*(.glue_7t) *(.glue_7)
*(.rodata .rodata* .gnu.linkonce.r.*)
} > p000
.
.
.
In case anyone's interested, here's my RAM code for doing the erase/program operation
__attribute__((long_call, section(".data")))
void CopyPatch
(
unsigned short Page,
unsigned int NumberOfBytesToFlash,
unsigned char *PatchBuf
)
{
unsigned int i;
unsigned long long int *Flash;
__ASM volatile ("cpsid i" : : : "memory"); //disable interrupts
Flash = (unsigned long long int *)(FLASH_BASE + Page*2048); //set flash memory pointer to Page address
GPIOE->BSRR = GPIO_BSRR_BS_1; //make PE1(LED) high
FLASH->KEYR = 0x45670123; //unlock the flash
FLASH->KEYR = 0xCDEF89AB; //unlock the flash
while(FLASH->SR & FLASH_SR_BSY){} //wait while flash memory operation is in progress
FLASH->CR = FLASH_CR_PER | (Page << 3); //set Page erase bit and the Page to erase
FLASH->CR |= FLASH_CR_STRT; //start erase of Page
while(FLASH->SR & FLASH_SR_BSY){} //wait while Flash memory operation is in progress
FLASH->CR = FLASH_CR_PG; //set flash programming bit
for(i=0;i<(NumberOfBytesToFlash/8+1);i++)
{
Flash[i] = ((unsigned long long int *)PatchBuf)[i]; //copy RAM to FLASH, 8 bytes at a time
while(FLASH->SR & FLASH_SR_BSY){} //wait while flash memory operation is in progress
}
FLASH->CR = FLASH_CR_LOCK; //lock the flash
GPIOE->BSRR = GPIO_BSRR_BR_1; //make PE1(LED) low
__ASM volatile ("cpsie i" : : : "memory"); //enable interrupts
}

Okay ... Sorry for the delay, but I had to think about this a bit ...
I'm not sure you can do this [completely] with a linker script alone. It might be possible, but I think there's an easier/surer way [with a bit of extra prep]
A method I've used before is to compile with -S to get a .s file. Change/mangle that. And, then, compile the modified .s
Note that you may have some trouble with a global like:
int B;
This will go to a .comm section in the asm source. This may not be ideal.
For initialized data:
int B = 23;
You may want to add a section attribute to force it to a special section. Otherwise, it will end up in a .data section
So, I might avoid .comm and/or .bss sections in favor of always using initialized data. That's because .comm has the same issue as .rodata (i.e. it ends up as one big blob).
Anyway, below is a step by step process.
I put the section name macros in a common file (e.g.) sctname.h:
#define _SCTJOIN(_pre,_sct) _pre #_sct
#define _TXTSCT(_sct) __attribute__((section(_SCTJOIN(".txt",_sct))))
#define _DATSCT(_sct) __attribute__((section(_SCTJOIN(".dat",_sct))))
#ifdef SCTNO
#define TXTSCT _TXTSCT(SCTNO)
#define DATSCT _DATSCT(SCTNO)
#endif
Here's a slightly modified version of your .c file (e.g. module.c):
#include <stdio.h>
#ifndef SCTNO
#define SCTNO 152
#endif
#include "sctname.h"
int B DATSCT = 23;
TXTSCT void
TestFunc(int A)
{
printf("%d Squared Is: %d\r\n", A, A * A * B);
}
To create the .s file, we do:
cc -S -Wall -Werror -O2 module.c
The actual section name/number can be specified on the command line:
cc -S -Wall -Werror -O2 -DSCTNO=152 module.c
This gives us a module.s:
.file "module.c"
.text
.section .rodata.str1.1,"aMS",#progbits,1
.LC0:
.string "%d Squared Is: %d\r\n"
.section .txt152,"ax",#progbits
.p2align 4,,15
.globl TestFunc
.type TestFunc, #function
TestFunc:
.LFB11:
.cfi_startproc
movl %edi, %edx
movl %edi, %esi
xorl %eax, %eax
imull %edi, %edx
movl $.LC0, %edi
imull B(%rip), %edx
jmp printf
.cfi_endproc
.LFE11:
.size TestFunc, .-TestFunc
.globl B
.section .dat152,"aw"
.align 4
.type B, #object
.size B, 4
B:
.long 23
.ident "GCC: (GNU) 8.3.1 20190223 (Red Hat 8.3.1-2)"
.section .note.GNU-stack,"",#progbits
Now, we have to read in the .s and modify it. I've created a perl script that does this (e.g. rofix):
#!/usr/bin/perl
master(#ARGV);
exit(0);
sub master
{
my(#argv) = #_;
$root = shift(#argv);
$root =~ s/[.][^.]+$//;
$sfile = "$root.s";
$ofile = "$root.TMP";
open($xfsrc,"<$sfile") or
die("rofix: unable to open '$sfile' -- $!\n");
open($xfdst,">$ofile") or
die("rofix: unable to open '$sfile' -- $!\n");
$txtpre = "^[.]txt";
$datpre = "^[.]dat";
# find the text and data sections
seek($xfsrc,0,0);
while ($bf = <$xfsrc>) {
chomp($bf);
if ($bf =~ /^\s*[.]section\s(\S+)/) {
$sctcur = $1;
sctget($txtpre);
sctget($datpre);
}
}
# modify the data sections
seek($xfsrc,0,0);
while ($bf = <$xfsrc>) {
chomp($bf);
if ($bf =~ /^\s*[.]section\s(\S+)/) {
$sctcur = $1;
sctfix();
print($xfdst $bf,"\n");
next;
}
print($xfdst $bf,"\n");
}
close($xfsrc);
close($xfdst);
system("diff -u $sfile $ofile");
rename($ofile,$sfile) or
die("rofix: unable to rename '$ofile' to '$sfile' -- $!\n");
}
sub sctget
{
my($pre) = #_;
my($sctname,#sct);
{
last unless (defined($pre));
#sct = split(",",$sctcur);
$sctname = shift(#sct);
last unless ($sctname =~ /$pre/);
printf("sctget: FOUND %s\n",$sctname);
$sct_lookup{$pre} = $sctname;
}
}
sub sctfix
{
my($sctname,#sct);
my($sctnew);
{
last unless ($sctcur =~ /^[.]rodata/);
$sctnew = $sct_lookup{$txtpre};
last unless (defined($sctnew));
#sct = split(",",$sctcur);
$sctname = shift(#sct);
$sctname .= $sctnew;
unshift(#sct,$sctname);
$sctname = join(",",#sct);
$bf = sprintf("\t.section\t%s",$sctname);
}
}
The difference between the old and new module.s is:
sctget: FOUND .txt152
sctget: FOUND .dat152
--- module.s 2020-04-20 19:02:23.777302484 -0400
+++ module.TMP 2020-04-20 19:06:33.631926065 -0400
## -1,6 +1,6 ##
.file "module.c"
.text
- .section .rodata.str1.1,"aMS",#progbits,1
+ .section .rodata.txt152,"aMS",#progbits,1
.LC0:
.string "%d Squared Is: %d\r\n"
.section .txt152,"ax",#progbits
So, now, create the .o with:
cc -c module.s
For a makefile, it might be something like [with some wildcards]:
module.o: module.c
cc -S -Wall -Werror -O2 module.c
./rofix module.s
cc -c module.s
Now, you can add appropriate placements in your linker script for [your original section] .txt152 and the new .rodata.txt152.
And, the initialized data section .dat152
Note that the actual naming conventions are arbitrary. If you want to change them, just modify rofix [and the linker script] appropriately
Here's the readelf -a output for module.o:
Note that there's also a .rela.txt152 section!?!?
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 808 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 15
Section header string table index: 14
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000000 0000000000000000 AX 0 0 1
[ 2] .data PROGBITS 0000000000000000 00000040
0000000000000000 0000000000000000 WA 0 0 1
[ 3] .bss NOBITS 0000000000000000 00000040
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .rodata.txt152 PROGBITS 0000000000000000 00000040
0000000000000014 0000000000000001 AMS 0 0 1
[ 5] .txt152 PROGBITS 0000000000000000 00000060
000000000000001a 0000000000000000 AX 0 0 16
[ 6] .rela.txt152 RELA 0000000000000000 00000250
0000000000000048 0000000000000018 I 12 5 8
[ 7] .dat152 PROGBITS 0000000000000000 0000007c
0000000000000004 0000000000000000 WA 0 0 4
[ 8] .comment PROGBITS 0000000000000000 00000080
000000000000002d 0000000000000001 MS 0 0 1
[ 9] .note.GNU-stack PROGBITS 0000000000000000 000000ad
0000000000000000 0000000000000000 0 0 1
[10] .eh_frame PROGBITS 0000000000000000 000000b0
0000000000000030 0000000000000000 A 0 0 8
[11] .rela.eh_frame RELA 0000000000000000 00000298
0000000000000018 0000000000000018 I 12 10 8
[12] .symtab SYMTAB 0000000000000000 000000e0
0000000000000150 0000000000000018 13 11 8
[13] .strtab STRTAB 0000000000000000 00000230
000000000000001c 0000000000000000 0 0 1
[14] .shstrtab STRTAB 0000000000000000 000002b0
0000000000000078 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
There is no dynamic section in this file.
Relocation section '.rela.txt152' at offset 0x250 contains 3 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000a 00050000000a R_X86_64_32 0000000000000000 .rodata.txt152 + 0
000000000011 000c00000002 R_X86_64_PC32 0000000000000000 B - 4
000000000016 000d00000004 R_X86_64_PLT32 0000000000000000 printf - 4
Relocation section '.rela.eh_frame' at offset 0x298 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000000020 000600000002 R_X86_64_PC32 0000000000000000 .txt152 + 0
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
Symbol table '.symtab' contains 14 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS module.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
4: 0000000000000000 0 SECTION LOCAL DEFAULT 3
5: 0000000000000000 0 SECTION LOCAL DEFAULT 4
6: 0000000000000000 0 SECTION LOCAL DEFAULT 5
7: 0000000000000000 0 SECTION LOCAL DEFAULT 7
8: 0000000000000000 0 SECTION LOCAL DEFAULT 9
9: 0000000000000000 0 SECTION LOCAL DEFAULT 10
10: 0000000000000000 0 SECTION LOCAL DEFAULT 8
11: 0000000000000000 26 FUNC GLOBAL DEFAULT 5 TestFunc
12: 0000000000000000 4 OBJECT GLOBAL DEFAULT 7 B
13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf
No version information found in this file.

Related

How Does C Compiled to ASM Know Where to Branch to for External Functions?

How does C compiled to ARM ASM know where to branch to for external functions?
For example, here is a simple C program:
#include <stdio.h>
int main() {
printf("Hello World!");
return 0;
}
and its corresponding ARM ASM program:
.arch armv6
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 6
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "main.c"
.text
.section .rodata
.align 2
.LC0:
.ascii "Hello World!\000"
.text
.align 2
.global main
.arch armv6
.syntax unified
.arm
.fpu vfp
.type main, %function
main:
# args = 0, pretend = 0, frame = 0
# frame_needed = 1, uses_anonymous_args = 0
push {fp, lr}
add fp, sp, #4
ldr r0, .L3
bl printf
mov r3, #0
mov r0, r3
pop {fp, pc}
.L4:
.align 2
.L3:
.word .LC0
.size main, .-main
.ident "GCC: (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110"
.section .note.GNU-stack,"",%progbits
I dont see a "printf" tag anywhere so i am assuming that it links outside of the program. but how does it know where to search? it wouldnt look everywhere, because there might be duplicate tags, but there are also libraries that are placed (in the computers perspective) at random, though i also dont see anywhere where it defines a library location.
so where does it link, for more than just the standard C library?
and how can i compile it to not rely on those external dependencies?
or know where the libraries are so i know which files i can delete?
i am currently operating linux on a raspberry pi 400

My computer has an x86_64 processor, but the principle is the same. I'm using gcc 9.3.0.
I copied your code into a file called main.c and compiled it to assembly with gcc -S main.c. It produced the file main.s with the following contents:
.file "main.c"
.text
.section .rodata
.LC0:
.string "Hello World!"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rdi
movl $0, %eax
call printf#PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
.section .note.GNU-stack,"",#progbits
.section .note.gnu.property,"a"
.align 8
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 8
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 8
4:
There are a lot of assembler directives here that can make it confusing to read, so I assembled it into an object file (gcc -c main.s) and then ran objdump -d main.o to disassemble it. Here is the output of the disassembly:
main.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # f <main+0xf>
f: b8 00 00 00 00 mov $0x0,%eax
14: e8 00 00 00 00 callq 19 <main+0x19>
19: b8 00 00 00 00 mov $0x0,%eax
1e: 5d pop %rbp
1f: c3 retq
The first three instructions here are boilerplate, so we'll ignore them. The first interesting instruction is
lea 0x0(%rip),%rdi
This is meant to load the address of the "Hello World!" string into register %rdi. Confusingly, it appears to simply be copying %rip into %rdi.
The next instruction puts a 0 into register %eax. I actually don't know why this is, but it's not really relevant to this discussion.
Then comes the actual call to printf:
callq 19 <main+0x19>
Once again, this uses an address that doesn't seem correct. You may notice that address 0x19 actually points to the next instruction.
The next 3 instructions basically perform the final return 0.
To really answer your question we need to look at more than just assembly code. At this point I would recommend taking some time to research the format of ELF files. I would consider that topic to be beyond the scope of this answer, but it will help you understand what I'm about to show you.
I first want to point out that in both your assembly and mine, the "Hello World!" string is preceded by this directive:
.section .rodata
whereas the main function is preceded by
.text
which is shorthand for
.section .text
These directives instruct the assembler on how to arrange the code and data in the object file. You can see this by printing the section headers of the object file:
$ readelf -S main.o
There are 14 section headers, starting at offset 0x318:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000020 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000258
0000000000000030 0000000000000018 I 11 1 8
[ 3] .data PROGBITS 0000000000000000 00000060
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00000060
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .rodata PROGBITS 0000000000000000 00000060
000000000000000d 0000000000000000 A 0 0 1
[ 6] .comment PROGBITS 0000000000000000 0000006d
000000000000002b 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 00000098
0000000000000000 0000000000000000 0 0 1
[ 8] .note.gnu.propert NOTE 0000000000000000 00000098
0000000000000020 0000000000000000 A 0 0 8
[ 9] .eh_frame PROGBITS 0000000000000000 000000b8
0000000000000038 0000000000000000 A 0 0 8
[10] .rela.eh_frame RELA 0000000000000000 00000288
0000000000000018 0000000000000018 I 11 9 8
[11] .symtab SYMTAB 0000000000000000 000000f0
0000000000000138 0000000000000018 12 10 8
[12] .strtab STRTAB 0000000000000000 00000228
000000000000002a 0000000000000000 0 0 1
[13] .shstrtab STRTAB 0000000000000000 000002a0
0000000000000074 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
If you can figure out how to read this output, you will see that the .text section is 0x20 bytes in size (which matches the above disassembly output), and the .rodata section is 0xd (13) bytes in size (i.e. strlen("Hello World!") plus a null byte). The answer your question, however, is in the relocation data:
$ readelf -r main.o
Relocation section '.rela.text' at offset 0x258 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000b 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 4
000000000015 000c00000004 R_X86_64_PLT32 0000000000000000 printf - 4
Relocation section '.rela.eh_frame' at offset 0x288 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000000020 000200000002 R_X86_64_PC32 0000000000000000 .text + 0
This output is also very confusing to read if you don't know what it means. The first thing to understand is that the relocation sections tell the linker about places in the code that depend on symbols, either in other sections of the same file, or, more frequently, symbols that are defined in other files. The .rela.text section, for example, contains relocation information about the .text section. When this object file is linked into the final executable, the linker will overwrite part of the .text section with the missing addresses.
So, looking at the first entry under .rela.text, we see an offset of 0xb. Looking at the disassembly, we can see that offset 0xb references the fourth byte of the lea instruction's 7-byte encoding. The type, R_X86_64_PC32, tells us that that instruction is expecting a 32-bit PC-relative address, so we can expect the linker to overwrite the next 4 bytes (currently all 0). The rightmost column tells us, in human readable format, that this address needs to be populated with the address of the .rodata section minus 4 (with PC-relative addressing you have to remember that the PC will be pointing at the next instruction). It leaves out the fact, implicit for relocation type R_X86_64_PC32, that it will then subtract from that the final address of byte 0xb in the .text section, which will make that a valid PC-relative pointer to the "Hello World!" string data.
Similarly, the second entry tells the linker to copy the address of printf (minus 4) to offset 0x15 in the .text section, which would be the last 4 bytes of the callq instruction encoding. In this case, the type is R_X86_64_PLT32, which tells us that it's pointing to an entry in the procedure linkage table (PLT). A PLT is used for dynamic linking so that shared object libraries (in this case libc.so) can be loaded into physical memory once and shared by many running executables.
As a note, that might answer some of your specific questions, your compiler automatically links all the runtime libraries needed to execute a program. This includes any standard library functions, which would be part of libc.so. The only way to run without "external dependencies" would be to run on a bare-metal system (i.e. one without an operating system). Any operating system you use will have to do some amount of work to get your program to the start of main().

Will an executable access shared-libraries' global variable via GOT?

I was learning dynamic linking recently and gave it a try:
dynamic.c
int global_variable = 10;
int XOR(int a) {
return global_variable;
}
test.c
#include <stdio.h>
extern int global_variable;
extern int XOR(int);
int main() {
global_variable = 3;
printf("%d\n", XOR(0x10));
}
The compiling commands are:
clang -shared -fPIC -o dynamic.so dynamic.c
clang -o test test.c dynamic.so
I was expecting that in executable test the main function will access global_variable via GOT. However, on the contrary, the global_variable is placed in test's data section and XOR in dynamic.so access the global_variable indirectly.
Could anyone tell me why the compiler didn't ask the test to access global_variable via GOT, but asked the shared object file to do so?

Part of the point of a shared library is that one copy gets loaded into memory, and multiple processes can access that one copy. But every program has its own copy of each of the library's variables. If they were accessed relative to the library's GOT then those would instead be shared among the processes using the library, just like the functions are.
There are other possibilities, but it is clean and consistent for each executable to provide for itself all the variables it needs. That requires the library functions to access all of its variables with static storage duration (not just external ones) indirectly, relative to the program. This is ordinary dynamic linking, just going the opposite direction from what you usually think of.

Turns out my clang produced PIC by default so it messed with results.
I will leave updated answer here, and the original can be read below it.
After digging a bit more into the topic i have noticed that compilation of test.c does not generate a .got section by itself. You can check it by compiling the executable into an object file and omitting the linking step for now (-c option):
clang -c -o test.o test.c
If you inspect the sections of resulting object file with readelf -S you will notice that there is no .got in there:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000035 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000210
0000000000000060 0000000000000018 I 11 1 8
[ 3] .data PROGBITS 0000000000000000 00000075
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00000075
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .rodata PROGBITS 0000000000000000 00000075
0000000000000004 0000000000000000 A 0 0 1
[ 6] .comment PROGBITS 0000000000000000 00000079
0000000000000013 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 0000008c
0000000000000000 0000000000000000 0 0 1
[ 8] .note.gnu.pr[...] NOTE 0000000000000000 00000090
0000000000000030 0000000000000000 A 0 0 8
[ 9] .eh_frame PROGBITS 0000000000000000 000000c0
0000000000000038 0000000000000000 A 0 0 8
[10] .rela.eh_frame RELA 0000000000000000 00000270
0000000000000018 0000000000000018 I 11 9 8
[11] .symtab SYMTAB 0000000000000000 000000f8
00000000000000d8 0000000000000018 12 4 8
[12] .strtab STRTAB 0000000000000000 000001d0
000000000000003e 0000000000000000 0 0 1
[13] .shstrtab STRTAB 0000000000000000 00000288
0000000000000074 0000000000000000 0 0 1
This means that the entirety of .got section present in the test executable actually comes from dynamic.so, as it is PIC and uses GOT.
Would it be possible to compile dynamic.so as non-PIC as well? Turns out it apparently used to be 10 years ago (the article compiles examples to 32-bits, they dont have to work on 64 bits!). Linked article describes how a non-PIC shared library was relocated at load time - basically, every time an address that needed to be relocated after loading was present in machine code, it was instead set to zeroes and a relocation of a certain type was set in the library. During loading of the library the loader filled the zeros with actual runtime address of data/code that was needed. It is important to note that it cannot be applied in your though as 64-bit shared libraries cannot be made out of non-PIC (Source).
If you compile dynamic.so as a shared 32-bit library instead and do not use the -fPIC option (you usually need special repositories enabled to compile 32-bit code and have 32-bit libc installed):
gcc -m32 dynamic.c -shared -o dynamic.so
You will notice that:
// readelf -s dynamic.so
(... lots of output)
27: 00004010 4 OBJECT GLOBAL DEFAULT 19 global_variable
// readelf -S dynamic.so
(... lots of output)
[17] .got PROGBITS 00003ff0 002ff0 000010 04 WA 0 0 4
[18] .got.plt PROGBITS 00004000 003000 00000c 04 WA 0 0 4
[19] .data PROGBITS 0000400c 00300c 000008 00 WA 0 0 4
[20] .bss NOBITS 00004014 003014 000004 00 WA 0 0 1
global_variable is at offset 0x4010 which is inside .data section. Also, while .got is present (at offset 0x3ff0), it only contains relocations coming from other sources than your code:
// readelf -r
Offset Info Type Sym.Value Sym. Name
00003f28 00000008 R_386_RELATIVE
00003f2c 00000008 R_386_RELATIVE
0000400c 00000008 R_386_RELATIVE
00003ff0 00000106 R_386_GLOB_DAT 00000000 _ITM_deregisterTM[...]
00003ff4 00000206 R_386_GLOB_DAT 00000000 __cxa_finalize#GLIBC_2.1.3
00003ff8 00000306 R_386_GLOB_DAT 00000000 __gmon_start__
00003ffc 00000406 R_386_GLOB_DAT 00000000 _ITM_registerTMCl[...]
This article introduces GOT as part of introduction on PIC, and i have found that to be the case in plenty of places, which would suggest that indeed GOT is only used by PIC code although i am not 100% sure of it and i recommend researching the topic more.
What does this mean for you? A section in the first article i linked called "Extra credit #2" contains an explanation for a similar scenario. Although it is 10 years old, uses 32-bit code and the shared library is non-PIC it shares some similarities with your situation and might explain the problem you presented in your question.
Also keep in mind that (although similar) -fPIE and -fPIC are two separate options with slightly different effects and that if your executable during inspection is not loaded at 0x400000 then it probably is compiled as PIE without your knowledge which might also have impact on results. In the end it all boils down to what data is to be shared between processes, what data/code can be loaded at arbitrary address, what has to be loaded at fixed address etc. Hope this helps.
Also two other answers on Stack Overflow which seem relevant to me: here and here. Both the answers and comments.
Original answer:
I tried reproducing your problem with exactly the same code and compilation commands as the ones you provided, but it seems like both main and XOR use the GOT to access the global_variable. I will answer by providing example output of commands that i used to inspect the data flow. If your outputs differ from mine, it means there is some other difference between our environments (i mean a big difference, if only addresses/values are different then its ok). Best way to find that difference is for you to provide commands you originally used as well as their output.
First step is to check what address is accessed whenever a write or read to global_variable happens. For that we can use objdump -D -j .text test command to disassemble the code and look at the main function:
0000000000001150 <main>:
1150: 55 push %rbp
1151: 48 89 e5 mov %rsp,%rbp
1154: 48 8b 05 8d 2e 00 00 mov 0x2e8d(%rip),%rax # 3fe8 <global_variable>
115b: c7 00 03 00 00 00 movl $0x3,(%rax)
1161: bf 10 00 00 00 mov $0x10,%edi
1166: e8 d5 fe ff ff call 1040 <XOR#plt>
116b: 89 c6 mov %eax,%esi
116d: 48 8d 3d 90 0e 00 00 lea 0xe90(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
1174: b0 00 mov $0x0,%al
1176: e8 b5 fe ff ff call 1030 <printf#plt>
117b: 31 c0 xor %eax,%eax
117d: 5d pop %rbp
117e: c3 ret
117f: 90 nop
Numbers in the first column are not absolute addresses - instead they are offsets relative to the base address at which the executable will be loaded. For the sake of explanation i will refer to them as "offsets".
The assembly at offset 0x115b and 0x1161 comes directly from the line global_variable = 3; in your code. To confirm that, you could compile the program with -g for debug symbols and invoke objdump with -S. This will display source code above corresponding assembly.
We will focus on what these two instructions are doing. First instruction is a mov of 8 bytes from a location in memory to the rax register. The location in memory is given as relative to the current rip value, offset by a constant 0x2e8d. Objdump already calculated the value for us, and it is equal to 0x3fe8. So this will take 8 bytes present in memory at the 0x3fe8 offset and store them in the rax register.
Next instruction is again a mov, the suffix l tells us that data size is 4 bytes this time. It stores a 4 byte integer with value equal to 0x3 in the location pointed to by the current value of rax (not in the rax itself! brackets around a register such as those in (%rax) signify that the location in the instruction is not the register itself, but rather where its contents are pointing to!).
To summarize, we read a pointer to a 4 byte variable from a certain location at offset 0x3fe8 and later store an immediate value of 0x3 at the location specified by said pointer. Now the question is: where does that offset of 0x3fe8 come from?
It actually comes from GOT. To show the contents of the .got section we can use the objdump -s -j .got test command. -s means we want to focus on actual raw contents of the section, without any disassembling. The output in my case is:
test: file format elf64-x86-64
Contents of section .got:
3fd0 00000000 00000000 00000000 00000000 ................
3fe0 00000000 00000000 00000000 00000000 ................
3ff0 00000000 00000000 00000000 00000000 ................
The whole section is obviously set to zero, as GOT is populated with data after loading the program into memory, but what is important is the address range. We can see that .got starts at 0x3fd0 offset and ends at 0x3ff0. This means it also includes the 0x3fe8 offset - which means the location of global_variable is indeed stored in GOT.
Another way of finding this information is to use readelf -S test to show sections of the executable file and scroll down to the .got section:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
(...lots of sections...)
[22] .got PROGBITS 0000000000003fd0 00002fd0
0000000000000030 0000000000000008 WA 0 0 8
Looking at the Address and Size columns, we can see that the section is loaded at offset 0x3fd0 in memory and its size is 0x30 - which corresponds to what objdump displayed. Note that in readelf ouput "Offset" is actually the offset into the file form which the program is loaded - not the offset in memory that we are interested in.
by issuing the same commands on the dynamic.so library we get similar results:
00000000000010f0 <XOR>:
10f0: 55 push %rbp
10f1: 48 89 e5 mov %rsp,%rbp
10f4: 89 7d fc mov %edi,-0x4(%rbp)
10f7: 48 8b 05 ea 2e 00 00 mov 0x2eea(%rip),%rax # 3fe8 <global_variable##Base-0x38>
10fe: 8b 00 mov (%rax),%eax
1100: 5d pop %rbp
1101: c3 ret
So we see that both main and XOR use GOT to find the location of global_variable.
As for the location of global_variable we need to run the program to populate GOT. For that we can use GDB. We can run our program in GDB by invoking it this way:
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:." gdb ./test
LD_LIBRARY_PATH environment variable tells linker where to look for shared objects, so we extend it to include the current directory "." so that it may find dynamic.so.
After the GDB loads our code, we may invoke break main to set up a breakpoint at main and run to run the program. The program execution should pause at the beginning of the main function, giving us a view into our executable after it was fully loaded into memory, with GOT populated.
Running disassemble main in this state will show us the actual absolute offsets into memory:
Dump of assembler code for function main:
0x0000555555555150 <+0>: push %rbp
0x0000555555555151 <+1>: mov %rsp,%rbp
=> 0x0000555555555154 <+4>: mov 0x2e8d(%rip),%rax # 0x555555557fe8
0x000055555555515b <+11>: movl $0x3,(%rax)
0x0000555555555161 <+17>: mov $0x10,%edi
0x0000555555555166 <+22>: call 0x555555555040 <XOR#plt>
0x000055555555516b <+27>: mov %eax,%esi
0x000055555555516d <+29>: lea 0xe90(%rip),%rdi # 0x555555556004
0x0000555555555174 <+36>: mov $0x0,%al
0x0000555555555176 <+38>: call 0x555555555030 <printf#plt>
0x000055555555517b <+43>: xor %eax,%eax
0x000055555555517d <+45>: pop %rbp
0x000055555555517e <+46>: ret
End of assembler dump.
(gdb)
Our 0x3fe8 offset has turned into an absolute address of equal to 0x555555557fe8. We may again check that this location comes from the .got section by issuing maintenance info sections inside GDB, which will list a long list of sections and their memory mappings. For me .got is placed in this address range:
[21] 0x555555557fd0->0x555555558000 at 0x00002fd0: .got ALLOC LOAD DATA HAS_CONTENTS
Which contains 0x555555557fe8.
To finally inspect the address of global_variable itself we may examine the contents of that memory by issuing x/xag 0x555555557fe8. Arguments xag of the x command deal with the size, format and type of data being inspected - for explanation invoke help x in GDB. On my machine the command returns:
0x555555557fe8: 0x7ffff7fc4020 <global_variable>
On your machine it may only display the address and the data, without the "<global_variable>" helper, which probably comes from an extension i have installed called pwndbg. It is ok, because the value at that address is all we need. We now know that the global_variable is located in memory under the address 0x7ffff7fc4020. Now we may issue info proc mappings in GDB to find out what address range does this address belong to. My output is pretty long, but among all the ranges listed there is one of interest to us:
0x7ffff7fc4000 0x7ffff7fc5000 0x1000 0x3000 /home/user/test_got/dynamic.so
The address is inside of that memory area, and GDB tells us that it comes from the dynamic.so library.
In case any of the outputs of said commands are different for you (change in a value is ok - i mean a fundamental difference like addresses not belonging to certain address ranges etc.) please provide more information about what exactly did you do to come to the conclusion that global_variable is stored in the .data section - what commands did you invoke and what outputs they produced.

Huge Binary size while ld Linking

I have a linker script that links code for imx6q(cortex-A9):
OUTPUT_FORMAT("elf32-littlearm", "elf32-bigarm", "elf32-littlearm")
OUTPUT_ARCH(arm)
ENTRY(Reset_Handler)
/* SEARCH_DIR(.) */
GROUP(libgcc.a libc.a)
/* INPUT (crtbegin.o crti.o crtend.o crtn.o) */
MEMORY {
/* IROM (rwx) : ORIGIN = 0x00000000, LENGTH = 96K */
IRAM (rwx) : ORIGIN = 0x00900000, LENGTH = 256K
IRAM_MMU (rwx): ORIGIN = 0x00938000, LENGTH = 24K
IRAM_FREE(rwx): ORIGIN = 0x00907000, LENGTH = 196K
DDR (rwx) : ORIGIN = 0x10000000, LENGTH = 1024M
}
/* PROVIDE(__cs3_heap_start = _end); */
SECTIONS {
.vector (ORIGIN(IRAM) + LENGTH(IRAM) - 144 ):ALIGN (32) {
__ram_vectors_start = . ;
. += 72 ;
__ram_vectors_end = . ;
. = ALIGN (4);
} >IRAM
. = ORIGIN(DDR);
.text(.) :ALIGN(8) {
*(.entry)
*(.text)
/* __init_array_start = .; */
/* __init_array_end = .; */
. = ALIGN (4);
__text_end__ = .;
} >DDR
.data :ALIGN(8) {
*(.data .data.*)
__data_end__ = .;
}
.bss(__data_end__) : {
. = ALIGN (4);
__bss_start__ = .;
*(.shbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
__bss_end__ = .;
}
/* . += 10K; */
/* . += 5K; */
top_of_stacks = .;
. = ALIGN (4);
. += 8;
free_memory_start = .;
.mmu_page_table : {
__mmu_page_table_base__ = .;
. = ALIGN (16K);
. += 16K;
} >IRAM_MMU
_end = .;
__end = _end;
PROVIDE(end = .);
}
When i built, the binary size is just 6 KB. But i can not add any initialized variable. When i add an initialized variable, the binary size jumps to ~246 MB. Why is that? I tried to link the data segment at location following text section by specifying exact location and also providing >DDR for the data segment. Even though this seem to reduce the binary size back to 6 KB, the binary fails to boot. How can i keep my code in the DDR and the data, bss, stack and heap in the internal ram itself, with light binary size?
I read in another thread that " using MEMORY tag in linker script should solve the problem of memory waste", How can this be done?
linker script wastes my memory
Plese do ask if anything else needed. I don't have any experience with linker script. Please help
The readelf --sections output of the binary with no initialized data given is as follows,
There are 19 section headers, starting at offset 0xd804:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .vector NOBITS 0093ff80 007f80 000048 00 WA 0 0 32
[ 2] .text PROGBITS 10000000 008000 0016fc 00 AX 0 0 8
[ 3] .text.vectors PROGBITS 100016fc 0096fc 000048 00 AX 0 0 4
[ 4] .text.proc PROGBITS 10001744 009744 000034 00 AX 0 0 4
[ 5] .bss NOBITS 0093ffc8 007fc8 000294 00 WA 0 0 4
[ 6] .mmu_page_table NOBITS 00938000 008000 004000 00 WA 0 0 1
[ 7] .comment PROGBITS 00000000 009778 00001f 01 MS 0 0 1
[ 8] .ARM.attributes ARM_ATTRIBUTES 00000000 009797 00003d 00 0 0 1
[ 9] .debug_aranges PROGBITS 00000000 0097d8 000108 00 0 0 8
[10] .debug_info PROGBITS 00000000 0098e0 0018a7 00 0 0 1
[11] .debug_abbrev PROGBITS 00000000 00b187 00056f 00 0 0 1
[12] .debug_line PROGBITS 00000000 00b6f6 00080e 00 0 0 1
[13] .debug_frame PROGBITS 00000000 00bf04 000430 00 0 0 4
[14] .debug_str PROGBITS 00000000 00c334 0013dd 01 MS 0 0 1
[15] .debug_ranges PROGBITS 00000000 00d718 000020 00 0 0 8
[16] .shstrtab STRTAB 00000000 00d738 0000cb 00 0 0 1
[17] .symtab SYMTAB 00000000 00dafc 000740 10 18 60 4
[18] .strtab STRTAB 00000000 00e23c 000511 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
and The readelf --sections output of the binary with initialized data given is ,
There are 20 section headers, starting at offset 0xd82c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .vector NOBITS 0093ff80 007f80 000048 00 WA 0 0 32
[ 2] .text PROGBITS 10000000 008000 0016fc 00 AX 0 0 8
[ 3] .text.vectors PROGBITS 100016fc 0096fc 000048 00 AX 0 0 4
[ 4] .text.proc PROGBITS 10001744 009744 000034 00 AX 0 0 4
[ 5] .data PROGBITS 0093ffc8 007fc8 000004 00 WA 0 0 8
[ 6] .bss NOBITS 0093ffcc 007fcc 000294 00 WA 0 0 4
[ 7] .mmu_page_table NOBITS 00938000 008000 004000 00 WA 0 0 1
[ 8] .comment PROGBITS 00000000 009778 00001f 01 MS 0 0 1
[ 9] .ARM.attributes ARM_ATTRIBUTES 00000000 009797 00003d 00 0 0 1
[10] .debug_aranges PROGBITS 00000000 0097d8 000108 00 0 0 8
[11] .debug_info PROGBITS 00000000 0098e0 0018b6 00 0 0 1
[12] .debug_abbrev PROGBITS 00000000 00b196 000580 00 0 0 1
[13] .debug_line PROGBITS 00000000 00b716 00080e 00 0 0 1
[14] .debug_frame PROGBITS 00000000 00bf24 000430 00 0 0 4
[15] .debug_str PROGBITS 00000000 00c354 0013dd 01 MS 0 0 1
[16] .debug_ranges PROGBITS 00000000 00d738 000020 00 0 0 8
[17] .shstrtab STRTAB 00000000 00d758 0000d1 00 0 0 1
[18] .symtab SYMTAB 00000000 00db4c 000770 10 19 62 4
[19] .strtab STRTAB 00000000 00e2bc 000513 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
Hope this is enough...!!!
Note: I am using arm-none-eabi-gcc for linking.

If you are not experienced with linker scripts then either use one that just works or make or borrow a simpler one. Here is a simple one, and this should demonstrate what is most likely going on.
MEMORY
{
bob : ORIGIN = 0x00001000, LENGTH = 0x100
ted : ORIGIN = 0x00002000, LENGTH = 0x100
alice : ORIGIN = 0x00003000, LENGTH = 0x100
}
SECTIONS
{
.text : { *(.text*) } > bob
.data : { *(.text*) } > ted
.bss : { *(.text*) } > alice
}
First program
.text
.globl _start
_start:
mov r0,r1
mov r1,r2
b .
not meant to be a real program just creating some bytes in a segment is all.
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00001000 001000 00000c 00 AX 0 0 4
12 bytes in .text which is at address 0x1000 in memory which is exactly what we told it to do.
If I use -objcopy a.elf -O binary a.bin I get a 12-byte file as expected, the "binary" file format is a memory image, starting with the first address that has some content in the address space and ending with the last byte of content in the address space. so instead of 0x1000+12 bytes, the binary is 12 bytes ad the user has to know it needs to be loaded at 0x1000.
So change this up a little:
.text
.globl _start
_start:
mov r0,r1
mov r1,r2
b .
.data
some_data: .word 0x12345678
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00001000 001000 00000c 00 AX 0 0 4
[ 2] .data PROGBITS 00002000 002000 000004 00 WA 0 0 1
Now we have 12 bytes at 0x1000 and 4 bytes at 0x2000, so -O binary has to give us one memory image from the first defined byte to the last so that would be 0x1000+4.
Sure enough 4100 bytes that is exactly what it did.
.text
.globl _start
_start:
mov r0,r1
mov r1,r2
b .
.data
some_data: .word 0x12345678
.bss
some_more_data: .word 0
which gives
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00001000 001000 00000c 00 AX 0 0 4
[ 2] .data PROGBITS 00002000 002000 000004 00 WA 0 0 1
[ 3] .bss NOBITS 00003000 003000 000004 00 WA 0 0 1
Now I only got a 4100-byte file, and that is actually not surprising, it is assumed that the bootstrap is going to zero .bss So that didn't grow the "binary" file.
There is an intimate relationship. A system-level design. Between the linker script and the bootstrap. For what it appears you are trying to do (just ram no rom) you can probably get away with a much simpler linker script, on par with the one I have but if you care about .bss being zeroed then there are some tricks you can use:
MEMORY
{
ram : ORIGIN = 0x00001000, LENGTH = 0x3000
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.text*) } > ram
.data : { *(.text*) } > ram
}
make sure there is at least one .data item and your "binary" will have the complete image with bss already zeroed, the bootstrap simply needs to set the stack pointer(s) and jump to main (if this is for C).
Anyway, hopefully, you can see that the jump from 12 bytes to 4100 bytes was because of the addition of a .data element and the "binary" format having to pad the "binary" file so that the file was a memory image from the lowest address with data to the highest address with data (from 0x1000 to 0x2000+sizeof(.data)-1 in this case). Change the linker script, the 0x1000, and the 0x2000 and this all changes. Swap them put .text at 0x2000 and .data at 0x1000, now the "binary" file has to be 0x2000-0x1000+sizeof(.text) rather than 0x2000-0x1000+sizeof(.data). or 0x100C bytes instead of 0x1004. go back to the first linker script and make .data at 0x20000000 now the "binary" will be 0x20000000-0x1000+sizeof(.data) because that is how much information including padding is required to make a memory image in a single file.
It is most likely that is what is going on. As demonstrated here the file size went from 12 bytes to 4100 by simply adding one word of data.
EDIT.
Well if you noload the data then your initialized variable won't be initialized, it is that simple
unsigned int x = 5;
will not be a 5 if you discard (NOLOAD) .data.
As has been stated and stated again, you can have the data put in the .text sector and then use more linker script foo, to have the bootstrap find that data.
MEMORY
{
bob : ORIGIN = 0x00001000, LENGTH = 0x100
ted : ORIGIN = 0x00002000, LENGTH = 0x100
alice : ORIGIN = 0x00003000, LENGTH = 0x100
}
SECTIONS
{
.text : { *(.text*) } > bob
.data : { *(.text*) } > ted AT > bob
.bss : { *(.text*) } > alice AT > bob
}
This creates a 16 byte "binary" file. the 12 bytes of instruction and the 4 bytes of .data. But you don't know where the data is unless you do some hardcoding which is a bad idea. This is where things like bss_start and bss_end are found in your linker script.
something like this
MEMORY
{
bob : ORIGIN = 0x00001000, LENGTH = 0x100
ted : ORIGIN = 0x00002000, LENGTH = 0x100
alice : ORIGIN = 0x00003000, LENGTH = 0x100
}
SECTIONS
{
.text : { *(.text*) } > bob
.data : {
__data_start__ = .;
*(.data*)
} > ted AT > bob
__data_end__ = .;
__data_size__ = __data_end__ - __data_start__;
.bss : { *(.text*) } > alice AT > bob
}
.text
.globl _start
_start:
mov r0,r1
mov r1,r2
b .
hello:
.word __data_start__
.word __data_end__
.word __data_size__
.data
some_data: .word 0x12345678
which gives us.
Disassembly of section .text:
00001000 <_start>:
1000: e1a00001 mov r0, r1
1004: e1a01002 mov r1, r2
1008: eafffffe b 1008 <_start+0x8>
0000100c <hello>:
100c: 00002000 andeq r2, r0, r0
1010: 00002004 andeq r2, r0, r4
1014: 00000004 andeq r0, r0, r4
Disassembly of section .data:
00002000 <__data_start__>:
2000: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
and the toolchain/linker creates and fills those defined names in the linker script, and then fills them into your code when it resolves those externals. Then your bootstrap needs to use those variables (and more that I didn't include here where to find the .data in the .text you know from the above that there are 4 bytes and they need to land at 0x2000 but where in the 0x1000 .text area are those 4 bytes found? More linker script foo. Also, note the gnu linker scripts are very sensitive as to where you define those variables. before or after the squiggly brackets can have different results.
This is why I mentioned it appeared you were using ram. If this is a rom based target and you want .data and zeroed .bss then you pretty much have to put the .data and the size and location of .bss in the flash/rom area and the bootstrap has to copy and zero. Alternatively, you can choose not to use .data nor .bss
unsigned int x=5;
unsigned int y;
instead
unsigned int x;
unsigned int y;
...
x=5;
y=0;
Yes, it is not as efficient binary size-wise, but linker scripts are very much toolchain dependent, and with gnu for example over time the linker script language/rules change, what worked on a prior major version of gnu ld doesn't necessarily work on the current or next, I have had to re-architect my minimal linker script over the years as a result.
As demonstrated here, you can use your command line tools to experiment with settings and locations and see what the toolchain has produced.
Bottom line it sounds like you added some information in .data, but then state you want to NOLOAD it, basically meaning that .data isnt there/used your variables are not initialized correctly, so why bother changing the code to cause all this to happen only to have it not work anyway? Either has .data and use it right, have the right bootstrap and linker script pair, or if it is ram only just pack it all up into the same ram space, or don't use the "binary" format you are using, use elf or ihex or srec or other.
Another trick depending on your system is to build the binary for ram, all packed up, then have another program that wraps around that binary runs from rom and copies to ram and jumps. Take the 16 byte program above, write another that includes those 16 bytes from that build, and copies them to 0x1000 and then branches to 0x1000. Depending on the system and the flash/rom technology and interface you may wish to do this anyway, the system from my day job uses a spi flash to boot, which are known to have read-disturb problems and are...spi... So the fastest, cleanest, most reliable solution, is to do the copy jump before doing anything else. Making the linker script much easier as a free side effect.

embed assembly produced code into C program using FASM

I am trying to link assembly-compiled with c-compiled code, and I get undefined reference error during linking phase. This is how i do it:
[niko#dev1 test]$ cat ssefuncs.asm
format ELF64
EQUAL_ANY = 0000b
RANGES = 0100b
EQUAL_EACH = 1000b
EQUAL_ORDERED = 1100b
NEGATIVE_POLARITY = 010000b
BYTE_MASK = 1000000b
asm_sse:
movntdqa xmm0,[eax]
pcmpestri xmm0,[ecx],0x0
ret
[niko#dev1 test]$ fasm ssefuncs.asm ssefuncs.o
flat assembler version 1.71.50 (16384 kilobytes memory)
1 passes, 405 bytes.
[niko#dev1 test]$ ls -l ssefuncs.o
-rw-r--r-- 1 niko niko 405 Jan 31 14:52 ssefuncs.o
[niko#dev1 test]$ objdump -M intel -d ssefuncs.o
ssefuncs.o: file format elf64-x86-64
Disassembly of section .flat:
0000000000000000 <.flat>:
0: 67 66 0f 38 2a 00 movntdqa xmm0,XMMWORD PTR [eax]
6: 67 66 0f 3a 61 01 00 pcmpestri xmm0,XMMWORD PTR [ecx],0x0
d: c3 ret
[niko#dev1 test]$ cat stest.c
void asm_sse();
int main() {
asm_sse();
}
[niko#dev1 test]$ gcc -c stest.c
[niko#dev1 test]$ gcc -o stest ssefuncs.o stest.o
stest.o: In function `main':
stest.c:(.text+0xa): undefined reference to `asm_sse'
collect2: error: ld returned 1 exit status
[niko#dev1 test]$
Looking at the ELF file, it is very thin and I don't see any symbols. :
[niko#dev1 test]$ readelf -a ssefuncs.o
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 149 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 4
Section header string table index: 3
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .flat PROGBITS 0000000000000000 00000040
000000000000000e 0000000000000000 WAX 0 0 8
[ 2] .symtab SYMTAB 0000000000000000 0000004e
0000000000000030 0000000000000018 3 2 8
[ 3] .strtab STRTAB 0000000000000000 0000007e
0000000000000017 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
There are no relocations in this file.
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
Symbol table '.symtab' contains 2 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .flat
No version information found in this file.
[niko#dev1 test]$
What is the correct way to embed FASM produced assembly code into a C program?

Subroutine names in assembly code are usually simply labels for certain positions within the instruction stream. They are not automatically made visible for linking with external object code. To make it possible, a symbol should be declared public. Also, by convention the code in ELF files resides in the .text section. Your assembly file should look like this:
format ELF64
EQUAL_ANY = 0000b
RANGES = 0100b
EQUAL_EACH = 1000b
EQUAL_ORDERED = 1100b
NEGATIVE_POLARITY = 010000b
BYTE_MASK = 1000000b
section '.text' code readable executable
asm_sse:
movntdqa xmm0,[eax]
pcmpestri xmm0,[ecx],0x0
ret
public asm_sse

It much depends on the compiler used. E.g. GCC (and by cloning, clang) has a very extensive facility for writing assembly language snippets in-line, handling the routine details of interfacing with the surrounding code (saving clobbered registers as needed, placing inputs where they can be used and picking up results, and matching inputs/outputs with what is given). This is usually the easiest way to go.
If the above isn't an option, you should start by writing a short C program, and compile it to assembly. Something like cc -g -S somefile.c should give you a somefile.s with assembly language. The -g (or other debugging enablement) should include comments in the code, allowing easier backreference to C. This will allow you to reverse engineer the compiler's result, and serve as a starting point for a standalone assembly file by messing with the inards of the compiled functions.
As the comment by #LaurentH says, often compilers mangle names of source symbols in generated assembly language to prevent clashing with outside symbols, by e.g. prepending _ or even some characters legal in the specific assembly but not in C, like . or $.

How to find the offset of the section header string table of an elf file?

I have to write a C program that prints an ELF file. I'm having trouble figuring out where the section header string table is.
Let's say I have a file that gave me the following output with:
readelf -h
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 17636 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 40 (bytes)
Number of section headers: 23
Section header string table index: 20
and with:
readelf -S:
There are 23 section headers, starting at offset 0x44e4:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000028 00 AX 0 0 4
[ 2] .rel.text REL 00000000 0049d0 000018 08 21 1 4
[ 3] .data PROGBITS 00000000 00005c 000000 00 WA 0 0 4
[ 4] .bss NOBITS 00000000 00005c 000000 00 WA 0 0 4
[ 5] .rodata PROGBITS 00000000 00005c 00000a 00 A 0 0 1
[ 6] .debug_info PROGBITS 00000000 000066 00008f 00 0 0 1
[ 7] .rel.debug_info REL 00000000 0049e8 0000b0 08 21 6 4
[ 8] .debug_abbrev PROGBITS 00000000 0000f5 000041 00 0 0 1
[ 9] .debug_loc PROGBITS 00000000 000136 000038 00 0 0 1
[10] .debug_aranges PROGBITS 00000000 00016e 000020 00 0 0 1
[11] .rel.debug_arange REL 00000000 004a98 000010 08 21 10 4
[12] .debug_line PROGBITS 00000000 00018e 0001b3 00 0 0 1
[13] .rel.debug_line REL 00000000 004aa8 000008 08 21 12 4
[14] .debug_macinfo PROGBITS 00000000 000341 003fb9 00 0 0 1
[15] .debug_str PROGBITS 00000000 0042fa 0000bf 01 MS 0 0 1
[16] .comment PROGBITS 00000000 0043b9 00002b 01 MS 0 0 1
[17] .note.GNU-stack PROGBITS 00000000 0043e4 000000 00 0 0 1
[18] .eh_frame PROGBITS 00000000 0043e4 000038 00 A 0 0 4
[19] .rel.eh_frame REL 00000000 004ab0 000008 08 21 18 4
[20] .shstrtab STRTAB 00000000 00441c 0000c5 00 0 0 1
[21] .symtab SYMTAB 00000000 00487c 000130 10 22 16 4
[22] .strtab STRTAB 00000000 0049ac 000021 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
How would I be able to calculate the location of the section header string table?

This is what I do:
#pragma pack(push,1)
typedef struct
{
uint8 e_ident[16];
uint16 e_type;
uint16 e_machine;
uint32 e_version;
uint32 e_entry;
uint32 e_phoff;
uint32 e_shoff;
uint32 e_flags;
uint16 e_ehsize;
uint16 e_phentsize;
uint16 e_phnum;
uint16 e_shentsize;
uint16 e_shnum;
uint16 e_shstrndx;
} Elf32Hdr;
typedef struct
{
uint32 sh_name;
uint32 sh_type;
uint32 sh_flags;
uint32 sh_addr;
uint32 sh_offset;
uint32 sh_size;
uint32 sh_link;
uint32 sh_info;
uint32 sh_addralign;
uint32 sh_entsize;
} Elf32SectHdr;
#pragma pack(pop)
{
FILE* ElfFile = NULL;
char* SectNames = NULL;
Elf32Hdr elfHdr;
Elf32SectHdr sectHdr;
uint idx;
// ...
// read ELF header
fread(&elfHdr, 1, sizeof elfHdr, ElfFile);
// read section name string table
// first, read its header
fseek(ElfFile, elfHdr.e_shoff + elfHdr.e_shstrndx * sizeof sectHdr, SEEK_SET);
fread(&sectHdr, 1, sizeof sectHdr, ElfFile);
// next, read the section, string data
SectNames = malloc(sectHdr.sh_size);
fseek(ElfFile, sectHdr.sh_offset, SEEK_SET);
fread(SectNames, 1, sectHdr.sh_size, ElfFile);
// read all section headers
for (idx = 0; idx < elfHdr.e_shnum; idx++)
{
const char* name = "";
fseek(ElfFile, elfHdr.e_shoff + idx * sizeof sectHdr, SEEK_SET);
fread(&sectHdr, 1, sizeof sectHdr, ElfFile);
// print section name
if (sectHdr.sh_name);
name = SectNames + sectHdr.sh_name;
printf("%2u %s\n", idx, name);
}
// ...
}

Here is how you would get to section name string table:
The e_shstrndx field of the ELF Executable Header (EHDR) specifies the index of the section header table entry describing the string table containing section names.
The e_shentsize field of the EHDR specifies the size in bytes of each section header table entry.
The e_shoff field of the EHDR specifies the file offset to the start of the section header table.
Thus the header entry for the string table will be at file offset:
header_table_entry_offset = (e_shentsize * e_shstrndx) + e_shoff
Depending on the ELF class of the object, this header table entry will be either an Elf32_Shdr or an Elf64_Shdr. Note that the file representation of the header table entry may differ from its in-memory representation on the host that your program is running on.
The sh_offset field in the header entry will specify the starting file offset of the string table, while the sh_size field will specify its size.
Further reading: The "libelf by Example" tutorial covers the ELF Executable Header and ELF Section Header Table in greater depth.

It seems that the one you want is STRTAB, which has an offset of 0x441c (line [20]). The headers start 17636 bytes into the file (according to
Start of section headers: 17636 (bytes into file)
So I am going to guess that your string table starts at 17636 + 17436 = 35072 bytes from the start of the file.
I could be completely wrong, but that's where I would start. Do you know how to use od or some such utility to do a dump of file bytes?

The section header table starts at starting point of your file + e_shoff. The elf.h file already has Elfxx_Shdr struct and a Elfxx_Ehdr struct, where xx above is either 32 or 64 for 32 bit or 64 bit respectively.
You can use e_shentsize from the Elfxx_Ehdr to get the section header string table index and then use sh_name and sh_offset from the Elfxx_Shdr. sh_name will give you an index of strtab in which if you add the sh_offset. That will give you the full name of the section. Same for the type, flag and others by using sh_type, sh_flag etc.
I would suggest you to go over executable and linkable format wikipedia article
and the efl.h file man page.