How do I add address of struct to binary linker output - c

This might seem a strange question, but I'm generating a binary file and need to put some data in the header.
I'm using gcc and a fairly standard Cortex M4 bare-bones linker script.
Instead of putting the ISR vector first in the binary, I'm putting my own header. The binary will be copied to a pre-determined memory location (0x20008000, to which it has been linked) and run from there.
My startup.s contains this:
.section .isr_vector,"a",%progbits
.type g_pfnVectors, %object
.size g_pfnVectors, .-g_pfnVectors
g_pfnVectors:
.word 0xDADAC0DE /* magic number */
.word .isr_vector /* link base address */
.word Reset_Handler /* code entry point */
.word _end /* stack start */
.word _estack /* stack end */
.word ProgramVector /* pointer to shared memory block */
This all works fine, with the exception of ProgramVector. I define it in my main.c as follows:
typedef struct {
uint8_t checksum;
int16_t* audio_input;
int16_t* audio_output;
...
} SharedMemory;
SharedMemory ProgramVector;
I would expect the binary to include the address of ProgramVector. If I compare the output to what I see in the linker map file, everything matches (Reset_Handler, _end, _estack) except ProgramVector.
My binary file app.bin :
00000000 de c0 da da 00 80 00 20 0d 9a 00 20 d8 ce 00 20 |....... ... ... |
00000010 00 c0 01 20 60 ce 00 20 60 ce 00 20 4f 57 4c 20 |... `.. `.. OWL |
00000020 50 72 6f 67 72 61 6d 00 10 b5 05 4c 23 78 33 b9 |Program....L#x3.|
From this we can determine that the header holds 0x2000ce60 for ProgramVector, while my map file says:
0x2001c000 _estack = 0x2001c000
...
.bss 0x2000ae28 0x38 Build/main.o
0x2000ae28 ProgramVector
...
0x2000ced8 PROVIDE (_end, .)
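As a cross-check of the hex dump, here is a minimal sketch that decodes the six header words by hand. The bytes are copied from the dump of app.bin; the names read_le32, app_header and header_word are made up for this illustration and are not part of the project.

```c
#include <stdint.h>

/* Read one little-endian 32-bit word, the byte order a Cortex-M4 uses. */
uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* First 24 bytes of app.bin, copied from the hex dump in the question. */
const uint8_t app_header[24] = {
    0xde, 0xc0, 0xda, 0xda,  0x00, 0x80, 0x00, 0x20,
    0x0d, 0x9a, 0x00, 0x20,  0xd8, 0xce, 0x00, 0x20,
    0x00, 0xc0, 0x01, 0x20,  0x60, 0xce, 0x00, 0x20,
};

/* Word n of the header, in the order of the .word lines in startup.s:
   0 magic, 1 link base, 2 entry point, 3 _end, 4 _estack, 5 ProgramVector. */
uint32_t header_word(unsigned n) {
    return read_le32(app_header + 4u * n);
}
```

Words 0 to 4 decode to 0xDADAC0DE, 0x20008000, 0x20009a0d, 0x2000ced8 and 0x2001c000; the last two agree with _end and _estack in the map. Word 5 decodes to 0x2000ce60, which is the value that disagrees with the 0x2000ae28 shown for ProgramVector.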
Now you might say that ProgramVector is not a pointer, which is true enough. But I would expect the linker to output the address where it is placed. If I make it a SharedMemory* pointer, the output has the correct address of the pointer. The problem is I need the address of the struct, and I need it before the program has initialised.
I have tried this with a variety of compiler and linker flags, to no avail. Current compilation looks like this:
arm-none-eabi-gcc -Wl,--gc-sections -TSource/flash.ld -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -o Build/app.elf Build/sta./Build/libnosys_gnu.o -lm
arm-none-eabi-objcopy -O binary Build/solo.elf Build/solo.bin
I'm probably missing something obvious but going mad trying to figure out what. Any help, pointers or advice appreciated!

Related

Assembly incbin file and use in C file + GCC 5.4.0

I have an assembly file. I will use this file to include a binary file like below:
.section .bindata
.global imrdls_start
.type imrdls_start, #object
.global imr_SW_DL_start
.type imr_SW_DL_start, #object
.section .bindata
.balign 64
imrdls_start:
imr_SW_DL_start:
.incbin "file.bin"
.balign 1
imr_SW_DL_end:
.byte 0
Then, in the C file, I refer to that symbol and use the content of the binary file.
int main(void) {
extern uint8_t imrdls_start;
uint8_t *ptrToExpectedDL = &imrdls_start;
for(int i = 0; i < 135; i++)
{
printf("0x%02x ", ptrToExpectedDL[i]);
if((((i + 1) % 15) == 0)) printf("\n");
}
return EXIT_SUCCESS;
}
The thing is, after compiling and executing, the content of "file.bin" that gets printed is not correct.
The expected result is: 00 1d 81 ff 00 fe 00 ff 00 1e 82 00 00 20 82 ...
The garbage output printed is: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 ...
Below is my compiler and linking option:
qcc -Vgcc_ntoaarch64le -c -Wp,-MMD,build/aarch64le-debug/src/imrdls.d,-MT,build/aarch64le-debug/src/imrdls.o -o build/aarch64le-debug/src/imrdls.o -Wall -fmessage-length=0 -g -O0 -fno-builtin src/imrdls.s
qcc -Vgcc_ntoaarch64le -c -Wp,-MMD,build/aarch64le-debug/src/Test.d,-MT,build/aarch64le-debug/src/Test.o -o build/aarch64le-debug/src/Test.o -Wall -fmessage-length=0 -g -O0 -fno-builtin src/Test.c
qcc -Vgcc_ntoaarch64le -o build/aarch64le-debug/Test build/aarch64le-debug/src/Test.o build/aarch64le-debug/src/imrdls.o
Any comments will be really helpful. Thank you.
If you look at the garbage output "7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00" you can see that it is the same as the ELF header.
When one creates a new section using the .section directive, one must provide the attributes and type for that section. Replacing the first line in your assembly file with this should work:
.section .bindata , "a", #progbits
a marks the section as allocatable. ("aw" would also make it writeable, but you don't need that for constants. You'd use "aw" for an equivalent of .data, not .rodata.)
If no flags are specified, the default flags depend upon the section name. If the section name is not recognized, the default will be for the section to have none of the flags: it will not be allocated in memory, nor writable, nor executable. The section will contain data. (From the GNU as manual.)
Your data is in a section with a non-standard name, .bindata. I don't know where the linker puts it, but apparently it's not mapped into an executable segment that gets loaded (or memory mapped) from the file when you run the program.
Unless you really need to control the layout of the included data relative to compiler-generated read-only data, just put your data in .section .rodata.
(I'm surprised that the linker didn't complain, and that you didn't get a segfault at runtime. I would have hoped for at least a segfault instead of silently getting bogus data.)
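A minimal sketch of the .rodata suggestion, written as a single C file so it can be tried without a separate .s file. It assumes an ELF target and a GNU-compatible assembler, and it uses a few inline .byte values in place of .incbin "file.bin" so that it is self-contained. The names blob_start, blob_end and blob_size are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Place the data in .rodata, which is always allocated and loaded.
   In a real build the .byte line would be: .incbin "file.bin" */
__asm__(
    ".pushsection .rodata\n"
    ".balign 64\n"
    ".global blob_start\n"
    "blob_start:\n"
    ".byte 0x00, 0x1d, 0x81, 0xff\n"
    ".global blob_end\n"
    "blob_end:\n"
    ".popsection\n");

/* Declared as arrays so the symbol name itself is the address. */
extern const uint8_t blob_start[];
extern const uint8_t blob_end[];

/* Size of the embedded data, derived from the two labels. */
size_t blob_size(void) {
    return (size_t)((uintptr_t)blob_end - (uintptr_t)blob_start);
}
```

Because the bytes live in a loaded section, blob_start[0] reads back the first byte of the data rather than whatever happens to be mapped at that address.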

Why static string in .rodata section has a four dots prefix in GCC?

For the following code:
#include <stdio.h>
int main() {
printf("Hello World");
printf("Hello World1");
return 0;
}
the generated assembly for calling printf is as follows (64 bits):
400474: be 24 06 40 00 mov esi,0x400624
400479: bf 01 00 00 00 mov edi,0x1
40047e: 31 c0 xor eax,eax
400480: e8 db ff ff ff call 400460 <__printf_chk@plt>
400485: be 30 06 40 00 mov esi,0x400630
40048a: bf 01 00 00 00 mov edi,0x1
40048f: 31 c0 xor eax,eax
400491: e8 ca ff ff ff call 400460 <__printf_chk@plt>
And the .rodata section is as follows:
Contents of section .rodata:
400620 01000200 48656c6c 6f20576f 726c6400 ....Hello World.
400630 48656c6c 6f20576f 726c6431 00 Hello World1.
Based on the assembly code, the first call to printf takes the argument at address 0x400624, which is offset 4 bytes from the start of .rodata. I know it skips the first 4 bytes because of the four-dot prefix. But why does GCC or the linker produce this prefix for the string in .rodata? I am using GCC 4.8.4 on Ubuntu 14.04. The compilation command is just: gcc -Ofast my-source.c -o my-program.
For starters, those are not four dots, the dot just means unprintable character. You can see in the hex dump that those bytes are 01 00 02 00.
The final program contains other object files added by the linker, which are part of the C runtime library. This data is used by code there.
You can see the address is 0x400620. You can then try to find a matching symbol, for example you can load it into gdb and use the info symbol command:
(gdb) info symbol 0x4005f8
_IO_stdin_used in section .rodata of /tmp/a.out
(Note I had a different address.)
Taking it further, you can actually find the source for this in glibc:
/* This records which stdio is linked against in the application. */
const int _IO_stdin_used = _G_IO_IO_FILE_VERSION;
and
#define _G_IO_IO_FILE_VERSION 0x20001
Which corresponds to the value you see if you account for little-endian storage.
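To make the little-endian step explicit, here is a small sketch (store_le32 is a name invented for this illustration) that lays out a 32-bit value byte by byte the way an x86-64 machine stores it:

```c
#include <stdint.h>

/* Write v into out[0..3], least-significant byte first (little-endian). */
void store_le32(uint32_t v, uint8_t out[4]) {
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}
```

Applied to 0x20001, this yields the bytes 01 00 02 00, which is exactly the "four dots" at 0x400620 in the dump.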
It does not prefix the data. The .rodata section can contain anything. The first four bytes look like part of the string in the dump, but they just happen to be linked there (i.e. they are for something else) and are unrelated to your "Hello World".

Implement a similar module_init as Linux kernel, but meet some trouble in ld script

I like the Linux kernel's module_init mechanism very much and would like to implement the same thing for my user-space applications.
I try to modify the linker script to do this:
1, copy a x86-64 standard ld script
2, add my customized section
.module.init :
{
PROVIDE_HIDDEN (__module_init_start = .);
*(.module_init*)
PROVIDE_HIDDEN (__module_init_end = .);
}
3, put the init function pointer into the .module_init section
#define app_module_init(x) __initcall(x);
#define __initcall(fn) \
static initcall_t __initcall_##fn \
__attribute__ ((__section__(".module_init"))) = fn
app_module_init(unit_test_1_init);
app_module_init(unit_test_2_init);
app_module_init(unit_test_3_init);
app_module_init(unit_test_4_init);
4, compile the app with a customized linker script(based on the standard one)
gcc -o "./module_init" -T module.lds ./module_init.o
5, then I objdump module_init and find that the section is generated:
Disassembly of section .module_init:
0000000000a01080 <__initcall_unit_test_1_init>:
a01080: ad lods %ds:(%rsi),%eax
a01081: 05 40 00 00 00 add $0x40,%eax
...
0000000000a01088 <__initcall_unit_test_2_init>:
a01088: c2 05 40 retq $0x4005
a0108b: 00 00 add %al,(%rax)
a0108d: 00 00 add %al,(%rax)
...
0000000000a01090 <__initcall_unit_test_3_init>:
a01090: d7 xlat %ds:(%rbx)
a01091: 05 40 00 00 00 add $0x40,%eax
...
0000000000a01098 <__initcall_unit_test_4_init>:
a01098: ec in (%dx),%al
a01099: 05 40 00 00 00 add $0x40,%eax
But the values of __module_init_start and __module_init_end are not what I expected. In my case __module_init_start is 0x4005ad and __module_init_end is 0x400000003.
This is very weird, because 0x4005ad is the address of __initcall_unit_test_1_init.
Can anyone give me an idea of how to make this user-space module_init work?
The linker script can only set the addresses of variables. Use &__module_init_start to get a pointer to the start of the section, and &__module_init_end to get a pointer to the end.
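A minimal sketch of walking such a table through symbol addresses. To stay runnable without a custom linker script, it swaps in a different mechanism from the one in the question: GNU ld automatically defines __start_NAME and __stop_NAME for any section whose name is a valid C identifier, so the section here is called module_init without the leading dot. The function names and the call counter are illustrative only.

```c
typedef int (*initcall_t)(void);

/* Store a pointer to fn in the "module_init" section. "used" keeps the
   otherwise unreferenced variable from being discarded by the compiler. */
#define app_module_init(fn) \
    static initcall_t initcall_##fn \
    __attribute__((used, section("module_init"))) = fn

static int calls;
static int unit_test_1_init(void) { calls++; return 0; }
static int unit_test_2_init(void) { calls++; return 0; }

app_module_init(unit_test_1_init);
app_module_init(unit_test_2_init);

/* Linker-defined symbols: only their addresses are meaningful. Declaring
   them as arrays makes the bare name evaluate to the address, which is
   the same as taking & of a scalar declaration. */
extern initcall_t __start_module_init[];
extern initcall_t __stop_module_init[];

/* Call every registered function; returns how many calls were made. */
int run_initcalls(void) {
    calls = 0;
    for (initcall_t *p = __start_module_init; p < __stop_module_init; p++)
        (*p)();
    return calls;
}
```

With the linker script from the question, the same loop would run from &__module_init_start to &__module_init_end. Reading __module_init_start as an ordinary variable instead fetches the first table entry, which is why the question saw 0x4005ad there.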

execute binary machine code from C

Following these instructions I have managed to produce an a.out of only 528 bytes (gcc main.c initially gave me an 8539-byte file).
main.c was:
int main(int argc, char** argv) {
return 42;
}
but I have built a.out from this assembly file instead:
main.s:
; tiny.asm
BITS 64
GLOBAL _start
SECTION .text
_start:
mov eax, 1
mov ebx, 42
int 0x80
with:
me@comp# nasm -f elf64 tiny.s
me@comp# gcc -Wall -s -nostartfiles -nostdlib tiny.o
me@comp# ./a.out ; echo $?
42
me@comp# wc -c a.out
528 a.out
because I need machine code I do:
objdump -d a.out
a.out: file format elf64-x86-64
Disassembly of section .text:
00000000004000e0 <.text>:
4000e0: b8 01 00 00 00 mov $0x1,%eax
4000e5: bb 2a 00 00 00 mov $0x2a,%ebx
4000ea: cd 80 int $0x80
># objdump -hrt a.out
a.out: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .note.gnu.build-id 00000024 00000000004000b0 00000000004000b0 000000b0 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .text 0000000c 00000000004000e0 00000000004000e0 000000e0 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
SYMBOL TABLE:
no symbols
file is in little endian convention:
me@comp# readelf -a a.out
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4000e0
Start of program headers: 64 (bytes into file)
Start of section headers: 272 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 2
Size of section headers: 64 (bytes)
Number of section headers: 4
Section header string table index: 3
now I want to execute this like this:
#include <unistd.h>
// which version is (more) correct?
// this might be related to endiannes (???)
char code[] = "\x01\xb8\x00\x00\xbb\x00\x00\x2a\x00\x00\x80\xcd\x00";
char code_v1[] = "\xb8\x01\x00\x00\x00\xbb\x2a\x00\x00\x00\xcd\x80\x00";
int main(int argc, char **argv)
{
/*creating a function pointer*/
int (*func)();
func = (int (*)()) code;
(int)(*func)();
return 0;
}
However, I get a segmentation fault. My question is: is this section of text
4000e0: b8 01 00 00 00 mov $0x1,%eax
4000e5: bb 2a 00 00 00 mov $0x2a,%ebx
4000ea: cd 80 int $0x80
(this machine code) all I really need? What am I doing wrong (endianness?), or do I just need to call it in a different way, given the SIGSEGV?
The code must be in a page with execute permission. By default, stack and read-write static data (like non-const globals) are in pages mapped without exec permission, for security reasons.
The simplest way is to compile with gcc -z execstack, which links your program such that stack and global variables (static storage) get mapped in executable pages, and so do allocations with malloc.
Another way to do it without making everything executable is to copy this binary machine code into an executable buffer.
#include <stdio.h>      /* printf */
#include <unistd.h>
#include <sys/mman.h>   /* mmap */
#include <string.h>     /* memcpy */
char code[] = {0x55,0x48,0x89,0xe5,0x89,0x7d,0xfc,0x48,
0x89,0x75,0xf0,0xb8,0x2a,0x00,0x00,0x00,0xc9,0xc3,0x00};
/*
00000000004004b4 <main> 55 push %rbp
00000000004004b5 <main+0x1> 48 89 e5 mov %rsp,%rbp
00000000004004b8 <main+0x4> 89 7d fc mov %edi,-0x4(%rbp)
00000000004004bb <main+0x7> 48 89 75 f0 mov %rsi,-0x10(%rbp)
'return 42;'
00000000004004bf <main+0xb> b8 2a 00 00 00 mov $0x2a,%eax
'}'
00000000004004c4 <main+0x10> c9 leaveq
00000000004004c5 <main+0x11> c3 retq
*/
int main(int argc, char **argv) {
void *buf;
/* copy code to executable buffer */
buf = mmap (0,sizeof(code),PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON,-1,0);
memcpy (buf, code, sizeof(code));
__builtin___clear_cache((char *)buf, (char *)buf + sizeof(code)); // end is exclusive; on x86 this just stops memcpy from being optimized away as a dead store
/* run code */
int i = ((int (*) (void))buf)();
printf("get this done. returned: %d", i);
return 0;
}
output:
get this done. returned: 42
RUN SUCCESSFUL (total time: 57ms)
Without __builtin___clear_cache, this could break with optimization enabled because gcc would think the memcpy was a dead store and optimize it away. When compiling for x86, __builtin___clear_cache does not actually clear any cache; there are zero extra instructions; it just marks the memory as "used" so stores to it aren't considered "dead". (See the gcc manual.)
Another option would be to mprotect the page containing the char code[] array, giving it PROT_READ|PROT_WRITE|PROT_EXEC. This works whether it's a local array (on the stack) or global in the .data.
Or if it's const char code[] in the .rodata section, you might just give it PROT_READ|PROT_EXEC.
(In versions of binutils ld from before about 2019, the .rodata got linked as part of the same segment as .text, and was already mapped executable. But recent ld gives it a separate segment so it can be mapped without exec permission, so const char code[] doesn't give you an executable array anymore. It used to, so you may see this old advice in other places.)
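A minimal sketch of the mprotect option, under several assumptions: an x86 or x86-64 Linux system, a kernel policy that permits read+write+exec mappings, and a six-byte payload (mov $42,%eax; ret) chosen for this illustration. The helper names page_floor, make_executable and run_code are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* x86 / x86-64 only: mov $42,%eax ; ret */
unsigned char code[] = {0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3};

/* Round addr down to the start of its page (page_size is a power of two). */
uintptr_t page_floor(uintptr_t addr, uintptr_t page_size) {
    return addr & ~(page_size - 1);
}

/* mprotect needs a page-aligned start address, so widen the range to
   begin at the page boundary. Returns 0 on success, -1 on failure. */
int make_executable(void *addr, size_t len) {
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = page_floor((uintptr_t)addr, page_size);
    size_t span = ((uintptr_t)addr - start) + len;
    return mprotect((void *)start, span,
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}

/* Make the array's page(s) executable, then call into the array.
   Returns -1 if the permission change was refused. */
int run_code(void) {
    if (make_executable(code, sizeof code) != 0)
        return -1;
    int (*fn)(void) = (int (*)(void))(uintptr_t)code;
    return fn();
}
```

Note that the permission change applies to whole pages, so any other globals that share a page with code[] become executable too. That is one reason the separate mmap buffer is the tidier approach.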
The point is that DEP (Data Execution Prevention) is enabled.
You can go to Configurations -> Linker -> Advanced -> DEP and turn it off; then it works.
void main(){
int i = 11;
//The following is the method to generate the machine code directly!
//mov eax, 0x10; ret;
const char *code = "\xB8\x10\x00\x00\x00\xc3";
__asm call code; //test successful~..vs 2017
__asm mov i ,eax;
printf("i=%d", i);
}

Where do static local variables go

Where are static local variables stored in memory? Local variables can be accessed only inside the function in which they are declared.
Global static variables go into the .data segment.
If a static global and a static local variable have the same name, how does the compiler distinguish them?
Static variables go into the same segment as global variables. The only thing that's different between the two is that the compiler "hides" all static variables from the linker: only the names of extern (global) variables get exposed. That is how compilers allow static variables with the same name to exist in different translation units. Names of static variables remain known during the compilation phase, but then their data is placed into the .data segment anonymously.
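A small sketch of the name question, with illustrative names (counter, next_local, next_global). Both variables have static storage duration, yet the compiler treats them as two distinct objects because ordinary scoping picks which one each use refers to:

```c
/* File-scope static: visible to every function below in this file. */
static int counter = 100;

int next_local(void) {
    /* Function-local static: a separate object that shadows the
       file-scope one inside this function only. */
    static int counter = 0;
    return ++counter;
}

int next_global(void) {
    /* No local declaration here, so this is the file-scope counter. */
    return ++counter;
}
```

Calling next_local() twice returns 1 and then 2, while next_global() returns 101: the two counters never interfere. In the object file, GCC typically emits the function-local one under a suffixed local symbol name (something like counter.0), so no clash arises there either.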
A static variable is much like a global variable: an uninitialized static variable goes in BSS and an initialized one goes in the data segment.
As mentioned by dasblinken, GCC 4.8 puts local statics on the same place as globals.
More precisely:
static int i = 0 goes on .bss
static int i = 1 goes on .data
Let's analyze one Linux x86-64 ELF example to see it ourselves:
#include <stdio.h>
int f() {
static int i = 1;
i++;
return i;
}
int main() {
printf("%d\n", f());
printf("%d\n", f());
return 0;
}
To reach conclusions, we need to understand the relocation information. If you've never touched that, consider reading this post first.
Compile it:
gcc -ggdb -c main.c
Disassemble the code with:
objdump -S main.o
f contains:
int f() {
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
static int i = 1;
i++;
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
a: 83 c0 01 add $0x1,%eax
d: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 13 <f+0x13>
return i;
13: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 19 <f+0x19>
}
19: 5d pop %rbp
1a: c3 retq
Which does 3 accesses to i:
4 moves to the eax to prepare for the increment
d moves the incremented value back to memory
13 moves i to the eax for the return value. It is obviously unnecessary since eax already contains it, and -O3 is able to remove that.
So let's focus just on 4:
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
Let's look at the relocation data:
readelf -r main.o
which says how the text section addresses will be modified by the linker when it is making the executable.
It contains:
Relocation section '.rela.text' at offset 0x660 contains 9 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000006 000300000002 R_X86_64_PC32 0000000000000000 .data - 4
We look at .rela.text and not the others because we are interested in relocations of .text.
Offset 6 falls right into the instruction that starts at byte 4:
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
^^
This is offset 6
From our knowledge of x86-64 instruction encoding:
8b 05 is the mov part
00 00 00 00 is the address part, which starts at byte 6
AMD64 System V ABI Update tells us that R_X86_64_PC32 acts on 4 bytes (00 00 00 00) and calculates the address as:
S + A - P
which means:
S: the value of the symbol the relocation refers to: here the start of .data
A: the addend: -4
P: the address of byte 6 when loaded
-P is needed because GCC used RIP relative addressing, so we must discount the position in .text
-4 is needed because RIP points to the following instruction at byte 0xA but P is byte 0x6, so we need to discount 4.
Conclusion: after linking it will point to the first byte of the .data segment.
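The S + A - P arithmetic can be checked with a short sketch. The function names are invented for this illustration, and any concrete addresses fed to it are hypothetical, since the real ones are only fixed at link time.

```c
#include <stdint.h>

/* Value the linker writes for R_X86_64_PC32: S + A - P, kept to 32 bits. */
int32_t reloc_pc32(uint64_t s, int64_t a, uint64_t p) {
    return (int32_t)(uint32_t)(s + (uint64_t)a - p);
}

/* Address the CPU forms at run time: RIP already points at the next
   instruction, and the signed 32-bit displacement is added to it. */
uint64_t rip_relative_target(uint64_t next_insn, int32_t disp) {
    return next_insn + (uint64_t)(int64_t)disp;
}
```

For example, suppose f is linked at 0x400500 and .data at 0x601030. Then P is 0x400506, the next instruction is at 0x40050a, and reloc_pc32(0x601030, -4, 0x400506) gives 0x200b26. Adding that displacement to 0x40050a lands exactly on 0x601030, the first byte of .data.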

Resources