Assembly incbin file and use in C file + GCC 5.4.0

I have an assembly file that I use to include a binary file, like this:
.section .bindata
.global imrdls_start
.type imrdls_start, #object
.global imr_SW_DL_start
.type imr_SW_DL_start, #object
.section .bindata
.balign 64
imrdls_start:
imr_SW_DL_start:
.incbin "file.bin"
.balign 1
imr_SW_DL_end:
.byte 0
Then, in a C file, I reference that symbol and use the contents of the binary file:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
extern uint8_t imrdls_start;
uint8_t *ptrToExpectedDL = &imrdls_start;
for(int i = 0; i < 135; i++)
{
printf("0x%02x ", ptrToExpectedDL[i]);
if((((i + 1) % 15) == 0)) printf("\n");
}
return EXIT_SUCCESS;
}
The problem is that after compiling and running, the printed contents of "file.bin" are not correct.
The expected output is: 00 1d 81 ff 00 fe 00 ff 00 1e 82 00 00 20 82 ...
The garbage that gets printed is: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 ...
Below are my compile and link commands:
qcc -Vgcc_ntoaarch64le -c -Wp,-MMD,build/aarch64le-debug/src/imrdls.d,-MT,build/aarch64le-debug/src/imrdls.o -o build/aarch64le-debug/src/imrdls.o -Wall -fmessage-length=0 -g -O0 -fno-builtin src/imrdls.s
qcc -Vgcc_ntoaarch64le -c -Wp,-MMD,build/aarch64le-debug/src/Test.d,-MT,build/aarch64le-debug/src/Test.o -o build/aarch64le-debug/src/Test.o -Wall -fmessage-length=0 -g -O0 -fno-builtin src/Test.c
qcc -Vgcc_ntoaarch64le -o build/aarch64le-debug/Test build/aarch64le-debug/src/Test.o build/aarch64le-debug/src/imrdls.o
Any comments will be really helpful. Thank you.

If you look at the garbage output "7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00", you can see that it is the same as an ELF header: 7f 45 4c 46 is the ELF magic number, "\x7fELF".
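To make that diagnosis concrete, here is a small check for the ELF magic number. It is only a sketch; looks_like_elf_header is a hypothetical helper, not part of any library. Every ELF file starts with 0x7f 'E' 'L' 'F', and in the garbage output the next two bytes, 02 and 01, mean a 64-bit object (ELFCLASS64) with little-endian data (ELFDATA2LSB), which fits an aarch64le executable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the buffer starts with the
   4-byte ELF magic number 0x7f 'E' 'L' 'F', otherwise 0. */
static int looks_like_elf_header(const uint8_t *p, size_t len) {
    static const uint8_t elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
    assert(p != NULL);
    return len >= sizeof elf_magic &&
           memcmp(p, elf_magic, sizeof elf_magic) == 0;
}
```

Fed the first bytes printed by the question's program it returns 1; fed the expected contents of file.bin it returns 0.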
When you create a new section with the .section directive, you must provide the flags and type for that section. Replacing the first line of your assembly file with this should work:
.section .bindata , "a", #progbits
"a" marks the section as allocatable. ("aw" would also make it writable, but you don't need that for constants. You'd use "aw" for an equivalent of .data, not .rodata.)
If no flags are specified, the default flags depend upon the section name. If the section name is not recognized, the default will be for the section to have none of the flags: it will not be allocated in memory, nor writable, nor executable. The section will contain data. (GNU as manual, .section directive)
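The sketch below shows the fix in a form that runs without a separate .s file or file.bin. It assumes an x86-64 Linux toolchain with GNU as and ld, where the section type is spelled @progbits. A GCC top-level asm block creates .bindata with the "a" flag, and eight literal bytes (the start of the expected output) stand in for .incbin "file.bin". The imrdls_end label and the imrdls_size helper are hypothetical additions. Declaring the labels as arrays means each name already evaluates to the address, so no & is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for imrdls.s (x86-64 ELF, GNU as syntax). The literal bytes
   replace .incbin "file.bin" so no external file is needed. */
__asm__(
    ".pushsection .bindata, \"a\", @progbits\n" /* "a": allocate in memory */
    ".balign 64\n"
    ".global imrdls_start\n"
    "imrdls_start:\n"
    ".byte 0x00, 0x1d, 0x81, 0xff, 0x00, 0xfe, 0x00, 0xff\n"
    ".global imrdls_end\n"                      /* hypothetical end label */
    "imrdls_end:\n"
    ".popsection\n");

/* Declared as arrays, so each name evaluates to the label's address. */
extern const uint8_t imrdls_start[];
extern const uint8_t imrdls_end[];

/* Hypothetical helper: number of bytes between the two labels.
   Compares integer addresses to avoid comparing unrelated C objects. */
static size_t imrdls_size(void) {
    uintptr_t start = (uintptr_t)imrdls_start;
    uintptr_t end = (uintptr_t)imrdls_end;
    assert(end >= start);
    return (size_t)(end - start);
}
```

With the real imrdls.s the same declarations work once the end label (imr_SW_DL_end there) is also made .global; the printing loop could then use the computed size instead of a hard-coded 135.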

Your data is in a section with a non-standard name, .bindata. I don't know where the linker puts it, but apparently it's not mapped into an executable segment that gets loaded (or memory mapped) from the file when you run the program.
Unless you really need to control the layout of the included data relative to compiler-generated read-only data, just put your data in .section .rodata.
(I'm surprised that the linker didn't complain, and that you didn't get a segfault at runtime. I would have hoped for at least a segfault instead of silently getting bogus data.)

Related

How does linker resolve references to data object in shared libraries at link time?

I am learning about linking and found a small question that I could not understand.
Consider the following files:
main.c
#include "other.h"
extern int i;
int main() {
++i;
inci();
return 0;
}
other.c
int i = 0;
void inci() {
++i;
}
Then I compile these two files:
gcc -c main.c
gcc -shared -fpic other.c -o libother.so
gcc -o main main.o ./libother.so
Here is part of the disassembly of main.o:
f: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 15 <main+0x15>
15: 83 c0 01 add $0x1,%eax
18: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 1e <main+0x1e>
1e: b8 00 00 00 00 mov $0x0,%eax
23: e8 00 00 00 00 call 28 <main+0x28>
Here is part of the disassembly of main:
1148: 8b 05 ca 2e 00 00 mov 0x2eca(%rip),%eax # 4018 <i@@Base>
114e: 83 c0 01 add $0x1,%eax
1151: 89 05 c1 2e 00 00 mov %eax,0x2ec1(%rip) # 4018 <i@@Base>
1157: b8 00 00 00 00 mov $0x0,%eax
115c: e8 cf fe ff ff call 1030 <inci@plt>
They both correspond to C code:
++i;
According to the assembly, it seems that the linker has already decided the run-time address of i, because it is using a PC-relative address to reference it directly rather than going through the GOT. However, as far as I know, the shared library is only loaded into memory when the program is loaded. Thus, the executable main should have no knowledge of the address of i at link time. How, then, does the linker determine that i is located at 0x4018?
Also, what does i@@Base in the comment mean?

According to the assembly, it seems that the linker has already decided the run-time address of i, because it is using a PC-relative address to reference it directly, rather than using GOT.
Correct.
However, as far as I know, the shared library is only loaded into memory when the program is loaded.
Correct, except that the i variable in the shared library is never used, so its address doesn't matter.
What happens here is described pretty well in Solaris documentation:
Suppose the link-editor is used to create a dynamic executable, and a reference to a data item is found to reside in one of the dependent shared objects. Space is allocated in the dynamic executable's .bss, equivalent in size to the data item found in the shared object. This space is also assigned the same symbolic name as defined in the shared object. Along with this data allocation, the link-editor generates a special copy relocation record that instructs the runtime linker to copy the data from the shared object to the allocated space within the dynamic executable.
Because the symbol assigned to this space is global, it is used to satisfy any references from any shared objects. The dynamic executable inherits the data item. Any other objects within the process that make reference to this item are bound to this copy. The original data from which the copy is made effectively becomes unused.
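As a rough mental model only (a toy simulation in plain C, not how ld.so is actually implemented), the copy relocation boils down to two steps at load time: copy the library's initial value into the space reserved in the executable's .bss, then bind every reference to i, including the library's own, to that copy. The names lib_i, exe_bss_i, bound_i, main_increment and apply_copy_relocation are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Toy model: plain variables stand in for memory in two modules. */
static int lib_i = 0;  /* i as defined in libother.so: int i = 0; */
static int exe_bss_i;  /* same-sized slot the link-editor reserved in main's .bss */
static int *bound_i;   /* where every reference to i points after loading */

/* Roughly what the runtime linker does for R_X86_64_COPY on i. */
static void apply_copy_relocation(void) {
    memcpy(&exe_bss_i, &lib_i, sizeof exe_bss_i); /* copy the initial value */
    bound_i = &exe_bss_i; /* main's global definition wins; all bind to the copy */
}

/* main's ++i: direct PC-relative access to its own copy. */
static void main_increment(void) { ++exe_bss_i; }

/* libother.so's inci(): reaches i indirectly, through its GOT. */
static void inci(void) {
    assert(bound_i != NULL); /* relocations must have been applied */
    ++*bound_i;
}
```

After apply_copy_relocation(), main_increment() and inci() both modify exe_bss_i, while lib_i keeps its original value and is never touched again, which is the "effectively becomes unused" part of the quote.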
You can observe this using readelf -Ws main:
Symbol table '.dynsym' contains 5 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
...
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND inci
4: 0000000000404024 4 OBJECT GLOBAL DEFAULT 25 i
Note that inci is undefined (it's defined in libotherso's counterpart, libother.so), but i is defined in main as a global symbol. Now look at readelf -Wr main:
Relocation section '.rela.dyn' at offset 0x4d8 contains 3 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
...
0000000000404024 0000000400000005 R_X86_64_COPY 0000000000404024 i + 0
Relocation section '.rela.plt' at offset 0x520 contains 1 entry:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000404018 0000000200000007 R_X86_64_JUMP_SLOT 0000000000000000 inci + 0

Call libc function from assembly

I have a function defined in assembly that is calling a libc function (swapcontext). I invoke that function from my C code. For the purpose of creating a reproducible example, I'm using 'puts' instead:
foo.S:
.globl foo
foo:
call puts
ret
test.c:
void foo(char *str);
int main() {
foo("Hello World\n");
return 0;
}
Compile:
gcc test.c foo.S -o test
This compiles fine. However, disassembling the resulting binary shows that the linker did not insert a valid call instruction:
objdump -dR:
0000000000000671 <foo>:
671: e8 00 00 00 00 callq 676 <foo+0x5>
672: R_X86_64_PC32 puts@GLIBC_2.2.5-0x4
676: c3 retq
677: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
67e: 00 00
0000000000000530 <puts@plt>:
530: ff 25 9a 0a 20 00 jmpq *0x200a9a(%rip) # 200fd0 <puts@GLIBC_2.2.5>
536: 68 00 00 00 00 pushq $0x0
53b: e9 e0 ff ff ff jmpq 520 <.plt>
Execution:
./test1: Symbol `puts' causes overflow in R_X86_64_PC32 relocation
Segmentation fault
Any ideas why?

For your updated, totally separate question (which replaced your question about disassembling a .o):
Semi-related: Unexpected value of a function pointer local variable mentions that the linker transforms references to puts into puts@plt for you in a non-PIE executable (because that lets you get efficient code if statically linking), but not in a PIE.
libc gets mapped more than 2GiB away from the main executable, so a call rel32 can't reach it.
See also Can't call C standard library function on 64-bit Linux from assembly (yasm) code, which shows AT&T and NASM syntax for calling libc functions from a PIE executable, either via the PLT with call puts@plt or gcc -fno-plt style with call *puts@gotpcrel(%rip).
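For reference, here is a minimal sketch of the question's foo written the PIE-safe way, assuming x86-64 Linux with GCC and GNU as. It sits in a top-level asm block only so the example is a single file; in foo.S the body would be the same. Besides calling puts@PLT, it subtracts 8 from %rsp first, because the x86-64 System V ABI requires %rsp to be 16-byte aligned at a call instruction, and the return address pushed by the call to foo leaves it 8 bytes off. Declaring foo as returning int, and the say wrapper, are assumptions added so the result of puts can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* foo from the question, calling puts through the PLT (x86-64 only). */
__asm__(
    ".pushsection .text\n"
    ".globl foo\n"
    ".type foo, @function\n"
    "foo:\n"
    "    sub $8, %rsp\n"   /* re-align: %rsp was 8 mod 16 on entry */
    "    call puts@PLT\n"  /* reachable from both PIE and non-PIE code */
    "    add $8, %rsp\n"
    "    ret\n"            /* %eax still holds puts' return value */
    ".size foo, .-foo\n"
    ".popsection\n");

int foo(const char *str);

/* Hypothetical wrapper: call foo and check that puts did not fail. */
static int say(const char *s) {
    int rc = foo(s);
    assert(rc != EOF);
    return rc;
}
```

Writing call puts@PLT explicitly means the assembler emits a PLT-relative relocation, so the linker never has to fit the distance to libc into an R_X86_64_PC32 field.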

You appear to be disassembling an object file with relocations.
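Relocations are placeholders that get filled in later: the static relocations in a .o are resolved by the linker at link time, and dynamic relocations are resolved by the dynamic loader when the program is loaded.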
To properly view the relocations and symbol names, use objdump -dr test or objdump -dR test.
The output will be similar to this:
0000000000000000 <foo>:
0: e8 00 00 00 00 callq 5 <foo+0x5>
1: R_X86_64_PLT32 swapcontext-0x4
You may also consider adding a ret instruction at the end of foo, just in case swapcontext errors.
As shown by your objdump -dR output, both of these refer to libc functions:
670: R_X86_64_PC32 swapcontext@GLIBC_2.2.5-0x4
# 200fd0 <swapcontext@GLIBC_2.2.5>

Why does a static string in the .rodata section have a four-dot prefix in GCC?

For the following code:
#include <stdio.h>
int main() {
printf("Hello World");
printf("Hello World1");
return 0;
}
the generated assembly for calling printf is as follows (64 bits):
400474: be 24 06 40 00 mov esi,0x400624
400479: bf 01 00 00 00 mov edi,0x1
40047e: 31 c0 xor eax,eax
400480: e8 db ff ff ff call 400460 <__printf_chk@plt>
400485: be 30 06 40 00 mov esi,0x400630
40048a: bf 01 00 00 00 mov edi,0x1
40048f: 31 c0 xor eax,eax
400491: e8 ca ff ff ff call 400460 <__printf_chk@plt>
And the .rodata section is as follows:
Contents of section .rodata:
400620 01000200 48656c6c 6f20576f 726c6400 ....Hello World.
400630 48656c6c 6f20576f 726c6431 00 Hello World1.
Based on the assembly code, the first call to printf is passed the address 0x400624, which is at a 4-byte offset from the start of .rodata. I know it skips the first 4 bytes, the four-dot prefix shown here. But my question is: why do GCC and the linker produce this prefix for strings in .rodata? I am using GCC 4.8.4 on Ubuntu 14.04. The compilation command is just: gcc -Ofast my-source.c -o my-program.

For starters, those are not four dots, the dot just means unprintable character. You can see in the hex dump that those bytes are 01 00 02 00.
The final program contains other object files added by the linker, which are part of the C runtime library. This data is used by code there.
You can see the address is 0x400620. You can then try to find a matching symbol, for example you can load it into gdb and use the info symbol command:
(gdb) info symbol 0x4005f8
_IO_stdin_used in section .rodata of /tmp/a.out
(Note I had a different address.)
Taking it further, you can actually find the source for this in glibc:
/* This records which stdio is linked against in the application. */
const int _IO_stdin_used = _G_IO_IO_FILE_VERSION;
and
#define _G_IO_IO_FILE_VERSION 0x20001
This corresponds to the bytes you see (01 00 02 00) once you account for little-endian storage.
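The byte order can be checked directly. This sketch assumes a little-endian host such as x86-64 and copies the int into a byte buffer; bytes_in_memory is a hypothetical helper, and the macro simply repeats the glibc value quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define G_IO_IO_FILE_VERSION 0x20001 /* value of _G_IO_IO_FILE_VERSION from glibc */

/* Hypothetical helper: copy the in-memory (host byte order)
   representation of value into out[0..3]. */
static void bytes_in_memory(uint32_t value, unsigned char out[4]) {
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}
```

On a little-endian machine the buffer holds 01 00 02 00, the same four bytes that open the .rodata dump.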

It does not prefix the data. .rodata can contain anything. The first four bytes just happen to be linked there (they belong to something else) and are unrelated to your "Hello World" string.

Implementing a module_init similar to the Linux kernel's, but running into trouble with the ld script

I like the Linux kernel's module_init mechanism very much, and I would like to implement the same thing for my user-space applications.
I tried modifying the linker script to do this:
1. Copy a standard x86-64 ld script.
2. Add my customized section:
.module.init :
{
PROVIDE_HIDDEN (__module_init_start = .);
*(.module_init*)
PROVIDE_HIDDEN (__module_init_end = .);
}
3. Put the init function pointers into the module_init section:
#define app_module_init(x) __initcall(x)
#define __initcall(fn) \
static initcall_t __initcall_##fn \
__attribute__ ((__used__, __section__(".module_init"))) = fn
app_module_init(unit_test_1_init);
app_module_init(unit_test_2_init);
app_module_init(unit_test_3_init);
app_module_init(unit_test_4_init);
4. Compile the app with the customized linker script (based on the standard one):
gcc -o "./module_init" -T module.lds ./module_init.o
5. Then I ran objdump on module_init and found that the section was generated:
Disassembly of section .module_init:
0000000000a01080 <__initcall_unit_test_1_init>:
a01080: ad lods %ds:(%rsi),%eax
a01081: 05 40 00 00 00 add $0x40,%eax
...
0000000000a01088 <__initcall_unit_test_2_init>:
a01088: c2 05 40 retq $0x4005
a0108b: 00 00 add %al,(%rax)
a0108d: 00 00 add %al,(%rax)
...
0000000000a01090 <__initcall_unit_test_3_init>:
a01090: d7 xlat %ds:(%rbx)
a01091: 05 40 00 00 00 add $0x40,%eax
...
0000000000a01098 <__initcall_unit_test_4_init>:
a01098: ec in (%dx),%al
a01099: 05 40 00 00 00 add $0x40,%eax
But __module_init_start and __module_init_end do not have the values I expected. In my case __module_init_start is 0x4005ad and __module_init_end is 0x400000003.
This is very weird, because 0x4005ad is the value stored in __initcall_unit_test_1_init.
Can anyone give me an idea of how to make this user-space module_init work?

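A symbol defined in a linker script is only an address; there is no variable stored behind it. When you read __module_init_start as a variable, you get whatever is stored at that address, which is the first function pointer in the section (0x4005ad). Use &__module_init_start to get a pointer to the start of the section, and &__module_init_end to get a pointer to the end.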
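A minimal sketch of the same pattern, with one swap: instead of the custom linker script from the question, it relies on GNU ld's automatically provided __start_SECNAME and __stop_SECNAME symbols, which ld only defines for sections whose names are valid C identifiers, so the section is called module_init rather than .module_init. The boundary symbols are declared as arrays, so their names already are the start and end addresses. The used attribute keeps GCC from discarding the otherwise unreferenced static pointers. run_module_inits, init_count and the unit_test_*_init bodies are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

#define app_module_init(fn) \
    static initcall_t __initcall_##fn \
    __attribute__((used, section("module_init"))) = fn

/* Provided by GNU ld because "module_init" is a valid C identifier. */
extern initcall_t __start_module_init[];
extern initcall_t __stop_module_init[];

static int init_count; /* hypothetical: counts how many inits ran */

static int unit_test_1_init(void) { init_count++; return 0; }
static int unit_test_2_init(void) { init_count++; return 0; }

app_module_init(unit_test_1_init);
app_module_init(unit_test_2_init);

/* Walk the table of function pointers and call each one;
   returns the number of init functions that reported failure. */
static int run_module_inits(void) {
    int failures = 0;
    for (initcall_t *p = __start_module_init; p < __stop_module_init; p++) {
        assert(*p != NULL);
        if ((*p)() != 0)
            failures++;
    }
    return failures;
}
```

With the question's own linker script the loop is the same, using __module_init_start and __module_init_end declared as extern initcall_t arrays in the same way. The order in which the entries are called is not guaranteed by this sketch.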

How do I add the address of a struct to the binary linker output?

This might seem a strange question, but I'm generating a binary file and need to put some data in the header.
I'm using gcc and a fairly standard Cortex M4 bare-bones linker script.
Instead of putting the ISR vector first in the binary, I'm putting my own header. The binary will be copied to a pre-determined memory location (0x20008000, to which it has been linked) and run from there.
My startup.s contains this:
.section .isr_vector,"a",%progbits
.type g_pfnVectors, %object
.size g_pfnVectors, .-g_pfnVectors
g_pfnVectors:
.word 0xDADAC0DE /* magic number */
.word .isr_vector /* link base address */
.word Reset_Handler /* code entry point */
.word _end /* stack start */
.word _estack /* stack end */
.word ProgramVector /* pointer to shared memory block */
This all works fine, with the exception of ProgramVector. I define it in my main.c as follows:
typedef struct {
uint8_t checksum;
int16_t* audio_input;
int16_t* audio_output;
...
} SharedMemory;
SharedMemory ProgramVector;
I would expect the binary to include the address of ProgramVector. If I compare the output to what I see in the linker map file, everything matches (Reset_Handler, _end, _estack) except ProgramVector.
My binary file app.bin :
00000000 de c0 da da 00 80 00 20 0d 9a 00 20 d8 ce 00 20 |....... ... ... |
00000010 00 c0 01 20 60 ce 00 20 60 ce 00 20 4f 57 4c 20 |... `.. `.. OWL |
00000020 50 72 6f 67 72 61 6d 00 10 b5 05 4c 23 78 33 b9 |Program....L#x3.|
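The header words can be read out of the dump as little-endian 32-bit values (the Cortex-M4 byte order). This is a hedged sketch for checking that reading: AppHeader, read_le32 and decode_header are hypothetical names, and the field order is taken from the g_pfnVectors table in startup.s. Decoding the first 24 bytes gives 0x2000ced8 for _end and 0x2001c000 for _estack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order follows g_pfnVectors in startup.s. */
typedef struct {
    uint32_t magic;          /* 0xDADAC0DE */
    uint32_t link_base;      /* .isr_vector */
    uint32_t entry;          /* Reset_Handler (Thumb bit set) */
    uint32_t stack_start;    /* _end */
    uint32_t stack_end;      /* _estack */
    uint32_t program_vector; /* ProgramVector */
} AppHeader;

/* Build a 32-bit value from 4 little-endian bytes, independent of
   the host's own byte order. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static AppHeader decode_header(const uint8_t *bin, size_t len) {
    AppHeader h;
    assert(bin != NULL && len >= 24);
    h.magic          = read_le32(bin + 0);
    h.link_base      = read_le32(bin + 4);
    h.entry          = read_le32(bin + 8);
    h.stack_start    = read_le32(bin + 12);
    h.stack_end      = read_le32(bin + 16);
    h.program_vector = read_le32(bin + 20);
    return h;
}
```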
From which we can determine that ProgramVector is 0x2000ce60, while my map file says:
0x2001c000 _estack = 0x2001c000
...
.bss 0x2000ae28 0x38 Build/main.o
0x2000ae28 ProgramVector
...
0x2000ced8 PROVIDE (_end, .)
Now you might say that ProgramVector is not a pointer, which is true enough. But I would expect the linker to output the address where it is placed. If I make it a SharedMemory* pointer, the output has the correct address of the pointer. The problem is I need the address of the struct, and I need it before the program has initialised.
I have tried this with a variety of compiler and linker flags, to no avail. Current compilation looks like this:
arm-none-eabi-gcc -Wl,--gc-sections -TSource/flash.ld -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -o Build/app.elf Build/sta./Build/libnosys_gnu.o -lm
arm-none-eabi-objcopy -O binary Build/solo.elf Build/solo.bin
I'm probably missing something obvious but going mad trying to figure out what. Any help, pointers or advice appreciated!
