Linker Inserts Unnecessary Opcode Padding - c

I've recently come across a minor issue when linking multiple object files for a Motorola 68000-based system (SEGA Mega Drive). The problem is that when the input section for one object file ends and the next one begins, the linker fills memory addresses with zeros so that the next object file begins aligned on a four-byte boundary. The text below is a memory map output by the linker. As you can see, the .text output section contains three object files. The first two (main.o, swap.o) were written in C and compiled and assembled using m68k-elf-gcc. The third one (swap_asm.o) was handwritten in 68000 assembly and assembled using vasm. The function at the beginning of swap.o would normally start at address 0x0000001E. But the linker is *fill*ing the beginning of the swap.o file with two bytes, specifically 0x0000, so swap.o starts at 0x00000020. Meanwhile, swap_asm.o is not getting aligned and begins at a non-four-byte-aligned address, 0x00000036. Is there a way to make the linker not add any padding and just start swap.o right away? I understand there are a few workarounds like filling the space with a NOP, but I was wondering if there is a way to just not do a *fill*?
.text 0x00000000 0x4c
main.o(.text)
.text 0x00000000 0x1e main.o
0x00000000 main
swap.o(.text)
*fill* 0x0000001e 0x2
.text 0x00000020 0x16 swap.o
0x00000020 swap
swap_asm.o(.text)
.text 0x00000036 0x16 swap_asm.o
0x00000036 swap_asm

The 68000 processor requires instructions to be aligned on even addresses, and the same requirement holds for word and long-word data. Apart from these CPU requirements, which cannot be bypassed, the linker also uses a script in which the sections are required to have some alignment (normally to satisfy those CPU requirements).
The linker script can be tweaked, but changing the alignment may make the linker produce incorrect code, for the reason given above. In any case, it is something you can try and test.
The Motorola 68000 (and even more so the 16-bit-bus variant used in the Mega Drive) raises an address error exception when a 16-bit transfer is requested at an odd address. The same happens for a 32-bit transfer. As far as I recall this also applies up to the 68030; I think the 68040 already handles it by making several bus accesses, like the Intel processors.
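The *fill* entry in the question's map follows from simple rounding: the linker advances the location counter to the next multiple of the input section's alignment. A minimal sketch of that arithmetic, assuming power-of-two alignments (the helper names align_up and fill_bytes are made up for illustration and are not part of any linker API):

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* Number of padding bytes needed in front of a section that ends up at
 * align_up(addr, align) when the previous section ended at addr. */
uint32_t fill_bytes(uint32_t addr, uint32_t align)
{
    return align_up(addr, align) - addr;
}
```

With the numbers from the map, align_up(0x1e, 4) gives 0x20 and fill_bytes(0x1e, 4) gives 2, which matches the *fill* line. With a 2-byte alignment, fill_bytes(0x1e, 2) is 0, so swap.o could start at 0x1e.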

So I found my answer. When the assembler detects long (32-bits) data is being dealt with in an assembly file, it automatically aligns the input section along a 4 byte boundary. You can actually override this using SUBALIGN in a linker script. Here's my linker script aligning input sections along a 2 byte boundary.
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x00400000
}
SECTIONS
{
.text : SUBALIGN(0x2) {
*(.header)
*(.boot)
obj/main.o(.text)
*(.text)
*(.isr)
*(.vdp)
} > rom
.data : { *(.data) } > rom
.bss : { *(.bss) } > rom
}
New linker map:
.text 0x00000000 0x4a
main.o(.text)
.text 0x00000000 0x1e main.o
0x00000000 main
swap.o(.text)
.text 0x0000001e 0x14 swap.o
0x0000001e swap
swap_asm.o(.text)
.text 0x00000034 0x16 swap_asm.o
0x00000034 swap_asm

Related

.text section address range of position independent executable

I want the address of the .text section of a position independent executable. Using readelf -S:
Name Type Address Offset
Size EntSize Flags Link Info Align
.text PROGBITS 0000000000002700 00002700
0000000000001672 0000000000000000 AX 0 0 16
From this I learn that .text will begin 0x2700 bytes past the address where the binary was loaded into memory. But how can I get the load address of the executable?
Is there any other way to get the .text section address range during runtime (from the running program)?
Yes: you need to use dl_iterate_phdr and use info->dlpi_addr to locate the PIE binary in memory at runtime. The very first call to your callback will be for the main executable.
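A minimal sketch of that approach, assuming Linux with glibc (or another libc that provides dl_iterate_phdr in link.h). The helper names record_first_base, executable_base and section_range are made up for illustration; the section address and size are whatever readelf -S reported for the binary in question.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Callback for dl_iterate_phdr(): record the load base of the first object
 * reported and stop. The first object reported is the main executable. */
static int record_first_base(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    *(uintptr_t *)data = (uintptr_t)info->dlpi_addr;
    return 1; /* a non-zero return value ends the iteration */
}

/* Load base of the running executable (0 for a non-PIE binary). */
uintptr_t executable_base(void)
{
    uintptr_t base = 0;
    dl_iterate_phdr(record_first_base, &base);
    return base;
}

/* Run-time range [*start, *end) of a section, given the Address and Size
 * values that readelf -S printed for it. */
void section_range(uintptr_t sh_addr, uintptr_t sh_size,
                   uintptr_t *start, uintptr_t *end)
{
    assert(start != NULL && end != NULL);
    *start = executable_base() + sh_addr;
    *end = *start + sh_size;
}
```

For the readelf output in the question, the call would be section_range(0x2700, 0x1672, &start, &end).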

Linux: How to find the physical address of an executable's .text, .data, etc

I'm working on a project and am not making the progress I would like.
My goal is to add code to, and then recompile, the Linux kernel to display the physical address of where the kernel places the various sections of an executable, such as the .text and .data sections, as well as the heap and the stack.
I understand how memory works, such as virtual vs. physical memory, and that it uses pages and lookup tables to translate between the two, but I can't find where in the actual code that it does this.
FWIW, I'm working with kernel version 2.6.32.67.
Currently, I'm poking around in exec.c and using printk() to print out data. For example, this is the output for the mm_struct reachable from the struct linux_binprm *bprm struct (more specifically, the mm_struct of its vm_area_struct).
Execing: ./test.out
bprm
vm_start: 0x7ffe7ed9b000
vm_end: 0x7ffe7edb1000
vm size: 0x16000
bprm->vma->vm_mm:
total_vm: 0x3c
locked_vm: 0x0
shared_vm: 0x24
exec_vm: 0x21
stack_vm: 0x16
reserved_vm: 0x0
start_code: 0x400000
end_code: 0x400744
code size: 0x744
start_data: 0x600748
end_data: 0x600938
data size: 0x1f0
start_brk: 0xc66000
brk: 0xc66000
brk size: 0x0
start_stack: 0x7ffe7edaeac0
arg_start: 0x7ffe7edb04c1
arg_end: 0x7ffe7edb04c9
env_start: 0x7ffe7edb04c9
env_end: 0x7ffe7edb0ff0
From here, I am having trouble moving forward. Does anyone have any guidance? Thanks.

Moving memcpy into another code section

I am building a piece of software meant to run on an ARM Cortex-M0+ microcontroller. It includes a USB bootloader of sorts that runs as a secondary program upon a call to a function. I'm having an issue with the insertion of the memcpy function during compilation.
Background
The linker script is where it all starts. Most of it is pretty straightforward and standard. The program is stored in .text and is executed from there as well. Everything in .text is stored in the flash section of the chip.
The strangeness is the part where the bootloader runs. In order to be able to write all of the flash without overwriting the bootloader code, my bootloader entry point initiates a copy of the bootloader program into the SRAM portion of the microcontroller and then executes it from there. This way, the bootloader can safely erase all of the flash on the device without inadvertently deleting itself.
This is implemented by doing a faked "overlay" in the linker script (the real OVERLAY didn't quite match my use case):
/**
* The bootloader and general ram live in the same area of memory
* NOTE: The bootloader gets its own special RAM space and it lives on top
* of both .data and .bss.
*/
_shared_start = .;
.bootloader _shared_start : AT(_end_flash)
{
/* We keep the bootloader and its data together */
_start_bootloader_flash = LOADADDR(.bootloader);
_start_bootloader = .;
*(.bootloader.data)
*(.bootloader.data.*)
. = ALIGN(1024); /* Interrupt vector tables must be aligned to a 1024-byte boundary */
*(.bootloader.interrupt_vector_table)
*(.bootloader)
_end_bootloader = .;
}
.data _shared_start : AT(_end_flash + SIZEOF(.bootloader))
{
_start_data_flash = LOADADDR(.data);
_start_data = .;
*(.data)
*(.data.*)
*(.shdata)
_end_data = .;
}
. = _shared_start + SIZEOF (.data);
_bootloader_size = _end_bootloader - _start_bootloader;
_data_size = _end_data - _start_data;
_end_flash is a reference to the end of the previous section which stored all of its data in flash (.text, .rodata, .init...basically anything read-only gets stuck there).
What this accomplishes is that the .data and .bss sections normally live in RAM. However, the .bootloader sections also live in the same place in RAM. Both sections are stored to the flash sequentially when compiled. In my crt0 routines, the .data section is copied from the flash into its appropriate address in RAM (specified by _start_data) and the .bss section is zeroed. I have an additional section stored in the .text section which initiates the bootloader by copying its data from the flash into RAM, overwriting whatever was in .data and .bss. The only exit from the bootloader is a system reset, so it is ok that it destroys the data for the running program. After copying the bootloader into RAM, it executes it.
The Question
Obviously, there are some possible issues with compiling an overlaid program and making sure all the references line up. In order to mitigate issues that would crop up accessing bootloader code from the normal program or accessing the normal .data or .bss from the bootloader, I have the following three lines in my linker script:
NOCROSSREFS(.bootloader .text);
NOCROSSREFS(.bootloader .data);
NOCROSSREFS(.bootloader .bss);
Now, whenever there is a cross-reference between the .bootloader section and .text (which might be erased by the bootloader), .data (which the bootloader lives on top of), or .bss (again, the bootloader lives on top of it), a linker error will be issued.
This worked great until I actually started writing code. Part of my code includes some struct copying and other such things. Apparently, the compiler decided to do this (bootloader_ functions live in the .bootloader section):
20000340 <bootloader_usb_endp0_handler>:
...
20000398: 1c11 adds r1, r2, #0
2000039a: 1c1a adds r2, r3, #0
2000039c: f000 f8e0 bl 20000560 <__memcpy_veneer>
...
20000560 <__memcpy_veneer>:
20000560: b401 push {r0}
20000562: 4802 ldr r0, [pc, #8] ; (2000056c <__memcpy_veneer+0xc>)
20000564: 4684 mov ip, r0
20000566: bc01 pop {r0}
20000568: 4760 bx ip
2000056a: bf00 nop
2000056c: 00000869 andeq r0, r0, r9, ror #16
In my chip's architecture, addresses 0x20000000 until 0xE000000 or so are located in SRAM (I only have 4Kb of that actually on the device). Any address below 0x1fffffc00 is located in the flash section.
The problem is this: In my function located in my .bootloader section (bootloader_usb_endp0_handler), a reference to memcpy (2000039c, 20000562, and 2000056c) was inserted because I'm doing a struct copy among other things. The reference it put to memcpy is at address 0x00000869, which lives in the flash...which could be erased.
The particular code is:
static setup_t last_setup;
last_setup = *((setup_t*)(bdt->addr));
Where setup_t is a two-word struct and bdt->addr is a void* which I know points to data that looks like a setup_t. This line generates the call to memcpy.
My question is: I'd really like to keep my struct copying. It is convenient. Is there any way to tell the compiler to place memcpy into a specific section other than the default? I want that to happen just for the bootloader module. All the other code can have its memcpy; I just want a special copy for my bootloader module that lives inside .bootloader.
If this simply isn't possible, I'm going to either write the entire bootloader in assembly (not as fun) or go the route of compiling the bootloader separately, including it as a fairly long hexadecimal string in the end program, and executing the string after copying it to RAM. The string route doesn't appeal to me very well because it is breakable and difficult to implement...so any other suggestions would also be appreciated.
The compilation line for this module is:
arm-none-eabi-gcc -Wall -fno-common -mthumb -mcpu=cortex-m0plus -ffreestanding -fno-builtin -nodefaultlibs -nostdlib -O0 -c src/bootloader.c -o obj/bootloader.o
Normally the optimization would be -Os, but I was trying to get rid of the memcpy...it didn't work.
Also, I've looked at this question and it didn't fix the problem.
I never tried, but you might get away using the EXTERN() linker script directive to force load your newlib memcpy() twice - first in the bootloader link stage into your desired section and later undefining it and link it a second time into your "normal" code.
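A different route from the EXTERN() idea, offered only as a sketch: keep a small copy routine inside the .bootloader section and call it explicitly instead of relying on struct assignment, which is what made GCC emit the call to memcpy. The names BOOTLOADER_CODE and bootloader_copy and the field layout of setup_t are assumptions for illustration. The volatile accesses are there to discourage the compiler from turning the loop back into a memcpy call, which should be verified in the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Place a function in the .bootloader input section on ELF targets built
 * with GCC-compatible compilers; expands to nothing elsewhere. */
#if defined(__GNUC__) && defined(__ELF__)
#define BOOTLOADER_CODE __attribute__((section(".bootloader"), noinline))
#else
#define BOOTLOADER_CODE
#endif

/* Hypothetical layout: the question only says setup_t is a two-word struct. */
typedef struct {
    uint32_t word0;
    uint32_t word1;
} setup_t;

/* Byte-wise copy that lives next to the bootloader code. */
BOOTLOADER_CODE
void bootloader_copy(volatile void *dst, const volatile void *src, size_t n)
{
    volatile uint8_t *d = dst;
    const volatile uint8_t *s = src;
    while (n--)
        *d++ = *s++;
}
```

The struct assignment in the question would then become bootloader_copy(&last_setup, bdt->addr, sizeof last_setup).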

Unexpected linker section output location

I'm trying to use the ld command in linux on an assembly file for a kernel. For it to boot with grub, it needs to be after the 1Mb address. So my link script has the text going to the address 0x00100000.
Here's the linker script I'm using:
SECTIONS {
.text 0x00100000 :{
*(.text)
}
textEnd = .;
.data :{
*(.data)
*(.rodata)
}
dataEnd = .;
.bss :{
*(.common)
*(.bss)
}
bssEnd = .;
}
My question is about the output file. When I look at the binary of the file, text section starts at 0x1000. When I change the text location in the script and use addresses lower than 0x1000, such as 0x500, the text will start there. But whenever I go above 0x1000, it rounds it (0x2500 will put the text at 0x500).
When I specify that the text should be at 0x100000, shouldn't it be there in the output file? Or is there another part of the binary that specifies that there's more moving to do. I'm asking because there's a problem booting my kernel, but for now I'm just simply trying to understand the linker output.
You are referring to two different address spaces. The addresses you refer to within the linked file (such as 0x1000 and 0x500) are just the file offsets. The addresses specified in the linker script, such as 0x00100000, are with respect to computer memory (i.e. RAM).
In the case of the linker script, the linker is being told that the .text section of the binary/executable file should be loaded at the 1MiB point in RAM (i.e. 0x00100000). This has less to do with the layout of the file output by the linker and more to do with how the file is to be loaded when executed.
The section locations in the actual file have to do with alignment. That is, your linker appears to be aligning the first section at a 4096-byte boundary. If, for example, each section is less than 4096 bytes in size and each placed at 4096-byte boundary, their respective offsets in the file would be 0x1000, 0x2000, 0x3000, etc. By default, this alignment would also hold once the file is loaded into RAM such that the previous example would yield sections located at 0x00100000, 0x00101000, 0x00102000, etc.
And it appears that when you change the load location to a small enough number, the linker automatically changes the alignment. However, the 'ALIGN' function can be used if you wanted to manually specify the alignment.
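One way to read the numbers in the question: for a loadable ELF segment, the file offset and the virtual address must be congruent modulo the page size, so a linker can pick the smallest offset past the file headers that leaves the same remainder. A minimal sketch of that rule, assuming a 0x1000-byte page and a hypothetical header size; segment_file_offset is an illustrative helper, not something ld exposes.

```c
#include <stdint.h>

/* Smallest file offset >= headers_size such that
 * offset % page == vaddr % page. page must be non-zero. */
uint64_t segment_file_offset(uint64_t vaddr, uint64_t headers_size, uint64_t page)
{
    uint64_t offset = vaddr % page;
    while (offset < headers_size)
        offset += page;
    return offset;
}
```

With a page of 0x1000 and an assumed 0x100 bytes of headers, the addresses 0x100000, 0x2500 and 0x500 map to offsets 0x1000, 0x500 and 0x500, which is consistent with what the question observed.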
For a short & sweet explanation of the linker (describing all of the above in more detail) I recommend:
http://www.math.utah.edu/docs/info/ld_3.html
or
http://sourceware.org/binutils/docs-2.15/ld/Scripts.html

Loading HEX data into memory

I am compiling bare-metal software (no OS) for the BeagleBoard (ARM Cortex-A8) with CodeSourcery's GCC ARM EABI compiler. This compiles to a binary or image file that I can load with the U-Boot bootloader.
The question is, Can I load hexdata into memory dynamically in runtime (So that I can load other image files into memory)? I can use gcc objcopy to generate a hexdump of the software. Could I use this information and load it into the appropriate address? Would all the addresses of the .text .data .bss sections be loaded correctly as stated in the linker script?
The hexdata output generated by
$(OBJCOPY) build/$(EXE).elf -O binary build/$(EXE).bin
od -t x4 build/$(EXE).bin > build/$(EXE).bin.hex
look like this:
0000000 e321f0d3 e3a00000 e59f1078 e59f2078
0000020 e4810004 e1510002 3afffffc e59f006c
0000040 e3c0001f e321f0d2 e1a0d000 e2400a01
0000060 e321f0d1 e1a0d000 e2400a01 e321f0d7
... and so on.
Is it as simple as to just load 20 bytes for each line into the desired memory address and everything would work by just branching the PC into the correct address? Did I forget something?
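One detail worth checking before loading anything: od prints its offset column in octal by default, so 0000020 is 16, and each line of -t x4 output carries four 32-bit words, i.e. 16 bytes rather than 20. A small sketch of decoding one such line, assuming full lines of exactly four words; parse_od_line is a hypothetical helper name.

```c
#include <stdint.h>
#include <stdio.h>

/* Parse one full line of `od -t x4` output: an octal byte offset followed by
 * four 32-bit hex words. Returns 1 on success, 0 if the line does not match. */
int parse_od_line(const char *line, unsigned long *offset, uint32_t words[4])
{
    unsigned int w[4];
    if (sscanf(line, "%lo %x %x %x %x", offset, &w[0], &w[1], &w[2], &w[3]) != 5)
        return 0;
    for (int i = 0; i < 4; i++)
        words[i] = (uint32_t)w[i];
    return 1;
}
```

Loading the .bin file itself avoids this decoding step entirely; the sketch only shows how the dump relates to byte offsets.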
When you use -O binary you pretty much give up your .text, .data, .bss control. For example, if you have one word, 0x12345678, at address 0x10000000 (call that .text) and one word of .data, 0xAABBCCDD, at 0x20000000, and you use -O binary, you will get a file 0x10000004 bytes long which starts with 0x12345678, ends with 0xAABBCCDD and has 0x0FFFFFFC bytes of zeros in between. Try to dump that into a chip and you might wipe out your bootloader (U-Boot, etc.) or trash a bunch of registers. That is before dealing with potentially huge files and an eternity to transfer them to the board, depending on how you intend to do that.
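The size in this example follows from the fact that a flat binary spans from the lowest to the highest load address, with the gaps filled in. A sketch of that arithmetic, using a hypothetical struct region to describe each section:

```c
#include <stddef.h>
#include <stdint.h>

struct region {          /* hypothetical description of one loaded section */
    uint64_t addr;       /* load address */
    uint64_t size;       /* size in bytes */
};

/* Size of a flat image covering every region: highest end minus lowest start. */
uint64_t flat_image_size(const struct region *r, size_t n)
{
    if (n == 0)
        return 0;
    uint64_t lo = r[0].addr;
    uint64_t hi = r[0].addr + r[0].size;
    for (size_t i = 1; i < n; i++) {
        if (r[i].addr < lo)
            lo = r[i].addr;
        if (r[i].addr + r[i].size > hi)
            hi = r[i].addr + r[i].size;
    }
    return hi - lo;
}
```

For one 4-byte word at 0x10000000 and one at 0x20000000 this gives 0x10000004, the file length quoted above.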
What you can do, which is typical with ROM-based bootloaders, if you are using the GNU tools:
MEMORY
{
bob : ORIGIN = 0x10000000, LENGTH = 16K
ted : ORIGIN = 0x20000000, LENGTH = 16K
}
SECTIONS
{
.text : { *(.text*) } > bob
.bss : { *(.bss*) } > ted AT > bob
.data : { *(.data*) } > ted AT > bob
}
The code (.text) will be linked as if the .bss and .data are at their proper places in memory, 0x20000000, but the bytes loaded by the executable (an ELF loader or -O binary, etc.) are tacked onto the end of .text. Normally you use more linker script magic to determine where the linker put them. On boot, your .text code should first zero the .bss and copy the .data from the .text space to the .data space, and then you can run normally.
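The boot-time step described above can be sketched as a plain C routine. In a real crt0 the five pointers would come from symbols defined in the linker script; here they are ordinary parameters with made-up names so that the logic can be read and tested on its own.

```c
#include <stdint.h>

/* Copy the initialised .data image from its load address to its run address,
 * then zero .bss. Both ranges are half-open and assumed word-aligned. */
void init_ram(const uint32_t *data_load,
              uint32_t *data_start, uint32_t *data_end,
              uint32_t *bss_start, uint32_t *bss_end)
{
    for (uint32_t *dst = data_start; dst < data_end; )
        *dst++ = *data_load++;
    for (uint32_t *dst = bss_start; dst < bss_end; )
        *dst++ = 0;
}
```

This has to run before any code that reads initialised or zero-initialised globals, which is why it normally sits at the very start of the reset handler.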
U-Boot can probably handle formats other than .bin, yes? It is also quite easy to write an ELF tool that extracts the different parts of binaries and makes your own .bin files, without using objcopy. It is also quite easy to write code that never relies on .bss being zero and has no .data, which avoids all of these problems.
If you can write to random addresses without an OS getting in the way, there's no point in using some random hex dump format. Just load the binary data directly to the desired address. Converting on the fly from hex to binary to store in memory buys you nothing. You can load binary data to any address using plain read() or fread(), of course.
If you're loading full-blown ELF files (or similar), you of course need to implement whatever tasks that particular format expects from the object loader, such as allocating memory for BSS data, possibly resolving any unresolved addresses in the code (jumps and such), and so on.
Yes, it is possible to write to memory (on an embedded system) during run-time.
Many bootloaders copy data from a read-only memory (e.g. Flash), into writeable memory (SRAM) then transfer execution to that address.
I've worked on other systems that can download a program from a port (USB, SD Card) into writeable memory then transfer execution to that location.
I've written functions that download data from a serial port and programmed it into a Flash Memory (and EEPROM) device.
For memory-to-memory copies, use either memcpy or your own routine, with pointers that are assigned a physical address.
For copying data from a port to memory, figure out how to get data from a device (such as a UART) then copy the data from its register into your desired location, via pointers.
Example:
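#include <stdint.h>
#define UART_RECEIVE_REGISTER_ADDR (0x2000) // Device-specific; 0x2000 is only an example.
//...
volatile uint8_t * p_uart_receive_reg = (volatile uint8_t *) UART_RECEIVE_REGISTER_ADDR;
// my_memory_location is a uint8_t * that already points at the destination in RAM.
*my_memory_location = *p_uart_receive_reg; // Read device and put into memory.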
Also, search Stack Overflow for "embedded C write to memory"

Resources