I am trying to insert an MD5 hash of part of my binary into the binary itself, to keep track of the MCU firmware version.
I have approached it like this:
In the linker script I have split the flash into two regions:
MEMORY
{
FLASH0 (rx) : ORIGIN = 0x8000000, LENGTH = 64K - 16
FLASH1 (r) : ORIGIN = 0x800FFF0, LENGTH = 16
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 8K
}
Then I have specified an output section like so:
.fw_version :
{
KEEP(*(.fw_version))
} >FLASH1
Next I have my firmware_version.c file containing only:
#define FW_VERSION_SIZE 16
const unsigned char FW_VERSION[FW_VERSION_SIZE]
__attribute__((section(".fw_version"), used)) = {0};
After the binary is compiled and objcopy has been used to create a .bin file, I have a 65536-byte file. I split that file at 65520 bytes, compute an MD5 checksum of the first part, and write it into the second part (16 bytes). Lastly I do cat parta partb > final.bin.
When I examine this binary with hexdump I can see that the MD5 checksum is indeed at the end.
Using objdump -h I get:
...
8 .fw_version 00000010 0800fff0 0800fff0 00017ff0 2**2
...
and objdump -t gives:
...
0800fff0 g O .fw_version 00000010 FW_VERSION
...
I thought this meant that I could just use FW_VERSION[i] to get byte i of the MD5 checksum from within the MCU firmware, but when I examine the memory in gdb it is all zeroed out, as if it was never changed.
What am I missing here?
[edit] The device is an STM32F030C8T6 (ARM Cortex-M0) programmed through gdb.
As I commented under the question, I found that one reason it was not working was that I was manipulating the .bin file but loading the .elf file when programming with gdb.
It should (could) have worked if I used a programmer or bootloader to download the .bin file to the target.
I found a better (I think) way of doing it though.
Compile all the sources in the project to .o files.
cat *.o > /tmp/tmp.something_unique (I used $(shell mktemp) in the Makefile to get a unique name)
openssl dgst -md5 -binary /tmp/tmp.something_unique > version_file
objcopy -I binary -O elf32-littlearm -B arm version_file v_file.o
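In the linker script, add a section .fw_version : { KEEP(v_file.o(.data)) } >FLASH1
Link the application.
In the application, get the address and size of the version number by doing extern unsigned char _binary_version_file_start; extern unsigned char _binary_version_file_size; uint8_t *FW_VERSION = &_binary_version_file_start; const size_t FW_VERSION_SIZE = (size_t) &_binary_version_file_size;. Note that the uses of & are correct: these are symbols created by objcopy, and the value of interest is the symbol's address, not what is stored there.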
This will result in the checksum being taken over all the objects that are compiled from source and then this checksum is linked into the binary that is flashed in the target.
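Once the digest is linked in, the firmware will usually want to report it as text. A minimal sketch of that step, assuming a raw 16-byte MD5 digest; digest_to_hex is a hypothetical helper name, not part of the build described above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes 2*len lowercase hex digits plus a
 * terminating NUL into out. out must hold at least 2*len + 1 bytes. */
void digest_to_hex(const uint8_t *digest, size_t len, char *out)
{
    static const char hex[] = "0123456789abcdef";

    assert(digest != NULL && out != NULL);
    for (size_t i = 0; i < len; ++i) {
        out[2 * i]     = hex[digest[i] >> 4];   /* high nibble */
        out[2 * i + 1] = hex[digest[i] & 0x0f]; /* low nibble  */
    }
    out[2 * len] = '\0';
}
```

On the target this could be called as digest_to_hex(FW_VERSION, FW_VERSION_SIZE, buf) with a 33-byte buffer for a 16-byte digest.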
I am designing a risc-v processor and am using gcc to write some test programs for it.
I see these symbols in the elf file which don't seem to be really needed for the program execution, but I can't seem to be able to strip them away.
Here is my simple program:
// file: loop_c.c
int a = 0;
void _start() {
for (int i = 0; i < 10; ++i) {
a += 20;
}
}
I am compiling this into elf as follows:
riscv32-unknown-elf-gcc -static -nostdlib -T riscv32i.ld loop_c.c -o loop_c.elf
When I look into the hex contents of loop_c.elf, I see the following bits which I can't seem to be able to remove:
strip removes a few of them, but not all. I used the following command:
riscv32-unknown-elf-strip --strip-unneeded loop_c.elf
Is there any way to remove these bits completely and just set them to 0?
EDIT: Compiler version:
riscv32-unknown-elf-gcc (GCC) 11.1.0
EDIT2: Is there a name for the parts highlighted in the image above?
EDIT3: Okay, made some progress. The names of the sections that strip doesn't remove automatically are (in the second image above):
.shstrtab
.riscv.attributes
.sbss
I could remove the second and third ones with the following command:
riscv32-unknown-elf-strip -R .riscv.attributes loop_c.elf
riscv32-unknown-elf-strip -R .sbss loop_c.elf
But searching online, it seems that it's very hard or impossible to remove .shstrtab. I'm not sure why, but it seems that it's necessary for some reason.
EDIT4: My linker script. This is very bare bones for my CPU design. Obviously nothing close to what is used in the real world:
OUTPUT_FORMAT("elf32-littleriscv", "elf32-littleriscv", "elf32-littleriscv")
ENTRY(_start)
MEMORY
{
INST (rx) : ORIGIN = 0x1000, LENGTH = 0x1000 /* 4096 bytes or 1024 instructions max, 1 instruction = 4 bytes. */
DATA (rwx) : ORIGIN = 0x2000, LENGTH = 0x1000 /* 4096 bytes or 1024 words of data. 1 word = 4 bytes. */
}
SECTIONS
{
.text :
{
*(.text)
}> INST
.data :
{
*(.data)
}> DATA
}
I am doing a bit of OSdev, and I've been trying to implement memory management in my kernel. I have started off with a physical memory manager (this is a 32 bit OS). The idea is to keep a table of bits where we allocate a bit per 4K physical memory block. If the bit is '1', the block is in use and if '0', it isn't. I thought that this table should come after the kernel. So here is my kernel code (minimal):
#include <stdint.h>
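#define PMMAP 0x1000 //This contains information from int 15h/E820
#define KERNEL_P 0x100000 //Assumed: physical load address of the kernel (1M, see the note below); not defined in the original snippet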
#define BLOCK_SIZE 4096
#define SECTOR_SIZE 512
#define SECTORS_PER_BLOCK 8
#define BLOCK_SIZE_B 12
#define SECTOR_SIZE_B 9
#define SECTORS_PER_BLOCK_B 3
void pmmngr_init(uint32_t kernelsize,uint32_t mapentrycount);
uint32_t* _physical_memory_table;
void kmain(uint32_t size,uint32_t mmapentrycount) //This size is passed on by the bootloader, where it has a filesystem driver-ish code that determines this. The size unit is 512 bytes
{
pmmngr_init(size,mmapentrycount);
return;
}
struct mmap_entry {
uint32_t startLo;
uint32_t startHi;
uint32_t sizeLo;
uint32_t sizeHi;
uint32_t type;
uint32_t acpi_3_0;
};
void pmmngr_init(uint32_t kernelsize,uint32_t mapentrycount)
{
struct mmap_entry* map_ptr= (struct mmap_entry*)PMMAP;
_physical_memory_table = (uint32_t*)(KERNEL_P + kernelsize*SECTOR_SIZE);
for (uint32_t i=0;i<0x8000;i++) //Why 0x8000? This is the size of the table (* 32 of course)
_physical_memory_table[i] = 0xffffffff;
}
Initially, I set everything to 0xffffffff. Then I read the memory map (from E820) and allocate and deallocate blocks (later).
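For reference, the bit-per-block bookkeeping described above can be sketched as follows. This is only an illustration under the stated layout (one bit per 4K block, 32 blocks per uint32_t word); the function names pmm_set, pmm_clear and pmm_test are hypothetical and not part of the kernel code shown:

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE_B 12 /* log2(4096), same value as in the kernel code */

/* Mark the 4K block containing physical address addr as in use. */
void pmm_set(uint32_t *table, uint32_t addr)
{
    uint32_t block = addr >> BLOCK_SIZE_B;

    assert(table != 0);
    table[block >> 5] |= (uint32_t)1 << (block & 31);
}

/* Mark the 4K block containing physical address addr as free. */
void pmm_clear(uint32_t *table, uint32_t addr)
{
    uint32_t block = addr >> BLOCK_SIZE_B;

    assert(table != 0);
    table[block >> 5] &= ~((uint32_t)1 << (block & 31));
}

/* Return 1 if the block containing addr is in use, 0 otherwise. */
int pmm_test(const uint32_t *table, uint32_t addr)
{
    uint32_t block = addr >> BLOCK_SIZE_B;

    assert(table != 0);
    return (int)((table[block >> 5] >> (block & 31)) & 1u);
}
```

With 0x8000 words of 32 bits each, the table covers 0x100000 blocks of 4K, i.e. the full 4GB address space.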
I compile with:
i686-elf-gcc kernel.c -c -g -o kernel.o -ffreestanding
i686-elf-ld kernel.o -Ttext 0x100000 -o kernel.elf
objcopy -O binary kernel.elf kernel.bin
Note that the kernel is meant to be loaded at the 1M mark in memory (0x100000).
All this was the introduction to this problem.
Here is the main issue.
Here, my _physical_memory_table is loaded/created after the kernel, and where it is created depends upon the size of the kernel.bin file it got from the bootloader.
Suppose the size of the kernel.bin file is ~1K. The table would then be placed at 1M + 1K in memory (0x100400). Here is the core problem: the _physical_memory_table pointer variable isn't really placed in the 0x100000 - 0x100400 range by the linker. It belongs to the .bss section and, in my case, is placed outside this range. The pointer ends up in the very location where the table is created, so there is an overlap, and thus a bug.
So how would I solve this problem? I need to expose the 'range of control' of the kernel, i.e, the range of the kernel and all its pieces in memory, and place this table after that.
So what do I do? (I'm guessing something with the linker script)
You will want to use a custom linker script for a kernel. In the script you would have the normal sections for .text, .rodata, .data, .bss and a few others. The custom is to define symbols marking the start and end of each section from the current address in the linking process, e.g. _text_start = . and _text_end = . around the members of the .text section.
In your C code you can then declare variables:
extern void *_text_start[], *_text_end[];
Those will then have the addresses filled in by the linker and will tell you where each section of your kernel starts and ends. Often you also have a _end symbol that's after all sections. Usually that is identical to _bss_end, with .bss being the last section.
Your kernel would place _physical_memory_table after _end to avoid any overlaps with itself.
That said, most people include a fixed initial _physical_memory_table in their .data or .bss sections that simply maps 4GB of memory 1:1. Once the MMU is up and running and you have switched to kernel_start(), a proper, fine-grained memory table can be set up more easily with C code.
In my kernel I also include 64KB of unused memory in the .bss section that is used to prime the memory management. So right from the start there are 64KB of memory available for allocations. The code that parses the memory map can then allocate data structures from that pool before it adds free memory regions to the allocator.
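Placing the table after _end, as suggested above, usually also means rounding that address up to a block boundary. A minimal sketch, assuming 4K blocks and a linker script that defines _end; align_up_block is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 4096u /* must be a power of two */

/* Round addr up to the next multiple of BLOCK_SIZE.
 * Addresses that are already aligned are returned unchanged. */
uint32_t align_up_block(uint32_t addr)
{
    /* Guard against wrap-around at the top of the 32-bit space. */
    assert(addr <= UINT32_MAX - (BLOCK_SIZE - 1u));
    return (addr + (BLOCK_SIZE - 1u)) & ~(uint32_t)(BLOCK_SIZE - 1u);
}

/* Possible use in a 32-bit kernel (assumes the linker script defines _end):
 *
 *   extern char _end[];
 *   _physical_memory_table = (uint32_t *)align_up_block((uint32_t)_end);
 */
```

Because _end comes from the linker rather than from the size of kernel.bin, the result also accounts for .bss, which is the part the bootloader-supplied size misses.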
I'm working on Linux and I've just heard about the objcopy command. I've found the corresponding command on my x86_64 PC: x86_64-linux-gnu-objcopy.
With its help, I can convert a file into an object file: x86_64-linux-gnu-objcopy -I binary -O elf64-x86-64 custom.config custom.config.o
The file custom.config is a human-readable file. It contains two lines:
name titi
password 123
Now I can execute objdump -x -s custom.config.o to check its information.
custom.config.o: file format elf64-little
custom.config.o
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 00000017 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_custom_config_start
0000000000000017 g .data 0000000000000000 _binary_custom_config_end
0000000000000017 g *ABS* 0000000000000000 _binary_custom_config_size
Contents of section .data:
0000 6e616d65 20746974 690a7061 7373776f name titi.passwo
0010 72642031 32330a rd 123.
As we all know, we can open, read or write a file such as custom.config in any C/C++ project. Now I'm wondering whether it's possible to use the object file custom.config.o directly in a C/C++ project. For example, is it possible to read the contents of custom.config.o without calling I/O functions such as open, read or write? If so, I think this would be a kind of hardcoding that avoids the I/O calls.
Although I tried this on Win10 with MinGW (MinGW-W64 project, GCC 8.1.0), it should work for you with only minor adaptations.
As you can see from the info objdump gave you, the file's contents are placed in the .data section, which is the usual section for non-constant variables.
And some symbols were defined for it. You can declare these symbols in your C source.
The absolute value _binary_custom_config_size is special, because it is marked *ABS*. Currently I know no other way to obtain its value than to declare a variable of any type and take its address.
This is my show_config.c:
#include <stdio.h>
#include <string.h>
extern const char _binary_custom_config_start[];
extern const char _binary_custom_config_size;
int main(void) {
size_t size = (size_t)&_binary_custom_config_size;
char config[size + 1];
strncpy(config, _binary_custom_config_start, size);
config[size] = '\0';
printf("config = \"%s\"\n", config);
return 0;
}
Because the "binary" file (actually a text) has no final '\0' character, you need to append one to get a correctly terminated C string.
You could as well declare _binary_custom_config_end and use it to calculate the size, or as a limit.
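The variant using the end symbol as a limit could look roughly like this. It is a sketch only: blob_to_cstr is a hypothetical helper, and it takes the two addresses as parameters so that it does not depend on any particular symbol names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the bytes in [start, end) into dst and NUL-terminate the result.
 * Truncates if dst is too small. Returns the number of bytes copied,
 * not counting the terminating NUL. */
size_t blob_to_cstr(char *dst, size_t dst_size,
                    const char *start, const char *end)
{
    size_t n;

    assert(dst != NULL && dst_size > 0 && start <= end);
    n = (size_t)(end - start);
    if (n > dst_size - 1)
        n = dst_size - 1;
    memcpy(dst, start, n);
    dst[n] = '\0';
    return n;
}

/* Possible use with the objcopy symbols from this answer:
 *
 *   extern const char _binary_custom_config_start[];
 *   extern const char _binary_custom_config_end[];
 *   char config[64];
 *   blob_to_cstr(config, sizeof config,
 *                _binary_custom_config_start, _binary_custom_config_end);
 */
```

Using memcpy with an explicit length also copes with embedded '\0' bytes, which strncpy would stop at.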
Building everything goes like this (I used the -g option to debug):
$ objcopy -I binary -O elf64-x86-64 -B i386 custom.config custom.config.o
$ gcc -Wall -Wextra -pedantic -g show_config.c custom.config.o -o show_config
And the output shows the success:
$ show_config.exe
config = "name titi
password 123"
If you need the file's contents in another section, add the option to rename the section to objcopy's call. Add any flags you need; the example uses .rodata, which holds read-only data:
--rename-section .data=.rodata,alloc,load,readonly,data,contents
I have a NASM assembly file that I am assembling and linking (on Intel-64 Linux).
There is a text file, and I want the contents of the text file to appear in the resulting binary (as a string, basically). The binary is an ELF executable.
My plan is to create a new readonly data section in the ELF file (equivalent to the conventional .rodata section).
Ideally, there would be a tool to add a file verbatim as a new section in an elf file, or a linker option to include a file verbatim.
Is this possible?
This is possible and most easily done using OBJCOPY found in BINUTILS. You effectively take the data file as binary input and then output it to an object file format that can be linked to your program.
OBJCOPY will even produce a start and end symbol as well as the size of the data area so that you can reference them in your code. The basic idea is to tell it that your input file is binary (even if it is text), that you are targeting an x86-64 object file, and what the input and output file names are.
Assume we have an input file called myfile.txt with the contents:
the
quick
brown
fox
jumps
over
the
lazy
dog
Something like this would be a starting point:
objcopy --input binary \
--output elf64-x86-64 \
--binary-architecture i386:x86-64 \
myfile.txt myfile.o
If you wanted to generate 32-bit objects you could use:
objcopy --input binary \
--output elf32-i386 \
--binary-architecture i386 \
myfile.txt myfile.o
The output would be an object file called myfile.o . If we were to review the headers of the object file using OBJDUMP and a command like objdump -x myfile.o we would see something like this:
myfile.o: file format elf64-x86-64
myfile.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 0000002c 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_myfile_txt_start
000000000000002c g .data 0000000000000000 _binary_myfile_txt_end
000000000000002c g *ABS* 0000000000000000 _binary_myfile_txt_size
By default it creates a .data section with contents of the file and it creates a number of symbols that can be used to reference the data.
_binary_myfile_txt_start
_binary_myfile_txt_end
_binary_myfile_txt_size
These are effectively the address of the first byte, the address just past the last byte, and the size of the data that was placed into the object from the file myfile.txt. OBJCOPY bases the symbols on the input file name: myfile.txt is mangled into myfile_txt and used to create the symbols.
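The name mangling can be illustrated with a small sketch. It assumes that every character of the file name (as passed on the command line) that is not a letter or digit becomes an underscore, which matches the myfile.txt example here; binary_symbol_name is a hypothetical helper, not part of binutils:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build "_binary_<mangled filename>_<suffix>" into out.
 * Returns 0 on success, -1 if out is too small. */
int binary_symbol_name(char *out, size_t out_size,
                       const char *filename, const char *suffix)
{
    int n;

    assert(out != NULL && filename != NULL && suffix != NULL);
    n = snprintf(out, out_size, "_binary_%s_%s", filename, suffix);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    /* Replace everything that is not alphanumeric with '_'. */
    for (char *p = out; *p != '\0'; ++p)
        if (!isalnum((unsigned char)*p))
            *p = '_';
    return 0;
}
```

Since the whole path given to objcopy takes part in the name, running objcopy from the directory that contains the file keeps the symbols short.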
One problem is that a .data section is created which is read/write/data as seen here:
Idx Name Size VMA LMA File off Algn
0 .data 0000002c 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
You specifically are requesting a .rodata section that would also have the READONLY flag specified. You can use the --rename-section option to change .data to .rodata and specify the needed flags. You could add this to the command line:
--rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA
Of course if you want to call the section something other than .rodata with the same flags as a read only section you can change .rodata in the line above to the name you want to use for the section.
The final version of the command that should generate the type of object you want is:
objcopy --input binary \
--output elf64-x86-64 \
--binary-architecture i386:x86-64 \
--rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA \
myfile.txt myfile.o
Now that you have an object file, how can you use it in C code, for example? The symbols generated are a bit unusual, and there is a reasonable explanation on the OS Dev Wiki:
A common problem is getting garbage data when trying to use a value defined in a linker script. This is usually because they're dereferencing the symbol. A symbol defined in a linker script (e.g. _ebss = .;) is only a symbol, not a variable. If you access the symbol using extern uint32_t _ebss; and then try to use _ebss the code will try to read a 32-bit integer from the address indicated by _ebss.
The solution to this is to take the address of _ebss either by using it as &_ebss or by defining it as an unsized array (extern char _ebss[];) and casting to an integer. (The array notation prevents accidental reads from _ebss as arrays must be explicitly dereferenced)
Keeping this in mind we could create this C file called main.c:
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
/* These are external references to the symbols created by OBJCOPY */
extern char _binary_myfile_txt_start[];
extern char _binary_myfile_txt_end[];
extern char _binary_myfile_txt_size[];
int main()
{
char *data_start = _binary_myfile_txt_start;
char *data_end = _binary_myfile_txt_end;
size_t data_size = (size_t)_binary_myfile_txt_size;
/* Print out the pointers and size */
printf ("data_start %p\n", data_start);
printf ("data_end %p\n", data_end);
printf ("data_size %zu\n", data_size);
/* Print out each byte until we reach the end */
while (data_start < data_end)
printf ("%c", *data_start++);
return 0;
}
You can compile and link with:
gcc -O3 main.c myfile.o
The output should look something like:
data_start 0x4006a2
data_end 0x4006ce
data_size 44
the
quick
brown
fox
jumps
over
the
lazy
dog
A NASM example of usage is similar in nature to the C code. The following assembly program called nmain.asm writes the same string to standard output using Linux x86-64 System Calls:
bits 64
global _start
extern _binary_myfile_txt_start
extern _binary_myfile_txt_end
extern _binary_myfile_txt_size
section .text
_start:
mov eax, 1 ; SYS_Write system call
mov edi, eax ; Standard output FD = 1
mov rsi, _binary_myfile_txt_start ; Address to start of string
mov rdx, _binary_myfile_txt_size ; Length of string
syscall
xor edi, edi ; Return value = 0
mov eax, 60 ; SYS_Exit system call
syscall
This can be assembled and linked with:
nasm -f elf64 -o nmain.o nmain.asm
gcc -m64 -nostdlib nmain.o myfile.o
The output should appear as:
the
quick
brown
fox
jumps
over
the
lazy
dog
Problem statement (using a contrived example):
Working as expected ('b' is printed to screen):
void Foo(const char* bar);
void main()
{
const char bar[4] = "bar";
Foo(bar);
}
void Foo(const char* bar)
{
// Pointer to first text cell of video memory
char* memory = (char*) 0xb8000;
*memory = bar[0];
}
Not working as expected (\0 is printed to screen):
void Foo(const char* bar);
void main()
{
Foo("bar");
}
void Foo(const char* bar)
{
// Pointer to first text cell of video memory
char* memory = (char*) 0xb8000;
*memory = bar[0];
}
In other words, if I pass the const char* directly, it doesn't pass correctly. The const char* I get in Foo points to zeroed out memory somehow. What am I doing wrong?
Background info (as requested):
I am developing an operating system for fun, using a guide I found here. The guide generally assumes you are on a unix-based machine, but I'm developing on a PC, so I'm using MinGW so that I have access to gcc, ld, etc.
In the guide, I am currently on page 54, where you have just bootstrapped your custom kernel. Rather than simply displaying an 'X' as the guide teaches, I decided to use my existing knowledge of C/C++ to attempt to write my own rudimentary print string function. The function is supposed to take a const char* and write it, char by char, into video memory.
Three files are currently involved in the project:
The boot sector - compiled through NASM to a .bin file
The kernel entry routine - compiled without linking through NASM to a .o, linked against the kernel
The kernel - compiled through gcc, linked along with the kernel entry routine through the ld command, which produces a .bin which is appended to the .bin file produced by the boot sector
Once the combined .bin file is generated, I am converting it to .VDI (VirtualBox Disk Image) and running it in a VM I have set up.
Additional info:
I just noticed that when VirtualBox is converting the .bin file to .vdi, it is reporting different sizes for the two examples. I had a hunch that maybe the string was getting omitted entirely from the compiled product. Sure enough, when I look at .bin for the first example in a hex editor, I can find the text "bar", but I can't when I look at a hex dump for the .bin of the second example.
This leads me to believe that the compilation process I'm using has a flaw in it somewhere. Here are the commands I'm using:
nasm boot_sector.asm -f bin -o boot_sector.bin
nasm kernel_entry.asm -f elf -o kernel_entry.o
gcc -ffreestanding -c kernel.c -o kernel.o
ld -T NUL -o kernel.tmp -Ttext 0x1000 kernel_entry.o kernel.o
objcopy -O binary -j .text kernel.tmp kernel.bin
copy /b boot_sector.bin+kernel.bin os_image.bin
os_image.bin is what is converted to the .vdi file which is used in the vm.
With your first example, the compiler will (or at least, can) put the data to initialize the automatic array right in the code (.text section - moves with immediate values are used when I try this out).
With your second example, the string literal is put in the .rodata section, and the code will contain a reference to that section.
Your objcopy command only copies the .text section, so the string will be missing in the final binary. You should add the .rodata section, or remove the -j .text entirely.