I am trying to understand how to set the value of a string in the .rodata segment, because loading it from an environment variable gives me problems.
I want to set a constant string in the .rodata section from outside the program. This should work independently of the code being executed. So, when I do
"objdump -c foo"
the .rodata section must list this string without foo.c itself having to define it.
How do I set a constant in the .rodata section ?
Edit: Linux OS and using GCC
I cannot use an environment variable, because that would mean modifying the C code. I want to leave the C code untouched and add the constant, say "Goo", to the .rodata segment.
Then you need to write a program that lets you modify the binary file.
Read the ELF file specifications.
Then write a program that modifies the ELF program and section headers and adds the data to the .rodata section.
I've managed to write a small bash script that does more or less what I think you want.
First let's consider this sample program:
test.c
#include <stdio.h>
const char message[1024] = "world";
int main()
{
printf("hello %s\n", message);
}
The target variable will be message. Note that I will not change the size of the variable, as that would be a mess, so be careful to reserve as much memory as you will ever need.
Now the script:
patchsym
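#!/bin/bash
# usage: patchsym PROGRAM SYMBOL < NEWCONTENT
# Overwrites the bytes of SYMBOL (which must live in .rodata) in place.
EXE="$1"
SYMBOL="$2"
# Virtual address of the symbol, from the symbol table
ADDR=$((0x$(objdump -t "$EXE" | grep " $SYMBOL$" | cut -d ' ' -f 1)))
# VMA and file offset of .rodata, from the section headers
read -r VMA FOFF < <(objdump -h "$EXE" | awk '$2 == ".rodata" { print $4, $6 }')
# Convert the virtual address into a file offset inside the executable
OFFS=$((ADDR - 0x$VMA + 0x$FOFF))
dd of="$EXE" bs=1 seek=$OFFS conv=notrunc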
The new message content will be:
newmsg
universe^@
(where ^@ is actually a NUL character).
Now just do:
$ gcc test.c -o test
$ ./test
hello world
$ ./patchsym test message < newmsg
$ ./test
hello universe
Here is my code:
#include <stdio.h>
int variable;
int main(){
printf("%p", (void *)&variable);
}
Output in couple of runs:
~ % ./a.out
0x559bae5c4030
~ % ./a.out
0x55b9d1038030
~ %
As you can see, both addresses end in "30".
and the symbol table:
~ % readelf -s a.out | grep variable
Num: Value Size Type Bind Vis Ndx Name
51: 0000000000004030 4 OBJECT GLOBAL DEFAULT 23 variable
~ %
Again, the Value field ends in "30".
My question is: what exactly is that Value field, and what does it have to do with the output of the code? And why are the last two digits the same in every run?
Sorry for my poor English.
The Value field from readelf is the address of the variable in the executable a.out itself. Since a.out is a position-independent executable (PIE), that address is an offset from wherever the executable is loaded at run time.
What you see in the output is the actual address of variable at run time. So in the first run your executable was loaded at (starting address) 0x559bae5c0000 (= 0x559bae5c4030 - 0x4030), and in the second run at 0x55b9d1034000 (= 0x55b9d1038030 - 0x4030).
You can see this by inspecting /proc/<PID>/maps of the executable a.out when running.
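The load address changes from run to run because of Address Space Layout Randomization (ASLR) on Linux. ASLR only moves the executable by whole pages (4 KiB, i.e. 0x1000 bytes, on x86-64), so the position of variable within its page, the last three hex digits 030, is the same in every run.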
I am exploring the Linux Kernel code and came across this line of code:
#define __init_task_data __attribute__((__section__(".data..init_task")))
I know that something like:
int x __attribute__((__section__("section"))) = 10;
is a GCC attribute that places x in the section named "section" of the compiled image. However, when I try to specify ".data..init_task" as the section, my variable ends up in the .data section. Here is my code:
int apple __attribute__((__section__(".data..init_task"))) = 10;
Compiled with:
gcc test.c
Disassembled with:
objdump -D a.out
My variable "apple" appears under the .data section; there is no ".data..init_task" section, which is what has stumped me.
I've been writing an OS using this tutorial. I am at the part where the boot loader is complete and C is used for programming (and then linked together ...). I mention that only as background; I believe the problem I have is related to gcc.
I built an i386-elf cross compiler for the OS. Everything works fine and I can execute my code, except that all global variables are initialized to zero even though I provided a default value.
int test_var = 1234;
// yes, void main() is correct (the boot-loader will call this)
void main() {}
If I debug this code with GDB, I get: (gcc-7.1.0, target: i386-elf)
(gdb) b main
Breakpoint 1 at 0x1554: file src/kernel/main.c, line 11.
(gdb) c
Continuing.
Breakpoint 1, main () at src/kernel/main.c:11
11 void main() {
(gdb) p test_var
$1 = 0
If I run the same code on my local machine (gcc-6.3.0, target: x86_64), it prints 1234.
My question is: did I misconfigure gcc, is this a mistake in my OS, or is this a known problem? I couldn't find anything about it.
I use the following commands to compile my stuff:
# ...
i386-elf-gcc -g -ffreestanding -Iinclude/ -c src/kernel/main.c -o out/kernel/main.o
# ...
i386-elf-ld -e 0x1000 -Ttext 0x1000 -o out/kernel.elf out/kernel_entry.o out/kernel/main.o # some other stuff ...
i386-elf-objcopy -O binary out/kernel.elf out/kernel.bin
cat out/boot.bin out/kernel.bin > out/os.bin
qemu-system-i386 -drive "format=raw,file=out/os.bin"
EDIT: As @EugeneSh. suggested, here is some logic to make sure that the variable is not optimized away:
#include <cpu/types.h>
#include <cpu/isr.h>
#include <kernel/print.h>
#include <driver/vga.h>
int test_var = 1234;
void main() {
vga_text_init();
switch (test_var) {
case 1234: print("That's correct"); break;
case 0: print("It's zero"); break;
// I don't have a method like atoi() in place, I would use
// GDB to get the value
default: print("It's something else");
}
}
Sadly, it prints "It's zero".
The compiler never clears uninitialized global variables to zero; that logic is built into the loader.
So when you allocate memory for the data segment, its size must also cover the .bss section. You have to find the offset, alignment and size of the .bss section within the data segment and memset() it to 0.
As you are writing your own OS, the usual library routines may not be available, so it is better to write your own memset() function, for example in assembly.
Problem statement (using a contrived example):
Working as expected ('b' is printed to screen):
void Foo(const char* bar);
void main()
{
const char bar[4] = "bar";
Foo(bar);
}
void Foo(const char* bar)
{
// Pointer to first text cell of video memory
char* memory = (char*) 0xb8000;
*memory = bar[0];
}
Not working as expected (\0 is printed to screen):
void Foo(const char* bar);
void main()
{
Foo("bar");
}
void Foo(const char* bar)
{
// Pointer to first text cell of video memory
char* memory = (char*) 0xb8000;
*memory = bar[0];
}
In other words, if I pass the string literal directly, it doesn't arrive correctly: the const char* I get in Foo points to zeroed-out memory. What am I doing wrong?
Background info (as requested):
I am developing an operating system for fun, using a guide I found here. The guide generally assumes you are on a Unix-based machine, but I'm developing on a Windows PC, so I'm using MinGW to get access to gcc, ld, etc.
In the guide, I am currently on page 54, where you have just bootstrapped your custom kernel. Rather than simply displaying an 'X' as the guide teaches, I decided to use my existing knowledge of C/C++ to attempt to write my own rudimentary print string function. The function is supposed to take a const char* and write it, char by char, into video memory.
Three files are currently involved in the project:
The boot sector - compiled through NASM to a .bin file
The kernel entry routine - compiled without linking through NASM to a .o, linked against the kernel
The kernel - compiled through gcc, linked along with the kernel entry routine through the ld command, which produces a .bin which is appended to the .bin file produced by the boot sector
Once the combined .bin file is generated, I am converting it to .VDI (VirtualBox Disk Image) and running it in a VM I have set up.
Additional info:
I just noticed that when VirtualBox is converting the .bin file to .vdi, it is reporting different sizes for the two examples. I had a hunch that maybe the string was getting omitted entirely from the compiled product. Sure enough, when I look at .bin for the first example in a hex editor, I can find the text "bar", but I can't when I look at a hex dump for the .bin of the second example.
This leads me to believe that the compilation process I'm using has a flaw in it somewhere. Here are the commands I'm using:
nasm boot_sector.asm -f bin -o boot_sector.bin
nasm kernel_entry.asm -f elf -o kernel_entry.o
gcc -ffreestanding -c kernel.c -o kernel.o
ld -T NUL -o kernel.tmp -Ttext 0x1000 kernel_entry.o kernel.o
objcopy -O binary -j .text kernel.tmp kernel.bin
copy /b boot_sector.bin+kernel.bin os_image.bin
os_image.bin is what is converted to the .vdi file which is used in the vm.
With your first example, the compiler will (or at least can) put the data that initializes the automatic array right into the code, in the .text section (when I try this out, it uses mov instructions with immediate values).
With your second example, the string literal is put in the .rodata section, and the code will contain a reference to that section.
Your objcopy command only copies the .text section, so the string will be missing from the final binary. You should add the .rodata section as well (objcopy -O binary -j .text -j .rodata kernel.tmp kernel.bin), or remove the -j .text option entirely.
How can I make static libraries with only binary data, that is without any object code, and make that data available to a C program? Here's the build process and simplified code I'm trying to make work:
./datafile:
abcdefghij
Makefile:
libdatafile.a:
ar [magic] datafile
main: libdatafile.a
gcc main.c libdatafile.a -o main
main.c:
#include <stdio.h>
#include <string.h>
#define TEXTPTR [more magic]
int main(){
char mystring[11];
memset(mystring, '\0', 11);
memcpy(mystring, TEXTPTR, 10); /* destination first, then source */
puts(mystring);
puts(mystring);
return 0;
}
The output I'm expecting from running main is, of course:
abcdefghij
abcdefghij
My question is: what should [magic] and [more magic] be?
You can convert a binary file to a .o file using objcopy; the generated file then defines symbols for the start address, end address and size of the binary data.
objcopy -I binary -O elf32-little data data.o
The data can be referenced from a program via
extern char const _binary_data_start[];
extern char const _binary_data_end[];
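The data lives between those two symbols. Note that declaring them as pointers (extern char const *_binary_data_start;) does not work: the symbol marks the first byte of the data itself, not a variable that holds its address.
The "elf32-little" part needs to be adapted to your target platform. The symbol names are derived from the input file name, so for a file called datafile they would be _binary_datafile_start and _binary_datafile_end. There are many other options for fine control over the conversion.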
Put the data in global variables.
char const text[] = "abcdefghij";
Don't forget to declare text in a header. If the data is currently in a file, the FreeBSD file2c tool can convert it to C source code for you (see its manpage).