How can I use C char arrays under real mode

I'm developing a bare-bones operating system in real mode. I have written my bootloader in assembly, but I want to write my kernel in C. So far everything has gone well, but I noticed that when I pass char arrays (not pointers) as function arguments, they do not work. For example, when I use:
puts("Hello, World!");
or,
char *ptr = "Hello, World!";
puts(ptr);
these work. But when I do this:
char s[]="Hello, World!";
puts(s);
it prints nothing. I want to know why this happens and how I can use char arrays in this way.
Also, this is my "puts" function:
void puts(char* s){
for(; *s!=0; s++)
asm volatile ("int $0x10": : "a"(0x0e00 | *s));
}
EDIT:
I'm compiling with GCC using:
gcc -std=gnu99 -Os -nostdlib -m16 -march=i386 -ffreestanding -o kernel.bin -Wl,--nmagic,--script=raw_image.ld kernel.c
And this is the linker script:
OUTPUT_FORMAT(binary);
ENTRY(main);
SECTIONS
{
. = 0x0000;
.text :
{
*(.text);
}
.data :
{
*(.data);
*(.bss);
*(.rodata);
}
}
Also, these are the websites I used:
Building DOS COM files using GCC
Writing Bootloader in Assembly and C

You have to look at the disassembler output for the resulting binary file. It might be that GCC, targeting a freestanding 386, did something unexpected with segments.
If that doesn't help much, you still have the option of running your OS in an emulator such as Bochs and using Bochs' integrated debugger to find out what actually happens when the code runs.

Related

Why does my shellcode testing program produce a segfault?

I'm trying to write a simple C program for testing if a given shellcode string works on my machine (64 bit), however every single attempt at running the below code results in a segmentation fault. Even though this "shellcode" is just some nop instructions and a break, can anybody explain what is going wrong? I've had similar experiences with shellcodes & shellcode testing programs written by other people, is there some recently introduced mitigation that I am not aware of? I am running: 5.9.0-kali1-amd64 #1 SMP Debian 5.9.1-1kali2 (2020-10-29) x86_64 GNU/Linux.
#include <stdlib.h>
#include <stdio.h>
#define CODE "\x90\x90\x90\x90\x90\x90\x90\xCC";
int main(int argc, char const *argv[])
{
int (*func)();
func = (int (*)()) CODE;
(int)(*func)();
}
This is the command/flags I use to compile the code.
gcc -fno-stack-protector -z execstack -no-pie -m64 -o shell shell.c

The 0xCC at the end is INT3, a breakpoint instruction, which should result in a Trace/breakpoint trap.
If you change 0xCC to 0xC3 (RET), it will return without faulting.
One possible cause is that your compiler is putting constant strings into .rodata, which is not mapped as executable, instead of .text.
Instead of:
#define CODE "\x90\x90\x90\x90\x90\x90\x90\xCC";
try
__attribute__((section(".text")))
static const unsigned char code[] = "\x90\x90\x90\x90\x90\x90\x90\xCC";

Initialize array in gcc, undefined reference to `memcpy'

I'm coding C in Nachos3.4, Centos 6.0, compile by gcc 2.95.3,
the command line I use is gmake all
when I compile this, everything is fine
int main()
{
char* fname[] = {"c(0)", "c(1)", "c(2)", "c(3)", "c(4)", "c(5)", "c(6)", "c(7)"};
return 0;
}
but when I do this, it said undefined reference to 'memcpy'
int main()
{
char* fname[] = {"c(0)", "c(1)", "c(2)", "c(3)", "c(4)", "c(5)", "c(6)", "c(7)", "c(8)"};
return 0;
}
where is the problem and how can i fix that ?

Your initialisation of the automatic fname array involves the compiler copying a large amount of data from a hidden static array to your array on the stack. GCC has several techniques it can use for this; one of its favourites is to call the C library memcpy routine, as this should be nice and quick whatever happens.
In your case you don't seem to have a C library so this is a problem.
You can tell GCC to always use the x86 instructions rather than calling the library like this:
gcc -mstringop-strategy=rep_byte -c -O file.c
or
gcc -mstringop-strategy=loop -c -O file.c
However, I was under the impression that GCC didn't start doing this till somewhere in the mid version 3.x.
Perhaps you're using a MIPS processor (teachers like that processor), in which case the required option would be -mno-memcpy.
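If none of these options is available in your toolchain, another commonly used route is to supply the missing routine yourself. GCC's documentation says it expects a freestanding environment to provide memcpy, memmove, memset and memcmp. The sketch below is a minimal byte-wise copy. The name fallback_memcpy is hypothetical and was chosen so that it does not clash with a hosted C library; in a build without a C library you would name it memcpy so the linker can resolve the compiler-generated call.

```c
#include <stddef.h>

/* Byte-wise copy with the same contract as the standard memcpy:
   the regions must not overlap, and dest is returned.
   "fallback_memcpy" is a hypothetical name; rename it to memcpy
   when linking without a C library. */
void *fallback_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;

    return dest;
}
```

A plain loop like this is slow compared with a tuned library routine, but it has no dependencies, which is usually what matters in this situation.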

Why isn't my char* passing correctly?

Problem statement (using a contrived example):
Working as expected ('b' is printed to screen):
void Foo(const char* bar);
void main()
{
const char bar[4] = "bar";
Foo(bar);
}
void Foo(const char* bar)
{
// Pointer to first text cell of video memory
char* memory = (char*) 0xb8000;
*memory = bar[0];
}
Not working as expected (\0 is printed to screen):
void Foo(const char* bar);
void main()
{
Foo("bar");
}
void Foo(const char* bar)
{
// Pointer to first text cell of video memory
char* memory = (char*) 0xb8000;
*memory = bar[0];
}
In other words, if I pass the const char* directly, it doesn't pass correctly. The const char* I get in Foo points to zeroed out memory somehow. What am I doing wrong?
Background info (as requested):
I am developing an operating system for fun, using a guide I found here. The guide generally assumes you are on a unix-based machine, but I'm developing on a PC, so I'm using MinGW so that I have access to gcc, ld, etc.
In the guide, I am currently on page 54, where you have just bootstrapped your custom kernel. Rather than simply displaying an 'X' as the guide teaches, I decided to use my existing knowledge of C/C++ to attempt to write my own rudimentary print string function. The function is supposed to take a const char* and write it, char by char, into video memory.
Three files are currently involved in the project:
The boot sector - compiled through NASM to a .bin file
The kernel entry routine - compiled without linking through NASM to a .o, linked against the kernel
The kernel - compiled through gcc, linked along with the kernel entry routine through the ld command, which produces a .bin which is appended to the .bin file produced by the boot sector
Once the combined .bin file is generated, I am converting it to .VDI (VirtualBox Disk Image) and running it in a VM I have set up.
Additional info:
I just noticed that when VirtualBox is converting the .bin file to .vdi, it is reporting different sizes for the two examples. I had a hunch that maybe the string was getting omitted entirely from the compiled product. Sure enough, when I look at .bin for the first example in a hex editor, I can find the text "bar", but I can't when I look at a hex dump for the .bin of the second example.
This leads me to believe that the compilation process I'm using has a flaw in it somewhere. Here are the commands I'm using:
nasm boot_sector.asm -f bin -o boot_sector.bin
nasm kernel_entry.asm -f elf -o kernel_entry.o
gcc -ffreestanding -c kernel.c -o kernel.o
ld -T NUL -o kernel.tmp -Ttext 0x1000 kernel_entry.o kernel.o
objcopy -O binary -j .text kernel.tmp kernel.bin
copy /b boot_sector.bin+kernel.bin os_image.bin
os_image.bin is what is converted to the .vdi file which is used in the vm.

With your first example, the compiler will (or at least, can) put the data to initialize the automatic array right in the code (.text section - moves with immediate values are used when I try this out).
With your second example, the string literal is put in the .rodata section, and the code will contain a reference to that section.
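Your objcopy command copies only the .text section, so the string is missing from the final binary. You should either add the read-only data section with a second -j option (it is .rodata in ELF objects and usually .rdata in the PE/COFF objects that MinGW produces) or remove -j .text entirely.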
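To make the difference concrete, here is a hedged sketch of the two cases side by side. The function names are hypothetical, and the section placement described in the comments is what GCC typically does, not something the C standard guarantees. Dumping your own object file with objdump is the reliable way to confirm it.

```c
/* Case 1: automatic array initialised from a short string.
   GCC can emit this as stores of immediate values, so the bytes
   'b', 'a', 'r', '\0' travel inside the instructions in .text. */
char first_char_from_array(void)
{
    const char bar[4] = "bar";
    return bar[0];
}

/* Case 2: pointer to a string literal.
   The literal normally lives in a read-only data section, and the
   code only holds its address. If that section is left out of the
   final image, the pointer refers to bytes that were never loaded. */
const char *pointer_to_literal(void)
{
    return "bar";
}
```

Both functions behave identically in a normally linked program; they only diverge when the image is built from a subset of the sections.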

How to place a variable at a given absolute address in memory (with GCC)

The RealView ARM C Compiler supports placing a variable at a given memory address using the variable attribute at(address):
int var __attribute__((at(0x40001000)));
var = 4; // changes the memory located at 0x40001000
Does GCC have a similar variable attribute?

I don't know, but you can easily create a workaround like this:
int *var = (int*)0x40001000;
*var = 4;
It's not exactly the same thing, but in most situations a perfect substitute. It will work with any compiler, not just GCC.
If you use GCC, I assume you also use GNU ld (although it is not a certainty, of course) and ld has support for placing variables wherever you want them.
I imagine letting the linker do that job is pretty common.
Inspired by the answer by @rib, I'll add that if the absolute address is for some control register, I'd add volatile to the pointer definition. If it is just RAM, it doesn't matter.
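As a hedged sketch of the volatile variant: the helpers below are hypothetical (reg_write32 and reg_read32 are not part of any library) and take the address as a parameter, so the same code can be exercised against ordinary RAM. On real hardware you would pass the register address, such as the 0x40001000 from the question.

```c
#include <stdint.h>

/* Hypothetical helpers: write/read a 32-bit value at an absolute address.
   The volatile qualifier keeps the compiler from caching, merging or
   dropping the access, which matters for memory-mapped registers. */
static inline void reg_write32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

static inline uint32_t reg_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}
```

Passing the address as an integer keeps the cast in one place instead of scattering pointer casts through the code.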

You could use the section attributes and an ld linker script to define the desired address for that section. This is probably messier than your alternatives, but it is an option.

Minimal runnable linker script example
The technique was mentioned at: https://stackoverflow.com/a/4081574/895245 but I will now provide a concrete example.
main.c
#include <stdio.h>
int myvar __attribute__((section(".mySection"))) = 0x9ABCDEF0;
int main(void) {
printf("adr %p\n", (void*)&myvar);
printf("val 0x%x\n", myvar);
myvar = 0;
printf("val 0x%x\n", myvar);
return 0;
}
link.ld
SECTIONS
{
.mySegment 0x12345678 : {KEEP(*(.mySection))}
}
GitHub upstream.
Compile and run:
gcc -fno-pie -no-pie -o main.out -std=c99 -Wall -Wextra -pedantic link.ld main.c
./main.out
Output:
adr 0x12345678
val 0x9abcdef0
val 0x0
So we see that it was put at the desired address.
I cannot find where this is documented in the GCC manual, but the following syntax:
gcc link.ld main.c
seems to append the given linker script to the default one that would be used.
-fno-pie -no-pie is required, because the Ubuntu toolchain is now configured to generate PIE executables by default, which leads the Linux kernel to place the executable on a different address every time, which messes with our experiment. See also: What is the -fPIE option for position-independent executables in gcc and ld?
TODO: compilation produces a warning:
/usr/bin/x86_64-linux-gnu-ld: warning: link.ld contains output sections; did you forget -T?
Am I doing something wrong? How to get rid of it? See also: How to remove warning: link.res contains output sections; did you forget -T?
Tested on Ubuntu 18.10, GCC 8.2.0.

You answered your own question.
The page you linked above states:
With the GNU GCC Compiler you may use only pointer definitions to access absolute memory locations. For example:
#define IOPIN0 (*((volatile unsigned long *) 0xE0028000))
IOPIN0 = 0x4;
Btw http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Variable-Attributes.html#Variable%20Attributes

Here is one solution that actually reserves space at a fixed address in memory without having to edit the linker file:
extern const uint8_t dev_serial[12];
asm(".equ dev_serial, 0x1FFFF7E8");
/* or asm("dev_serial = 0x1FFFF7E8"); */
...
for (i = 0 ; i < sizeof(dev_serial); i++)
printf((char *)"%02x ", dev_serial[i]);

In GCC you can place a variable into a specific section:
__attribute__((section (".foo"))) static uint8_t * _rxBuffer;
or
static uint8_t * _rxBuffer __attribute__((section (".foo")));
and then specify the address of the section in the GNU Linker Memory Settings:
.foo=0x800000

I had a similar issue. I wanted to allocate a variable at a specific offset in a section I had defined. At the same time, I wanted the code to be portable (no explicit memory address in my C code). So I defined the RAM section in the linker script and defined an array with the same length as the section (the .noinit section is 0x0F bytes long).
uint8_t no_init_sec[0x0f] __attribute__ ((section (".noinit")));
This array maps all locations of the section. This solution is not suitable when the section is large, because the unused locations in the array are wasted space in data memory.

In my opinion, the right answer is the "Minimal runnable linker script example" one.
However, there was something not mentioned there:
If the variable is not used in code (e.g. the variable holds read-only data such as version...), it is necessary to add the 'used' attribute.
Refer to my answer at https://stackoverflow.com/a/75468786/3887115.
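A minimal sketch of that point, assuming GCC or Clang on an ELF target. The section name .mySection is borrowed from the linker script example above, while fw_version and its contents are made up for illustration.

```c
/* Nothing in the program has to read this object. Without "used",
   the compiler is free to drop an unreferenced static object, and the
   section would then receive nothing. "used" forces it to be emitted.
   If the link uses --gc-sections, the linker script still needs
   KEEP() around the input section, as in the example above. */
static const char fw_version[] __attribute__((used, section(".mySection"))) = "1.0.0";
```

The attribute only affects whether the object is emitted; where it lands in memory is still decided by the linker script.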

Embedding binary blobs using gcc mingw

I am trying to embed binary blobs into an exe file. I am using mingw gcc.
I make the object file like this:
ld -r -b binary -o binary.o input.txt
I then look at the objdump output to get the symbols:
objdump -x binary.o
And it gives symbols named:
_binary_input_txt_start
_binary_input_txt_end
_binary_input_txt_size
I then try and access them in my C program:
#include <stdlib.h>
#include <stdio.h>
extern char _binary_input_txt_start[];
int main (int argc, char *argv[])
{
char *p;
p = _binary_input_txt_start;
return 0;
}
Then I compile like this:
gcc -o test.exe test.c binary.o
But I always get:
undefined reference to _binary_input_txt_start
Does anyone know what I am doing wrong?

In your C program remove the leading underscore:
#include <stdlib.h>
#include <stdio.h>
extern char binary_input_txt_start[];
int main (int argc, char *argv[])
{
char *p;
p = binary_input_txt_start;
return 0;
}
C compilers often (always?) seem to prepend an underscore to extern names. I'm not entirely sure why that is - I assume that there's some truth to this wikipedia article's claim that
It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support
But it strikes me that if underscores were prepended to all externs, then you're not really partitioning the namespace very much. Anyway, that's a question for another day, and the fact is that the underscores do get added.

From ld man page:
--leading-underscore
--no-leading-underscore
For most targets default symbol-prefix is an underscore and is defined in target's description. By this option it is possible to disable/enable the default underscore symbol-prefix.
so
ld -r -b binary -o binary.o input.txt --leading-underscore
should be the solution.

I tested it in Linux (Ubuntu 10.10).
Resource file:
input.txt
gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 [generates ELF executable, for Linux]
Generates symbol _binary__input_txt_start.
Accepts symbol _binary__input_txt_start (with underline).
i586-mingw32msvc-gcc (GCC) 4.2.1-sjlj (mingw32-2) [generates PE executable, for Windows]
Generates symbol _binary__input_txt_start.
Accepts symbol binary__input_txt_start (without underline).

Apparently this feature is not present in OSX's ld, so you have to do it totally differently with a custom gcc flag that they added, and you can't reference the data directly, but must do some runtime initialization to get the address.
So it might be more portable to make yourself an assembler source file which includes the binary at build time, a la this answer.