Why isn't my char* passing correctly? - c

Problem statement (using a contrived example):
Working as expected ('b' is printed to screen):
void Foo(const char* bar);
void main()
{
const char bar[4] = "bar";
Foo(bar);
}
void Foo(const char* bar)
{
// Pointer to first text cell of video memory
char* memory = (char*) 0xb8000;
*memory = bar[0];
}
Not working as expected (\0 is printed to screen):
void Foo(const char* bar);
void main()
{
Foo("bar");
}
void Foo(const char* bar)
{
// Pointer to first text cell of video memory
char* memory = (char*) 0xb8000;
*memory = bar[0];
}
In other words, if I pass the const char* directly, it doesn't pass correctly. The const char* I get in Foo points to zeroed out memory somehow. What am I doing wrong?
Background info (as requested):
I am developing an operating system for fun, using a guide I found here. The guide generally assumes you are on a unix-based machine, but I'm developing on a PC, so I'm using MinGW so that I have access to gcc, ld, etc.
In the guide, I am currently on page 54, where you have just bootstrapped your custom kernel. Rather than simply displaying an 'X' as the guide teaches, I decided to use my existing knowledge of C/C++ to attempt to write my own rudimentary print string function. The function is supposed to take a const char* and write it, char by char, into video memory.
Three files are currently involved in the project:
The boot sector - compiled through NASM to a .bin file
The kernel entry routine - compiled without linking through NASM to a .o, linked against the kernel
The kernel - compiled through gcc, linked along with the kernel entry routine through the ld command, which produces a .bin which is appended to the .bin file produced by the boot sector
Once the combined .bin file is generated, I am converting it to .VDI (VirtualBox Disk Image) and running it in a VM I have set up.
Additional info:
I just noticed that when VirtualBox is converting the .bin file to .vdi, it is reporting different sizes for the two examples. I had a hunch that maybe the string was getting omitted entirely from the compiled product. Sure enough, when I look at .bin for the first example in a hex editor, I can find the text "bar", but I can't when I look at a hex dump for the .bin of the second example.
This leads me to believe that the compilation process I'm using has a flaw in it somewhere. Here are the commands I'm using:
nasm boot_sector.asm -f bin -o boot_sector.bin
nasm kernel_entry.asm -f elf -o kernel_entry.o
gcc -ffreestanding -c kernel.c -o kernel.o
ld -T NUL -o kernel.tmp -Ttext 0x1000 kernel_entry.o kernel.o
objcopy -O binary -j .text kernel.tmp kernel.bin
copy /b boot_sector.bin+kernel.bin os_image.bin
os_image.bin is what is converted to the .vdi file which is used in the vm.

With your first example, the compiler will (or at least, can) put the data to initialize the automatic array right in the code (.text section - moves with immediate values are used when I try this out).
With your second example, the string literal is put in the .rodata section, and the code will contain a reference to that section.
Your objcopy command only copies the .text section, so the string will be missing in the final binary. You should add the .rodata section, or remove the -j .text entirely.

Related

Text file linked to ELF file - _binary_file_size information is garbage

I'm trying to revive some old code that links text files (.glsl etc.) into an executable. With my current computer & Kubuntu OS, compiling in 64 bits, I can't read size information anymore. I found a simple example that fails for me in the same way at How do I add contents of text file as a section in an ELF file? . It is further simplified below.
myfile.txt:
Annon edhellon, edro hi ammen
Fennas nogothrim, lasto beth lammen
Objectified with, as in the example,
objcopy --input binary --output elf64-x86-64 --binary-architecture i386:x86-64 --rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA myfile.txt myfile.o
I also tried ld -r -b binary -o myfile.o myfile.txt with the same result.
This is my main.c,
#include <stdlib.h>
#include <stdio.h>
/* These are external references to the symbols created by OBJCOPY */
extern char _binary_myfile_txt_start[];
extern char _binary_myfile_txt_end[];
extern char _binary_myfile_txt_size[];
int main() {
char *data_start = _binary_myfile_txt_start;
char *data_end = _binary_myfile_txt_end;
size_t data_size = (size_t)_binary_myfile_txt_size;
printf ("data_start %p\n", data_start);
printf ("data_end %p\n", data_end);
printf ("data_size %zu\n", data_size);
}
compiled with
gcc main.c myfile.o
When I run the code, the result is as follows:
data_start 0x55cd23b88032
data_end 0x55cd23b88074
data_size 94339555942466
The start and end pointers work, but data_size is nonsense. I'd expect it to be 66, as shown by wc. I've tried many obvious things but nothing seems to work.

Setting a constant in rodata

I am trying to understand how to set the value of a string in the rodata segment as loading it using an environment variable gives me issues.
I want to externally set a constant string in the rodata section. This function should be independent of the code executed. So, when I do
"objdump -c foo"
the rodata section must list this string without the file foo.c having to define it.
How do I set a constant in the .rodata section ?
Edit: Linux OS and using GCC
I cannot use an environment var as that would mean that the c code is modified, I want the c code untouched and add the constant, say "Goo" to the rodata segment.
Then you need to write a program that lets you modify the binary file.
Read the ELF file specifications.
Then write a program that modifies the ELF program and section headers and adds the data to the .rodata section.
I've managed to write a small bash script that does more or less what I think you want.
First let's consider this sample program:
test.c
#include <stdio.h>
const char message[1024] = "world";
int main()
{
printf("hello %s\n", message);
}
The target variable will be message. Note that I will not change the size of the variable, as that would be a mess, so be careful to reserve as much memory as you will ever need.
Now the script:
patchsym
#!/bin/bash
# usage: patchsym PROGRAM SYMBOL < NEWCONTENT
EXE="$1"
SYMBOL="$2"
OFFS=$((0x$(objdump -t "$EXE" | grep " $SYMBOL$" | cut -d ' ' -f 1)))
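# Hard-coded override of the value computed above (objdump -t reports an
# address, which need not equal the file offset that dd needs)
OFFS=2176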
dd of="$EXE" bs=1 seek=$OFFS conv=notrunc
The new message content will be:
newmsg
universe^@
(where ^@ is actually a NUL character).
Now just do:
$ gcc test.c -o test
$ ./test
hello world
$ ./patchsym test message < newmsg
$ ./test
hello universe

C string literal as parameter equals -1 in avr-gcc?

I am developing software for an AVR microcontroller. To say it up front, I currently only have LEDs and pushbuttons to debug with. The problem is that if I pass a string literal into the following function:
void test_char(const char *str) {
if (str[0] == -1)
LED_PORT ^= 1 << 7; /* Test */
}
Somewhere in main()
test_char("AAAAA");
And now the LED changes state. On my x86_64 machine I wrote the same function to compare (without the LED, of course), and there str[0] equals 'A'. Why is this happening?
Update:
Not sure whether this is related, but I have a struct called button, like this:
typedef struct {
int8_t seq[BTN_SEQ_COUNT]; /* The sequence of button */
int8_t seq_count; /* The number of buttons registered */
int8_t detected; /* The detected button */
uint8_t released; /* Whether the button is released
after a hold */
} button;
button btn = {
.seq = {-1, -1, -1},
.detected = -1,
.seq_count = 0,
.released = 0
};
But it turned out that btn.seq_count starts out as -1 even though I defined it as 0.
Update2
For the latter problem, I solved it by initializing the values in a function. However, that does not explain why seq_count was set to -1 in the previous case, nor does it explain why the character in the string literal equals -1.
Update3
Back to the original problem, I added a complete mini example here, and the same thing occurs:
void LED_on() {
PORTA = 0x00;
}
void LED_off() {
PORTA = 0xFF;
}
void port_init() {
PORTA = 0xFF;
DDRA |= 0xFF;
}
void test_char(const char* str) {
if (str[0] == -1) {
LED_on();
}
}
void main() {
port_init();
test_char("AAAAA");
while(1) {
}
}
Update 4
I am trying to follow Nominal Animal's advice, but without much success. Here is the code I have changed:
void test_char(const char* str) {
switch(pgm_read_byte(str++)) {
case '\0': return;
case 'A': LED_on(); break;
case 'B': LED_off(); break;
}
}
void main() {
const char* test = "ABABA";
port_init();
test_char(test);
while(1) {
}
}
I am using gcc 4.6.4,
avr-gcc -v
Using built-in specs.
COLLECT_GCC=avr-gcc
COLLECT_LTO_WRAPPER=/home/carl/Softwares/AVR/libexec/gcc/avr/4.6.4/lto-wrapper
Target: avr
Configured with: ../configure --prefix=/home/carl/Softwares/AVR --target=avr --enable-languages=c,c++ --disable-nls --disable-libssp --with-dwarf2
Thread model: single
gcc version 4.6.4 (GCC)
Rewritten from scratch, to hopefully clear up some of the confusion.
First, some important background:
AVR microcontrollers have separate address spaces for RAM and ROM/Flash ("program memory").
GCC generates code that assumes all data is always in RAM. (Older versions used to have special types, such as prog_char, that referred to data in the ROM address space, but newer versions of GCC do not and cannot support such data types.)
When linking against avr-libc, the linker adds code (__do_copy_data) to copy all initialized data from program memory to RAM. If you have both avr-gcc and avr-libc packages installed, and you use something like avr-gcc -Wall -O2 -fomit-frame-pointer -mmcu=AVRTYPE source.c -o binary.elf to compile your source file into a program binary, then use avr-objcopy to convert the elf file into the format your firmware utilities support, you are linking against avr-libc.
If you use avr-gcc to only produce an object file source.o, and some other utilities to link and upload your program to your microcontroller, this copying from program memory to RAM may not happen. It depends on what linker and libraries you use.
As most AVRs have only a few dozen to a few hundred bytes of RAM available, it is very, very easy to run out of RAM. I'm not certain if avr-gcc and avr-libc reliably detect when you have more initialized data than you have RAM available. If you specify any arrays containing strings, it is very likely you have already overrun your RAM, causing all sorts of interesting bugs to appear.
The avr/pgmspace.h header file is part of avr-libc, and defines a macro, PROGMEM, that can be used to specify data that will only be referred to by functions that take program memory addresses (pointers), such as pgm_read_byte() or strcmp_P() defined in the same header file. The linker will not copy such variables to RAM -- but neither will the compiler tell you if you're using them wrong.
If you use both avr-gcc and avr-libc, I recommend using the following approach for all read-only data:
#include <avr/pgmspace.h>
/*
* Define LED_init(), LED_on(), and LED_off() functions.
*/
void blinky(const char *str)
{
while (1) {
switch (pgm_read_byte(str++)) {
case '\0': return;
case 'A': LED_on(); break;
case 'B': LED_off(); break;
}
/* Add a sleep or delay here,
* or you won't be able to see the LED flicker. */
}
}
static const char example1[] PROGMEM = "AB";
const char example2[] PROGMEM = "AAAA";
int main(void)
{
static const char example3[] PROGMEM = "ABABB";
LED_init();
while (1) {
blinky(example1);
blinky(example2);
blinky(example3);
}
}
Because of changes (new limitations) in GCC internals, the PROGMEM attribute can only be used with a variable; if it refers to a type, it does nothing. Therefore, you need to specify strings as character arrays, using one of the forms above. (example1 is visible within this compilation unit only, example2 can be referred to from other compilation units too, and example3 is visible only in the function it is defined in. Here, visible refers to where you can refer to the variable; it has nothing to do with the contents.)
The PROGMEM attribute does not actually change the code GCC generates. All it does is put the contents to .progmem.data section, iff without it they'd be in .rodata. All of the magic is really in the linking, and in linked library code.
If you do not use avr-libc, then you need to be very specific with your const attributes, as they determine which section the contents will end up in. Mutable (non-const) data should end up in the .data section, while immutable (const) data ends up in .rodata section(s). Remember to read the specifiers from right to left, starting at the variable itself, separated by '*': the leftmost refers to the content, whereas the rightmost refers to the variable. In other words,
const char *s = p;
defines s so that the value of the variable can be changed, but the content it points to is immutable (unchangeable/const); whereas
char *const s = p;
defines s so that you cannot modify the variable itself, but you can the content -- the content s points to is mutable, modifiable. Furthermore,
const char *s = "literal";
defines s to point to a literal string (and you can modify s, ie. make it point to some other literal string for example), but you cannot modify the contents; and
char s[] = "string";
defines s to be a character array (of length 7; string length 6, plus 1 for the end-of-string char), that happens to be initialized to { 's', 't', 'r', 'i', 'n', 'g', '\0' }.
All linker tools that work on object files use the sections to determine what to do with the contents. (Indeed, avr-libc copies the contents of .rodata sections to RAM, and only leaves .progmem.data in program memory.)
Carl Dong, there are several cases where you may observe weird behaviour, even reproducible weird behaviour. I'm no longer certain which one is the root cause of your problem, so I'll just list the ones I think are likely:
If linking against avr-libc, running out of RAM
AVRs have very little RAM, and copying even string literals to RAM easily eats it all up. If this happens, any kind of weird behaviour is possible.
Failing to link against avr-libc
If you think you use avr-libc, but are not certain, then use avr-objdump -d binary.elf | grep -e '^[0-9a-f]* <_' to see if the ELF binary contains any library code. You should expect to see at least <__do_clear_bss>:, <_exit>:, and <__stop_program>: in that list, I believe.
Linking against some other C library, but expecting avr-libc behaviour
Other libraries you link against may have different rules. In particular, if they're designed to work with some other C compiler -- especially one that supports multiple address spaces, and therefore can deduce when to use ld and when lpm based on types --, it might be impossible to use avr-gcc with that library, even if all the tools talk to each other nicely.
Using a custom linker script and a freestanding environment (no C library at all)
Personally, I can live with immutable data (.rodata sections) being in program memory, with myself having to explicitly copy any immutable data to RAM whenever needed. This way I can use a simple microcontroller-specific linker script and GCC in freestanding mode (no C library at all used), and get complete control over the microcontroller. On the other hand, you lose all the nice predefined macros and functions avr-libc and other C libraries provide.
In this case, you need to understand the AVR architecture to have any hope of getting sensible results. You'll need to set up the interrupt vectors and all kinds of other stuff to get even a minimal do-nothing loop to actually run; personally, I read all the assembly code GCC produces (from my own C source) simply to see if it makes sense, and to try to make sure it all gets processed correctly.
Questions?
I faced a similar problem (inline strings were equal to 0xff,0xff,...) and solved it by just changing a line in my Makefile
from :
.out.hex:
$(OBJCOPY) -j .text \
-j .data \
-O $(HEXFORMAT) $< $@
to :
.out.hex:
$(OBJCOPY) -j .text \
-j .data \
-j .rodata \
-O $(HEXFORMAT) $< $@
or seems better :
.out.hex:
$(OBJCOPY) -R .fuse \
-R .lock \
-R .eeprom \
-O $(HEXFORMAT) $< $@
You can see full problem and answer here : https://www.avrfreaks.net/comment/2943846#comment-2943846

Data-only static libraries with GCC

How can I make static libraries with only binary data, that is without any object code, and make that data available to a C program? Here's the build process and simplified code I'm trying to make work:
./datafile:
abcdefghij
Makefile:
libdatafile.a:
ar [magic] datafile
main: libdatafile.a
gcc main.c libdatafile.a -o main
main.c:
#define TEXTPTR [more magic]
int main(){
char mystring[11];
memset(mystring, '\0', 11);
memcpy(mystring, TEXTPTR, 10); /* memcpy takes the destination first */
puts(mystring);
puts(mystring);
return 0;
}
The output I'm expecting from running main is, of course:
abcdefghijabcdefghij
My question is: what should [magic] and [more magic] be?
You can convert a binary file to a .o file using objcopy; the generated file then defines symbols for the start address, end address and size of the binary data.
objcopy -I binary -O elf32-little data data.o
The data can be referenced from a program via
extern char const _binary_data_start[];
extern char const _binary_data_end[];
The data lives between those two pointers (note that declaring them as pointers does not work).
The "elf32-little" part needs to be adapted according to your target platform. There are many other options for fine control over the processing.
Put the data in global variables.
char const text[] = "abcdefghij";
Don't forget to declare text in a header. If the data is currently in a file, the FreeBSD file2c tool can convert it to C source code for you (manpage).

Embedding binary blobs using gcc mingw

I am trying to embed binary blobs into an exe file. I am using mingw gcc.
I make the object file like this:
ld -r -b binary -o binary.o input.txt
I then look at the objdump output to get the symbols:
objdump -x binary.o
And it gives symbols named:
_binary_input_txt_start
_binary_input_txt_end
_binary_input_txt_size
I then try and access them in my C program:
#include <stdlib.h>
#include <stdio.h>
extern char _binary_input_txt_start[];
int main (int argc, char *argv[])
{
char *p;
p = _binary_input_txt_start;
return 0;
}
Then I compile like this:
gcc -o test.exe test.c binary.o
But I always get:
undefined reference to _binary_input_txt_start
Does anyone know what I am doing wrong?
In your C program remove the leading underscore:
#include <stdlib.h>
#include <stdio.h>
extern char binary_input_txt_start[];
int main (int argc, char *argv[])
{
char *p;
p = binary_input_txt_start;
return 0;
}
C compilers often (always?) seem to prepend an underscore to extern names. I'm not entirely sure why that is - I assume that there's some truth to this wikipedia article's claim that
It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support
But it strikes me that if underscores were prepended to all externs, then you're not really partitioning the namespace very much. Anyway, that's a question for another day, and the fact is that the underscores do get added.
From ld man page:
--leading-underscore
--no-leading-underscore
For most targets default symbol-prefix is an underscore and is defined in target's description. By this option it is possible to disable/enable the default underscore symbol-prefix.
so
ld -r -b binary -o binary.o input.txt --leading-underscore
should be the solution.
I tested it in Linux (Ubuntu 10.10).
Resource file:
input.txt
gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 [generates ELF executable, for Linux]
Generates symbol _binary__input_txt_start.
Accepts symbol _binary__input_txt_start (with underline).
i586-mingw32msvc-gcc (GCC) 4.2.1-sjlj (mingw32-2) [generates PE executable, for Windows]
Generates symbol _binary__input_txt_start.
Accepts symbol binary__input_txt_start (without underline).
Apparently this feature is not present in OSX's ld, so you have to do it totally differently with a custom gcc flag that they added, and you can't reference the data directly, but must do some runtime initialization to get the address.
So it might be more portable to make yourself an assembler source file which includes the binary at build time, a la this answer.
