mingw const char string is apparantly not const - c

I have a struct where a constant char string is defined and a pointer to my own string object. The goal is to declare a variable of this struct with the chars set and the txt set t NULL and than during runtime create the MyString object representing the chars. I can not create the MyString during compilation while I use the GLib and this lib first needs the g_type_init to be called.
struct _MyStaticString {
volatile MyString * txt;
const char *chars;
};
A declaration would then look like:
struct _MyStaticString my_test_string = { NULL, "Hello world or foo bar, or a rick roll" };
Then their is a function which is repsonsible for delivering me the MyString object by first checking if txt is NULL, if so create a new MyString object and return this MyString object.
struct _MyString *my_static_string(struct _MyStaticString *a_static) {
printf("a_static=%lx\n", (gulong) a_static);
printf("a_static.chars=%s\n", (char *) a_static->chars);
if (a_static->txt == NULL) {
CatString *result = g_object_new(MY_TYPE_STRING, NULL);
// result->data = (gchar *) a_static->chars;
result->data = strdup((char *) a_static->chars);
result->size = strlen((char *) a_static->chars);
result->hash = 0;
g_object_ref_sink(G_OBJECT(result));
result->parent.ref_count = 1000;
a_static->txt = result;
}
return (struct _MyString *) (a_static->txt);
}
This all works great and I'm so happy, at least when I'm running GCC on Linux. As soon as I start compiling this code on Windows with the help of the MinGW compiler things start going wrong. If I put everything in one project It's still fine but as soon as put the declaration in a .a library and use it elsewhere the field a_static->chars becomes NULL. So I started playing/tweaking/testing: I thought maybe it's the alignment of the data in the Object files and thus added #pragma pack(16). It didn't work. Than I thought maybe there is an attribute which can help me. So I added __attribute__ ((common)). It didn't work. I thought to be smart and separated the string from the structure declaration itself like:
const char helper_txt = "Hello world or foo bar, or a rick roll";
struct _MyStaticString my_test_string = { NULL, helper_txt };
I get compile errors:
error: initializer element is not constant
error: (near initialization for 'field.chars')
Here are my compiler flags
C:\MinGW\bin\gcc.exe
-IC:\MinGW\include
-IC:\GTK_ALL_IN_ONE\include\gtk-2.0
-IC:\GTK_ALL_IN_ONE\lib\gtk-2.0\include
-IC:\GTK_ALL_IN_ONE\include\atk-1.0
-IC:\GTK_ALL_IN_ONE\include\cairo
-IC:\GTK_ALL_IN_ONE\include\gdk-pixbuf-2.0
-IC:\GTK_ALL_IN_ONE\include\pango-1.0
-IC:\GTK_ALL_IN_ONE\include\glib-2.0
-IC:\GTK_ALL_IN_ONE\lib\glib-2.0\include
-IC:\GTK_ALL_IN_ONE\include
-IC:\GTK_ALL_IN_ONE\include\freetype2
-IC:\GTK_ALL_IN_ONE\include\libpng14
-IC:\work\workspace\module-blah\src
-O0 -g3 -Wall -c -fmessage-length=0 -mms-bitfields -DOSWINDOWS
and here is the version
C:\>c:\MinGW\bin\gcc.exe --version
gcc.exe (GCC) 4.5.0
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Do I miss some compiler flag or do I need to add an attribute to make sure the const char * string is exported into the .a lib and is their an easy way to check weather a string is in the .a lib? Or is it maybe some linker option ?

My guess is that you've still got what looks like a declaration of the variable that is turning up with NULL value in your project somewhere besides the library (.a file). GCC and the linker sometimes produce code that seems to follow your intentions rather than the strict letter of C when you do things like declare the same variable in more than one .c file (or in a .h file which is included in more than one .c file in the same project). What this should result in is more than one copy of the variable and possibly a linker error telling you that there is more than one object with the same name in your code, but for some reason this doesn't always happen when you are linking together lots of .o files that contain duplications of the same variable.
My guess here is that rather than:
extern struct _MyStaticString a_string;
in a header file you have:
struct _MyStaticString a_string;
and that you had what you consider the real declaration -- the one with the initialization -- in a .c file.
When you moved the real declaration to the library the linker's behavior changed when fulfilling the need for an a_string object. It already had one or more from the .o files from the main program so it didn't bother looking in the library for one. Previously it saw that it had several from the .o files and decided to go with the one that had been initialized to a non-zero or non-NULL value (the default for global or static variables). But without that initialized version of your variable around the linker has already decided to just go with one of the uninitialized versions of the variable before it even looks in the library for the value you wanted it to use.

Related

Why does global variable definition in C header file work? [duplicate]

This question already has answers here:
What happens if I define the same variable in each of two .c files without using "extern"?
(3 answers)
Closed 2 years ago.
From what I saw across many many stackoverflow questions among other places, the way to define globals is to define them in exactly one .c file, then declare it as an extern in a header file which then gets included in the required .c files.
However, today I saw in a codebase global variable definition in the header file and I got into arguing, but he insisted it will work. Now, I had no idea why, so I created a small project to test it out real quick:
a.c
#include <stdio.h>
#include "a.h"
int main()
{
p1.x = 5;
p1.x = 4;
com = 6;
change();
printf("p1 = %d, %d\ncom = %d\n", p1.x, p1.y, com);
return 0;
}
b.c
#include "a.h"
void change(void)
{
p1.x = 7;
p1.y = 9;
com = 1;
}
a.h
typedef struct coord{
int x;
int y;
} coord;
coord p1;
int com;
void change(void);
Makefile
all:
gcc -c a.c -o a.o
gcc -c b.c -o b.o
gcc a.o b.o -o run.out
clean:
rm a.o b.o run.out
Output
p1 = 7, 9
com = 1
How is this working? Is this an artifact of the way I've set up the test? Is it that newer gcc has managed to catch this condition? Or is my interpretation of the whole thing completely wrong? Please help...
This relies on so called "common symbols" which are an extension to standard C's notion of tentative definitions (https://port70.net/~nsz/c/c11/n1570.html#6.9.2p2), except most UNIX linkers make it work across translation units too (and many even with shared dynamic libaries)
AFAIK, the feature has existed since pretty much forever and it had something to do with fortran compatibility/similarity.
It works by the compiler placing giving uninitialized (tentative) globals a special "common" category (shown in the nm utility as "C", which stands for "common").
Example of data symbol categories:
#!/bin/sh -eu
(
cat <<EOF
int common_symbol; //C
int zero_init_symbol = 0; //B
int data_init_symbol = 4; //D
const int const_symbol = 4; //R
EOF
) | gcc -xc - -c -o data_symbol_types.o
nm data_symbol_types.o
Output:
0000000000000004 C common_symbol
0000000000000000 R const_symbol
0000000000000000 D data_init_symbol
0000000000000000 B zero_init_symbol
Whenever a linker sees multiple redefinitions for a particular symbol, it usually generates linkers errors.
But when those redefinitions are in the common category, the linker will merge them into one.
Also, if there are N-1 common definitions for a particular symbol and one non-tentative definition (in the R,D, or B category), then all the definitions are merged into the one nontentative definition and also no error is generated.
In other cases you get symbol redefinition errors.
Although common symbols are widely supported, they aren't technically standard C and relying on them is theoretically undefined behavior (even though in practice it often works).
clang and tinycc, as far as I've noticed, do not generate common symbols (there you should get a redefinition error). On gcc, common symbol generation can be disabled with -fno-common.
(Ian Lance Taylor's serios on linkers has more info on common symbols and it also mentions how linkers even allow merging differently sized common symbols, using the largest size for the final object: https://www.airs.com/blog/archives/42 . I believe this weird trick was once used by libc's to some effect)
That program should not compile (well it should compile, but you'll have double definition errors in your linking phase) due to how the variables are defined in your header file.
A header file informs the compiler about external environment it normally cannog guess by itself, as external variables defined in other modules.
As your question deals with this, I'll try to explain the correct way to define a global variable in one module, and how to inform the compiler about it in other modules.
Let's say you have a module A.c with some variable defined in it:
A.c
int I_am_a_global_variable; /* you can even initialize it */
well, normally to make the compiler know when compiling other modules that you have that variable defined elsewhere, you need to say something like (the trick is in the extern keyword used to say that it is not defined here):
B.c
extern int I_am_a_global_variable; /* you cannot initialize it, as it is defined elsewhere */
As this is a property of the module A.c, we can write a A.h file, stating that somewhere else in the program, there's a variable named I_am_a_global_variable of type int, in order to be able to access it.
A.h
extern int I_am_a_global_variable; /* as above, you cannot initialize the variable here */
and, instead of declaring it in B.c, we can include the file A.h in B.c to ensure that the variable is declared as the author of B.c wanted to.
So now B.c is:
B.c
#include "A.h"
void some_function() {
/* ... */
I_am_a_global_variable = /* some complicated expression */;
}
this ensures that if the author of B.c decides to change the type or the declaration of the variable, he can do changing the file A.h and all the files that #include it should be recompiled (you can do this in the Makefile for example)
A.c
#include "A.h" /* extern int I_am_a_global_variable; */
int I_am_a_global_variable = 27;
In order to prevent errors, it is good that A.c also #includes the file A.h, so the declaration
extern int I_am_a_global_variable; /* as above, you cannot initialize the variable here */
and the final definition (that is included in A.c):
int I_am_a_global_variable = 23; /* I have initialized it to a non-default value to show how to do it */
are consistent between them (consider the author changes the type of I_am_a_global_variable to double and forgets to change the declaration in A.h, the compiler will complaint about non-matching declaration and definition, when compiling A.c (which now includes A.h).
Why I say that you will have double definition errors when linking?
Well, if you compile several modules with the statement (result of #includeing the file A.h in several modules) with the statement:
#include "A.h" /* this has an extern int I_am_a_global_variable; that informs the
* compiler that the variable is defined elsewhere, but see below */
int I_am_a_global_variable; /* here is _elsewhere_ :) */
then all those modules will have a global variable I_m_a_global_variable, initialized to 0, because the compiler defined it in every module (you don't say that the variable is defined elsewhere, you are stating it to declare and define it in this compilation unit) and when you link all the modules together you'll end with several definitions of a variable with the same name at several places, and the references from other modules using this variable will don't know which one is to be used.
The compiler doesn't know anything of other compilations for an application when it is compiling module A, so you need some means to tell it what is happening around. The same as you use function prototypes to indicate it that there's a function somewhere that takes some number of arguments of types A, B, C, etc. and returns a value of type Z, you need to tell it that there's a variable defined elsewhere that has type X, so all the accesses you do to it in this module will be compiled correctly.

How to merge statics for all instances of header-defined inline function?

Having a header that defines some static inline function that contains static variables in it, how to achieve merging of identical static local variables across all TUs that comprise final loadable module?. In a less abstract way:
/*
* inc.h
*/
#include <stdlib.h>
/*
* This function must be provided via header. No extra .c source
* is allowed for its definition.
*/
static inline void* getPtr() {
static void* p;
if (!p) {
p = malloc(16);
}
return p;
}
/*
* 1.c
*/
#include "inc.h"
void* foo1() {
return getPtr();
}
void* bar1() {
return getPtr();
}
/*
* 2.c
*/
#include "inc.h"
void* foo2() {
return getPtr();
}
void* bar2() {
return getPtr();
}
Platform is Linux, and this file set is built via:
$ clang -O2 -fPIC -shared 1.c 2.c
It is quite expected that both TUs receive own copies of getPtr.p. Though inside each TU getPtr.p is shared across all getPtr() instantiations. This can be confirmed by inspecting final loadable binary:
$ readelf -s --wide a.out | grep getPtr
32: 0000000000201030 8 OBJECT LOCAL DEFAULT 21 getPtr.p
34: 0000000000201038 8 OBJECT LOCAL DEFAULT 21 getPtr.p
At the same time I'm looking for a way of how to share getPtr.p across separate TU boundary. This vaguely resembles what happens with C++ template instantiations. And likely GRP_COMDAT would help me but I was not able to find any info about how to label my static var to be put into COMDAT.
Is there any attribute or other source-level (not a compiler option) way to achieve merging such objects?
If I understand correctly what you want, you can get this effect by simply declaring a global variable.
/*
* inc.h
*/
void* my_p;
static inline void* getPtr() {
if (!my_p) {
my_p = malloc(16);
}
return my_p;
}
This will use the same variable my_p for all instances of getPtr throughout the program (since it's global). And it is not necessary to have an explicit definition of my_p in any module. It will be initialized to NULL, which is just what you want. So nothing besides inc.h needs to change, and no additional .c file is needed.
Of course, you'll probably want to give my_p a name that is less likely to conflict with any identifier in the user's program. Maybe Sergios_include_file_p_for_getPtr or something of the sort.
This is actually an extension to standard C (mentioned in Annex J.5.11 in N2176), but it's provided by gcc and clang on most modern platforms. It's documented under the -fcommon compiler option (which is enabled by default). It's typically implemented by putting the variable in a common section, and the linker then merges all instances together, just as you suggest. But the code above shows how to access the feature without needing to use attributes or other obscure incantations.
If you want to be extra paranoid, you can declare my_p with __attribute__((common)) which will cause the variable to be treated in this way even if -fno-common is in effect. (Of course, that may cause trouble if -fno-common was being used for a reason...)

C string literal as parameter equals -1 in avr-gcc?

I am developing a software for AVR microcontroller. Saying in fromt, now I only have LEDs and pushbuttons to debug. The problem is that if I pass a string literal into the following function:
void test_char(const char *str) {
if (str[0] == -1)
LED_PORT ^= 1 << 7; /* Test */
}
Somewhere in main()
test_char("AAAAA");
And now the LED changes state. On my x86_64 machine I wrote the same function to compare (not LED, of course), but it turns out that str[0] equals to 'A'. Why is this happening?
Update:
Not sure whether this is related, but I have a struct called button, like this:
typedef struct {
int8_t seq[BTN_SEQ_COUNT]; /* The sequence of button */
int8_t seq_count; /* The number of buttons registered */
int8_t detected; /* The detected button */
uint8_t released; /* Whether the button is released
after a hold */
} button;
button btn = {
.seq = {-1, -1, -1},
.detected = -1,
.seq_count = 0,
.released = 0
};
But it turned out that btn.seq_count start out as -1 though I defined it as 0.
Update2
For the later problem, I solved by initializing the values in a function. However, that does not explain why seq_count was set to -1 in the previous case, nor does it explain why the character in string literal equals to -1.
Update3
Back to the original problem, I added a complete mini example here, and same occurs:
void LED_on() {
PORTA = 0x00;
}
void LED_off() {
PORTA = 0xFF;
}
void port_init() {
PORTA = 0xFF;
DDRA |= 0xFF;
}
void test_char(const char* str) {
if (str[0] == -1) {
LED_on();
}
}
void main() {
port_init();
test_char("AAAAA");
while(1) {
}
}
Update 4
I am trying to follow Nominal Animal's advice, but not quite successful. Here is the code I have changed:
void test_char(const char* str) {
switch(pgm_read_byte(str++)) {
case '\0': return;
case 'A': LED_on(); break;
case 'B': LED_off(); break;
}
}
void main() {
const char* test = "ABABA";
port_init();
test_char(test);
while(1) {
}
}
I am using gcc 4.6.4,
avr-gcc -v
Using built-in specs.
COLLECT_GCC=avr-gcc
COLLECT_LTO_WRAPPER=/home/carl/Softwares/AVR/libexec/gcc/avr/4.6.4/lto-wrapper
Target: avr
Configured with: ../configure --prefix=/home/carl/Softwares/AVR --target=avr --enable-languages=c,c++ --disable-nls --disable-libssp --with-dwarf2
Thread model: single
gcc version 4.6.4 (GCC)
Rewritten from scratch, to hopefully clear up some of the confusion.
First, some important background:
AVR microcontrollers have separate address spaces for RAM and ROM/Flash ("program memory").
GCC generates code that assumes all data is always in RAM. (Older versions used to have special types, such as prog_char, that referred to data in the ROM address space, but newer versions of GCC do not and cannot support such data types.)
When linking against avr-libc, the linker adds code (__do_copy_data) to copy all initialized data from program memory to RAM. If you have both avr-gcc and avr-libc packages installed, and you use something like avr-gcc -Wall -O2 -fomit-frame-pointer -mmcu=AVRTYPE source.c -o binary.elf to compile your source file into a program binary, then use avr-objcopy to convert the elf file into the format your firmware utilities support, you are linking against avr-libc.
If you use avr-gcc to only produce an object file source.o, and some other utilities to link and upload your program to your microcontroller, this copying from program memory to RAM may not happen. It depends on what linker and libraries your use.
As most AVRs have only a few dozen to few hundred bytes of RAM available, it is very, very easy to run out of RAM. I'm not certain if avr-gcc and avr-libc reliably detect when you have more initialized data than you have RAM available. If you specify any arrays containing strings, it is very likely you're already overrun your RAM, causing all sorts of interesting bugs to appear.
The avr/pgmspace.h header file is part of avr-libc, and defines a macro, PROGMEM, that can be used to specify data that will only be referred to by functions that take program memory addresses (pointers), such as pgm_read_byte() or strcmp_P() defined in the same header file. The linker will not copy such variables to RAM -- but neither will the compiler tell you if you're using them wrong.
If you use both avr-gcc and avr-libc, I recommend using the following approach for all read-only data:
#include <avr/pgmspace.h>
/*
* Define LED_init(), LED_on(), and LED_off() functions.
*/
void blinky(const char *str)
{
while (1) {
switch (pgm_read_byte(str++)) {
case '\0': return;
case 'A': LED_on(); break;
case 'B': LED_off(); break;
}
/* Add a sleep or delay here,
* or you won't be able to see the LED flicker. */
}
}
static const char example1[] PROGMEM = "AB";
const char example2[] PROGMEM = "AAAA";
int main(void)
{
static const char example3[] PROGMEM = "ABABB";
LED_init();
while (1) {
blinky(example1);
blinky(example2);
blinky(example3);
}
}
Because of changes (new limitations) in GCC internals, the PROGMEM attribute can only be used with a variable; if it refers to a type, it does nothing. Therefore, you need to specify strings as character arrays, using one of the forms above. (example1 is visible within this compilation unit only, example2 can be referred to from other compilation units too, and example3 is visible only in the function it is defined in. Here, visible refers to where you can refer to the variable; it has nothing to do with the contents.)
The PROGMEM attribute does not actually change the code GCC generates. All it does is put the contents to .progmem.data section, iff without it they'd be in .rodata. All of the magic is really in the linking, and in linked library code.
If you do not use avr-libc, then you need to be very specific with your const attributes, as they determine which section the contents will end up in. Mutable (non-const) data should end up in the .data section, while immutable (const) data ends up in .rodata section(s). Remember to read the specifiers from right to left, starting at the variable itself, separated by '*': the leftmost refers to the content, whereas the rightmost refers to the variable. In other words,
const char *s = p;
defines s so that the value of the variable can be changed, but the content it points to is immutable (unchangeable/const); whereas
char *const s = p;
defines s so that you cannot modify the variable itself, but you can the content -- the content s points to is mutable, modifiable. Furthermore,
const char *s = "literal";
defines s to point to a literal string (and you can modify s, ie. make it point to some other literal string for example), but you cannot modify the contents; and
char s[] = "string";
defines s to be a character array (of length 6; string length + 1 for end-of-string char), that happens to be initialized to { 's', 't', 'r', 'i', 'n', 'g', '\0' }.
All linker tools that work on object files use the sections to determine what to do with the contents. (Indeed, avr-libc copies the contents of .rodata sections to RAM, and only leaves .progmem.data in program memory.)
Carl Dong, there are several cases where you may observe weird behaviour, even reproducible weird behaviour. I'm no longer certain which one is the root cause of your problem, so I'll just list the ones I think are likely:
If linking against avr-libc, running out of RAM
AVRs have very little RAM, and copying even string literals to RAM easily eats it all up. If this happens, any kind of weird behaviour is possible.
Failing to linking against avr-libc
If you think you use avr-libc, but are not certain, then use avr-objdump -d binary.elf | grep -e '^[0-9a-f]* <_' to see if the ELF binary contains any library code. You should expect to see at least <__do_clear_bss>:, <_exit>:, and <__stop_program>: in that list, I believe.
Linking against some other C library, but expecting avr-libc behaviour
Other libraries you link against may have different rules. In particular, if they're designed to work with some other C compiler -- especially one that supports multiple address spaces, and therefore can deduce when to use ld and when lpm based on types --, it might be impossible to use avr-gcc with that library, even if all the tools talk to each other nicely.
Using a custom linker script and a freestanding environment (no C library at all)
Personally, I can live with immutable data (.rodata sections) being in program memory, with myself having to explicitly copy any immutable data to RAM whenever needed. This way I can use a simple microcontroller-specific linker script and GCC in freestanding mode (no C library at all used), and get complete control over the microcontroller. On the other hand, you lose all the nice predefined macros and functions avr-libc and other C libraries provide.
In this case, you need to understand the AVR architecture to have any hope of getting sensible results. You'll need to set up the interrupt vectors and all kinds of other stuff to get even a minimal do-nothing loop to actually run; personally, I read all the assembly code GCC produces (from my own C source) simply to see if it makes sense, and to try to make sure it all gets processed correctly.
Questions?
I faced a similar problem (inline strings were equal to 0xff,0xff,...) and solved it by just changing a line in my Makefile
from :
.out.hex:
$(OBJCOPY) -j .text \
-j .data \
-O $(HEXFORMAT) $< $#
to :
.out.hex:
$(OBJCOPY) -j .text \
-j .data \
-j .rodata \
-O $(HEXFORMAT) $< $#
or seems better :
.out.hex:
$(OBJCOPY) -R .fuse \
-R .lock \
-R .eeprom \
-O $(HEXFORMAT) $< $#
You can see full problem and answer here : https://www.avrfreaks.net/comment/2943846#comment-2943846

How can I cast an char array to a function pointer in C?

Is it possible to assign with cast to a function pointer a string or char array and then run it?
I have defined a few functions int f1();, int f2();, and so on
In the main() function I have read a string fct_name and declared a pointer to function int (*pv)();
I need to do something like this:
the fct_name can have values "f1" , "f2" and so on..
pv = (some sort of cast)fct_name;
pv();
My point is I want to avoid conditional instructions in favor of direct assignment (because I have a large number of functions in my program)
The code must obviously run.
Assuming you don't have an external library and are trying to call functions declared in your executable, you can do a lookup yourself
#define REGISTER_FUNC(name) {#name, name}
struct funclist
{
const char* name;
void (*fp)(void); //or some other signature
};
struct funclist AllFuncs[] = {
REGISTER_FUNC(f1),
REGISTER_FUNC(f2),
REGISTER_FUNC(f3),
{NULL,NULL} //LAST ITEM SENTINEL
};
Now you can lookup your variable fct_name in AllFuncs. You can use a linear search if the number is small, or insert them all into a hash table for O(1) lookup.
Alternately, if your names really are f1, f2, etc. you can just do
void (*FuncList)(void)[] = {NULL, f1,f2,f3};
...
int idx = atol(fct_name+1);
if (idx && idx < MAX_FUNCS)
FuncList[idx]();
A variant of Carey's answer, in case you're on a *nix system. dlopen() opens up your library. RTLD_LAZY tells the loader to not bother resolving all the library's symbols right away, and to wait for you to try to access them. dlsym() looks up the symbol in question.
Edit: Updated the snippet to better fit your clarification:
#include <dlfcn.h>
int main(int argc, char *argv[])
{
void *handle = dlopen("libexample.so", RTLD_LAZY);
if (handle == NULL) {
// error
}
char fct_name[64];
// read input from terminal here
void *func = dlsym(handle, fct_name);
if (func != NULL) {
// call function here; need to cast as appropriate type
}
}
libexample.so would be a library with your functions, compiled as a shared library, like so:
gcc -Wall -o libexample.so example.c -shared -fPIC
That being said, if you're going to the trouble of compiling a shared library like this, you'll probably just want to call the functions in your binary. You can do that if you link your library in at compile-time:
gcc -Wall -o test test.c -L. -lexample
-L. tells the linker to look for libraries in the current directory (.) and -lexample tells it to link with a library named "libexample.so". If you do this, you can just call the library functions directly within your program.
You can't cast a char array to a function just because the array happens to contain the name of a function. What you need to do is put your function(s) in a DLL and then do this:
HMODULE dll = LoadLibrary("foo.dll");
pv func = (pv)GetProcAddress(module, fct_name);

Data-only static libraries with GCC

How can I make static libraries with only binary data, that is without any object code, and make that data available to a C program? Here's the build process and simplified code I'm trying to make work:
./datafile:
abcdefghij
Makefile:
libdatafile.a:
ar [magic] datafile
main: libdatafile.a
gcc main.c libdatafile.a -o main
main.c:
#define TEXTPTR [more magic]
int main(){
char mystring[11];
memset(mystring, '\0', 11);
memcpy(TEXTPTR, mystring, 10);
puts(mystring);
puts(mystring);
return 0;
}
The output I'm expecting from running main is, of course:
abcdefghijabcdefghij
My question is: what should [magic] and [more magic] be?
You can convert a binary file to a .o file using objcopy; the generated file then defines symbols for the start address, end address and size of the binary data.
objcopy -I binary -O elf32-little data data.o
The data can be referenced from a program via
extern char const _binary_data_start[];
extern char const _binary_data_end[];
The data lives between those two pointers (note that declaring them as pointers does not work).
The "elf32-little" part needs to be adapted according to your target platform. There are many other options for fine control over the processing.
Put the data in global variables.
char const text[] = "abcdefghij";
Don't forget to declare text in a header. If the data is currently in a file, the FreeBSD file2c tool can convert it to C source code for you (manpage).

Resources