In C (x86 linux ELF, gcc/clang), is it possible to communicate symbol information through linker use/abuse? For example, say I have the following setup:
// foo.c
void a_foo_function() {...}
void b_foo_function() {...}
// bar.c
void a_bar_function() {...}
void b_bar_function() {...}
void c_bar_function() {...}
// master.c
void *array_of_function_pointers;
int main() {
// do things with function_pointers
}
I would like array_of_function_pointers to be an array containing pointers to a_foo_function and a_bar_function. In this way, master.c could interact with functions defined in foo.c and bar.c without having to explicitly know about them. I recall seeing this done before by using custom sections (a la __attribute__((section("name")))), but I can't remember exactly what tricks were played.
From what I remember, the setup allowed master.c to stay unmodified, and any child could register some or all of its functions via linker black magic, without having to write much, if any, boilerplate. Any gurus have some insight?
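One way to achieve this is to place the individual function pointers into an orphan section (that is, a section which is not placed by the linker script). The GNU ld manual describes the behavior under "Orphan Sections" and in the final paragraph of "Input Section Example".
When the name of an orphan section is a valid C identifier, the linker defines the symbols __start_NAME and __stop_NAME, which mark the beginning and end of the section. You can declare these symbols in C (for example as arrays of pointers) and use them to iterate over the section contents (the pointers stored there).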
This approach is used in glibc for various purposes. For example, this commit adds libio vtable verification to glibc. The special section is called __libc_IO_vtables, and the start and stop symbols are __start___libc_IO_vtables and __stop___libc_IO_vtables.
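As a rough sketch of this approach, assuming GCC or clang with GNU ld on an ELF target: the section name my_handlers, the REGISTER_HANDLER macro, the calls counter and the function bodies are all made up for illustration, and the three source files are collapsed into one for brevity.

```c
#include <stddef.h>

typedef void (*handler_fn)(void);

/* Hypothetical helper: stores a pointer to fn in the section "my_handlers".
   The "used" attribute keeps the otherwise unreferenced pointer alive. */
#define REGISTER_HANDLER(fn) \
    static handler_fn register_##fn \
    __attribute__((used, section("my_handlers"))) = fn

/* Defined by GNU ld because "my_handlers" is an orphan section whose
   name is a valid C identifier. */
extern handler_fn __start_my_handlers[];
extern handler_fn __stop_my_handlers[];

int calls; /* illustration only: lets us observe which handlers ran */

/* Would normally live in foo.c. */
void a_foo_function(void) { calls += 1; }
REGISTER_HANDLER(a_foo_function);

/* Would normally live in bar.c. */
void a_bar_function(void) { calls += 10; }
REGISTER_HANDLER(a_bar_function);

/* master.c side: walks the section without naming any handler.
   Returns the number of handlers that were called. */
size_t run_all_handlers(void)
{
    size_t n = 0;
    for (handler_fn *p = __start_my_handlers; p < __stop_my_handlers; ++p) {
        (*p)();
        ++n;
    }
    return n;
}
```

Calling run_all_handlers() from main invokes every registered function, and a new file only needs its own REGISTER_HANDLER line to take part. The order of the entries depends on link order, and if you link with --gc-sections you should check that the section is still retained.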
Related
Let's say I'm writing a library of functions, and each function makes use of a global array to perform its duties. I don't want to expose that array to non library code, so I declare it as static like so:
library.h:
void function1();
void function2();
library.c:
#include "library.h"
static int arr[ARBITRARY_SIZE];
void function1() {...} // both of these
void function2() {...} // make use arr
If I now want to use this library in my code, I would #include "library.c" at the top of my code.
If I understand correctly, #include simply copies and pastes in place the contents of the #included file. If this is the case, the user's code would itself contain the static definition of arr. Given that, how would I, as the author of the library, protect my library variables? If this is not the case, please correct me about what #include does!
The static keyword doesn't protect the memory used by a variable. Code that can see the variable can pass a pointer to it out of a function, and the caller can then use that pointer to modify the variable from outside the block or file where it is defined.
static serves two purposes:
inside a block in a function body, it states that although the variable is visible only inside the block where it is defined, its lifetime is that of the whole program (it is not created and destroyed each time the program enters and exits the block).
outside a block, it gives file-local visibility (the variable name is not exposed outside the compilation unit that defines it). That does not make the memory inaccessible, though: if you have a pointer to it, you can still modify it as you want.
#include simply inserts the contents of the included file verbatim into the compilation, so everything declared static in the included file is visible in the including file (after the point of inclusion), and locally in every compilation unit that includes it. Each of those definitions is separate and independent; they do not refer to the same variable, because they are local definitions in different compilation units. This is the same as giving two local variables in different blocks (even nested ones) the same name: they are different variables.
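A small sketch of the two uses of static described above. The names leak_arr, read_first and next_id are hypothetical and not part of the original library.

```c
#include <stddef.h>

#define ARBITRARY_SIZE 4

/* File scope: the name arr cannot be referenced from another
   compilation unit, but the memory itself is not protected. */
static int arr[ARBITRARY_SIZE];

/* Hypothetical accessor that hands out a pointer to the hidden array. */
int *leak_arr(void) { return arr; }

/* Library function that reads the array. */
int read_first(void) { return arr[0]; }

/* Block scope: counter is visible only inside next_id, but it keeps
   its value between calls because its lifetime is the whole program. */
int next_id(void)
{
    static int counter = 0;
    return ++counter;
}
```

A caller in another file cannot write arr[0] = 42 directly, but leak_arr()[0] = 42 works, and read_first() then returns 42.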
If I now want to use this library in my code, I would #include "library.c" at the top of my code.
That will only work if you use this library in a single source file.
As soon as you add foo.c and bar.c which both #include "library.c" and try to link them together, you would get a multiply-defined function1 and function2 symbol error (because each of foo.o and bar.o will now provide its own separate definitions).
You could fix this by making the functions static as well: static void function1() { ... }, etc., but this is not how people usually use libraries, because that method causes long compile times and a larger than necessary executable. In addition, if you are using this method, you don't need the library.h file at all.
Instead, what people usually do is compile library.c into library.o, #include "library.h" at the top of their source files, then link everything together.
I don't want to expose that array to non library code, so I declare it as static like so:
That is a valid thing to do, and achieves your purpose (so long as you #include "library.h" and not library.c).
Note that using global arrays (as well as most other globals) makes code harder to reason about, and causes additional difficulties when making code thread-safe, and thus it's best to use globals very sparingly.
I'm developing an embedded project for ARM in C. I want to create a section in code in which I define some functions and then an init function will iterate over them to execute them. Something like:
void f1(void) { ... } __section("some_section")
void f2(void) { ... } __section("some_section")
...
...
[In a different module]
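typedef void (*funptr)(void);
extern funptr __some_section_start[], __some_section_end[]; /* section bounds */
void init_func(void) {
    /* treats the section as an array of function pointers */
    for (funptr *iter = __some_section_start; iter < __some_section_end; iter++)
        (*iter)();
}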
I'm aware that something like this can be done by modifying the project linker script, but I want to know if it is possible to do so without linker scripts.
I have also tried __attribute__((section("ARM.at.<address>"))), but that won't work because it would make one function overlap the other.
In addition, I have tried to use a typical __attribute__((section(...))) and disassemble the generated file, to see if the Keil compiler generates any sort of symbol to indicate the beginning and end of the section. Not only does it not do that, but many times the variable I have created in a particular section is just not generated. I suspect the linker is stripping it out, even when using volatile and __attribute__((used)) (which is supposed to prevent exactly this). How can I prevent the linker from doing this?
In my case I am writing a simple plugin system in C using dlfcn.h (linux). The plugins are compiled separately from the main program and result in a bunch of .so files.
There are certain functions that must be defined in the plugin in order for the plugin to be called properly by the main program. Ideally I would like each plugin to include a .h file or something similar that states which functions a valid plugin must have. If these functions are not defined in the plugin, I would like the plugin to fail compilation.
I don't think you can enforce at compile time that a function be defined. However, if you use the GCC toolchain, you can pass the --undefined flag to the linker to enforce that a symbol be defined.
ld --undefined foo
will treat foo as though it is an undefined symbol that must be defined for the linker to succeed.
You cannot do that.
It's common practice to define only two exported functions in a library opened by dlopen(): one to import functions into your plugin and one to export functions from your plugin.
A few lines of code are better than any explanation:
struct plugin_import {
void (*draw)(float);
void (*update)(float);
};
struct plugin_export {
int (*get_version)(void);
void (*set_version)(int);
};
extern void import(struct plugin_import *);
extern void export(struct plugin_export *);
int setup(void)
{
struct plugin_export out = {0};
struct plugin_import in;
/* give the plugin our function pointers */
in.draw = &draw, in.update = &update;
import(&in);
/* get our functions out of the plugin */
export(&out);
/* verify that all functions are defined */
if (out.get_version == NULL || out.set_version == NULL)
return 1;
return 0;
}
This is very similar to the system Quake 2 used; its source code is publicly available.
The only difference is that Quake 2 exported a single function, which imports and exports the functions of the dynamic library in one call.
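A minimal sketch of that single-function variant, reusing the two structs from above. The entry point name plugin_get_api and all function bodies are assumptions for illustration, not Quake 2's actual interface. Plugin and host are shown in one file, whereas in a real setup the host would obtain plugin_get_api through dlsym().

```c
#include <stddef.h>

struct plugin_import {              /* host -> plugin */
    void (*draw)(float);
    void (*update)(float);
};
struct plugin_export {              /* plugin -> host */
    int  (*get_version)(void);
    void (*set_version)(int);
};

/* ---- plugin side ---- */
static struct plugin_import host;   /* the plugin's copy of the host functions */
static int version = 1;
static int  get_version(void)  { return version; }
static void set_version(int v) { version = v; }

/* Hypothetical single entry point: takes the imports, returns the exports. */
const struct plugin_export *plugin_get_api(const struct plugin_import *in)
{
    static const struct plugin_export api = { get_version, set_version };
    if (in == NULL || in->draw == NULL || in->update == NULL)
        return NULL;                /* the host did not supply everything */
    host = *in;
    return &api;
}

/* ---- host side ---- */
static void draw(float dt)   { (void)dt; }
static void update(float dt) { (void)dt; }

/* Returns 0 if the plugin provides every required function, 1 otherwise. */
int setup(void)
{
    struct plugin_import in = { draw, update };
    const struct plugin_export *out = plugin_get_api(&in);
    if (out == NULL || out->get_version == NULL || out->set_version == NULL)
        return 1;
    out->set_version(3);
    return out->get_version() == 3 ? 0 : 1;
}
```

The check still happens at load time rather than at compile time, but a plugin that leaves a required function out is rejected before any of its code is used.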
Well after doing some research and asking a few people that I know of on IRC I have found the following solution:
Since I am using gcc I am able to use a linker script.
linker.script:
ASSERT(DEFINED(funcA), "must define funcA" ) ;
ASSERT(DEFINED(funcB), "must define funcB" ) ;
If either of those functions is not defined, then a custom error message will be output when the program tries to link.
(more info on linker script syntax can be found here: http://www.math.utah.edu/docs/info/ld_3.html)
When compiling simply add the linker script file after the source file:
gcc -o test main.c linker.script
Another possibility:
Something that I didn't think of (it seems a bit obvious now) and that was brought to my attention: you can create a small program that loads your plugin and checks that you get valid function pointers for all of the functions that you want your plugin to have. Then incorporate this into your build system, be it a makefile, a script or whatever. This has the benefit that you are no longer limited to a particular compiler to make this work, and you can also do more sophisticated checks for other things. The only downside is that you have a little more work to do to get it set up.
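The core of such a checker could look roughly like this. The function name count_missing_symbols is made up. The sketch assumes a POSIX system with dlfcn.h, and on older glibc versions it needs to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Returns the number of names that cannot be resolved in the library at
   path, or -1 if the library itself cannot be loaded.
   A NULL path makes dlopen() return a handle for the main program. */
int count_missing_symbols(const char *path, const char *const names[], size_t n)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return -1;

    int missing = 0;
    for (size_t i = 0; i < n; i++) {
        dlerror();                      /* clear any stale error state */
        (void)dlsym(handle, names[i]);
        if (dlerror() != NULL)          /* the lookup failed */
            missing++;
    }
    dlclose(handle);
    return missing;
}
```

A build step could wrap this in a main() that takes the .so path and the required names as arguments and exits with a non-zero status when the result is not 0, so that a broken plugin fails the build.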
I understand that a function pointer points to the starting address of the code for a function. But is there any way to be able to point to the end of the code of a function as well?
Edit: Specifically on an embedded system with a single processor and no virtual memory. No optimisation too. A gcc compiler for our custom processor.
I wish to know the complete address range of my function.
If you put the function within its own special linker section, then your toolchain might provide a pointer to the end (and the beginning) of the linker section. For example, with Green Hills Software (GHS) MULTI compiler I believe you can do something like this:
#pragma ghs section text=".mysection"
void MyFunction(void) { }
#pragma ghs section
That will tell the linker to locate the code for MyFunction in .mysection. Then in your code you can declare the following pointers, which point to the beginning and end of the section. The GHS linker provides the definitions automatically.
extern char __ghsbegin_mysection[];
extern char __ghsend_mysection[];
I don't know whether GCC supports similar functionality.
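For comparison, a sketch of how a similar effect can be obtained with GCC and GNU ld on an ELF target, where the linker provides __start_ and __stop_ symbols for an orphan section whose name is a valid C identifier. The section name mysection, the body of MyFunction and the helper mysection_size are assumptions for illustration, and a custom GCC port or a linker script that places the section explicitly may behave differently.

```c
#include <stddef.h>

/* Place the function in its own section. The name must be a valid C
   identifier for GNU ld to provide the boundary symbols. */
__attribute__((section("mysection"), noinline))
int MyFunction(int x) { return x + 1; }

/* Provided by GNU ld for the orphan section "mysection". */
extern char __start_mysection[];
extern char __stop_mysection[];

/* Size in bytes of everything placed in "mysection". */
size_t mysection_size(void)
{
    return (size_t)(__stop_mysection - __start_mysection);
}
```

If MyFunction is the only thing placed in the section, the two symbols bracket its code, possibly including padding.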
You didn't say why you need this information, but on some embedded systems it is necessary to copy a single function from flash to RAM in order to (re)program the flash.
Normally you place these functions into a new, unique section and, depending on your linker, copy this section to the new (RAM) location with pure C or with assembler.
You also need to tell the linker that the code will run from a different address than the one where it is stored in flash.
In a project the flash.c could look like
#pragma define_section pram_code "pram_code.text" RWX
#pragma section pram_code begin
uint16_t flash_command(uint16_t cmd, uint16_t *addr, uint16_t *data, uint16_t cnt)
{
...
}
#pragma section pram_code end
The linker command file looks like
.prog_in_p_flash_ROM : AT(Fpflash_mirror) {
Fpram_start = .;
# OBJECT(Fflash_command,flash.c)
* (pram_code.text)
. = ALIGN(2);
# save data end and calculate data block size
Fpram_end = .;
Fpram_size = Fpram_end - Fpram_start;
} > .pRAM
But as others said, this is very toolchain-specific.
There is no way in standard C to point to the end of a function. A C compiler has a lot of latitude in how it arranges the machine code it emits, and with various optimization settings it may even intermingle the machine code of different functions.
On top of what the compiler does, the linker combines the compiled pieces of object code, and the loader then loads the application, which may also use various kinds of shared libraries.
In the complex running environment of modern operating systems and modern development toolchains, unless the language provides a specific mechanism for doing something, it is prudent not to get fancy; otherwise you leave yourself open to an application that suddenly stops working due to changes in the operating environment.
In most cases if you use a non-optimizing setting of the compiler with static linked libraries, the symbol map that most linkers provide will give you a good idea as to where functions begin and end. However the only thing you can really depend on is knowing the address of the function entry points.
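In some implementations (including gcc) you could do something like this (but it's not guaranteed, and lots of implementation details could affect it):
#include <stdint.h>
#include <stdio.h>

int foo(void) {
    printf("testing\n");
    return 7;
}
void foo_end(void) { }
long sizeof_foo(void) {
    // assumes foo_end is emitted directly after foo (not guaranteed)
    // assumes no code size optimizations across functions
    // function could be smaller than reported
    // reports size, not range
    // function pointers are converted to integers before subtracting,
    // since arithmetic on function pointers is not standard C
    return (long)((uintptr_t)foo_end - (uintptr_t)foo);
}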
I have an interface with which I want to be able to statically link modules. For example, I want to be able to call all functions (albeit in separate files) called FOO or that match a certain prototype, and ultimately make a call into a function in one file without a header in the other files. Don't say that it is impossible, since I found a hack that can do it, but I want a non-hacked method. (The hack is to use nm to get functions and their prototypes; then I can dynamically call the function.) Also, I know you can do this with dynamic linking; however, I want to statically link the files. Any ideas?
Put a table of all functions into each translation unit:
struct functions MOD1FUNCS[]={
{"FOO", foo},
{"BAR", bar},
{0, 0}
};
Then put a table into the main program listing all these tables:
struct functions* ALLFUNCS[]={
MOD1FUNCS,
MOD2FUNCS,
0
};
Then, at run time, search through the tables, and lookup the corresponding function pointer.
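A sketch of the whole scheme in one file. The layout of struct functions, the module_fn signature, the function bodies and the lookup helper are assumptions for illustration, since the answer only shows the tables.

```c
#include <stddef.h>
#include <string.h>

typedef int (*module_fn)(void);     /* assumed common prototype */

struct functions {
    const char *name;
    module_fn   fn;
};

/* Would live in module 1. */
static int foo(void) { return 1; }
static int bar(void) { return 2; }
struct functions MOD1FUNCS[] = { {"FOO", foo}, {"BAR", bar}, {0, 0} };

/* Would live in module 2. */
static int baz(void) { return 3; }
struct functions MOD2FUNCS[] = { {"BAZ", baz}, {0, 0} };

/* Main program: the table of all module tables. */
struct functions *ALLFUNCS[] = { MOD1FUNCS, MOD2FUNCS, 0 };

/* Returns the first function registered under name, or NULL if none. */
module_fn lookup(const char *name)
{
    for (struct functions **table = ALLFUNCS; *table != NULL; table++)
        for (struct functions *f = *table; f->name != NULL; f++)
            if (strcmp(f->name, name) == 0)
                return f->fn;
    return NULL;
}
```

With this, lookup("BAR")() calls bar, and lookup returns NULL for a name that no module registered. The main program still has to list every module table by name, which is the part the linker-section approach avoids.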
This is somewhat common in writing test code. For example, you want to call all functions that start with test_. So you have a shell script that greps through all your .c files and pulls out the function names that match test_.*. Then that script generates a test.c file that contains a function that calls all the test functions.
e.g., generated program would look like:
int main() {
initTestCode();
testA();
testB();
testC();
}
Another way to do it would be to use some linker tricks. This is what the Linux kernel does for its initialization. Functions that are init code are marked with the qualifier __init. This is defined in linux/init.h as follows:
#define __init __section(.init.text) __cold notrace
This causes the linker to put that function in the section .init.text. The kernel will reclaim memory from that section after the system boots.
For calling the functions, each module will declare an initcall function with some other macros core_initcall(func), arch_initcall(func), et cetera (also defined in linux/init.h). These macros put a pointer to the function into a linker section called .initcall.
At boot-time, the kernel will "walk" through the .initcall section calling all of the pointers there. The code that walks through looks like this:
extern initcall_t __initcall_start[], __initcall_end[], __early_initcall_end[];
static void __init do_initcalls(void)
{
initcall_t *fn;
for (fn = __early_initcall_end; fn < __initcall_end; fn++)
do_one_initcall(*fn);
/* Make sure there is no pending stuff from the initcall sequence */
flush_scheduled_work();
}
The symbols __initcall_start, __initcall_end, etc. get defined in the linker script.
In general, the Linux kernel does some of the cleverest tricks with the GCC pre-processor, compiler and linker that are possible. It's always been a great reference for C tricks.
You really need static linking and, at the same time, to select all matching functions at runtime, right? Because the latter is a typical case for dynamic linking, I'd say.
You obviously need some mechanism to register the available functions. Dynamic linking would provide just this.
I really don't think you can do it. C isn't exactly capable of late-binding or the sort of introspection you seem to be requiring.
Although I don't really understand your question. Do you want the features of dynamically linked libraries while statically linking? Because that doesn't make sense to me... to static link, you need to already have the binary in hand, which would make dynamic loading of functions a waste of time, even if you could easily do it.