In C, we can force the linker to put a specific function in a specific section from the source code, using something like the following example.
Here, the function my_function is tagged with the preprocessor macro PUT_IN_USER_SECTION in order to tell the linker to put it in the section .user_section.
#define PUT_IN_USER_SECTION __attribute__((__section__(".user_section"))) __attribute__ ((noinline))
double PUT_IN_USER_SECTION my_function(double a, double b)
{
// Function content
}
Now, what I'd like to know is, when we use standard functions (for example log functions from the math.h library) from the GLIBC, MUSL, ... and we perform static linking: is it possible to put those functions in specific sections? and how to that?
There is a technique called overlaying that comes from a time when memory resources were limited. It is still used in Embedded Systems with restricted resources :
https://en.wikipedia.org/wiki/Overlay_(programming)
source : https://de.wikipedia.org/wiki/Overlay_(Programmierung)#/media/Datei:Overlay_Programming.svg
In https://developer.arm.com/documentation/100066/0611/mapping-code-and-data-to-target-memory/manual-overlay-support/writing-an-overlay-manager-for-manually-placed-overlays is an implementation of overlaying, the load addresses are in the scatter file.
https://sourceware.org/gdb/onlinedocs/gdb/How-Overlays-Work.html
For the addresses of functions in memory see https://ftp.gnu.org/pub/old-gnu/Manuals/ld-2.9.1/html_node/ld_22.html
https://www.keil.com/support/man/docs/armlink/armlink_pge1362066004071.htm
Related
Lately I have been making an attempt to read more open source C code. A common pattern that I have been adopting in my hobby projects is as follows.
In my C files, I have functions that are either static or exported. Only functions that are exported are placed in a header file. Global variables which are only used within the scope of an object are also used as static global variables.
My question concerns the usefulness and motivation of having static inline functions inside header files. From what I read online, not using the static keyword causes a multiple definition error and that is the reason for not just defining the function as just inline.
However, does this mean that this function is exported for other objects to use?
If yes, then why not just defining this function in the C file and export it via the header file?
If not, why putting this in the header file rather than just having it inside the C file?
Is there a reason behind this coding style? What am I missing?
One such example can be found in the git codebase inside hashmap.h:
/*
* Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
* for use in hash tables. Cryptographic hashes are supposed to have
* uniform distribution, so in contrast to `memhash()`, this just copies
* the first `sizeof(int)` bytes without shuffling any bits. Note that
* the results will be different on big-endian and little-endian
* platforms, so they should not be stored or transferred over the net.
*/
static inline unsigned int sha1hash(const unsigned char *sha1)
{
/*
* Equivalent to 'return *(unsigned int *)sha1;', but safe on
* platforms that don't support unaligned reads.
*/
unsigned int hash;
memcpy(&hash, sha1, sizeof(hash));
return hash;
}
A static inline function is, in practice, likely (but not certain) to be inlined by some good optimizing compiler (e.g. by GCC when it is given -O2) at most of its call sites.
It is defined in a header file, because it then could be inlined at most call sites (perhaps all of them). If it was just declared (and simply "exported") the inlining is unlikely to happen (except if you compile and link with link-time optimizations, a.k.a. LTO, also, e.g. compile and link with gcc -flto -O2, and that increases a lot the build time).
In practice, the compiler needs to know the body of a function to be able to inline it. So a suitable place is to define it in some common header file (otherwise, it could be inlined only in the same translation unit defining it, unless you enable LTO), so that every translation unit would know the body of that inlinable function.
It is declared static to avoid multiple definitions (at link time) in case the compiler did not inline it (e.g. when you use its address).
In practice, in C99 or C11 code (except with LTO, which I rarely use), I would always put the short functions I want to be inlined as static inline definitions in common header files.
Be sure to understand how and when the C preprocessor works. Notice that you could in principle (but it would be very bad practice and disgusting style) avoid defining some static inline function in a common header file and instead copy and paste its identical definition in multiple .c files.
(However, that might make sense for generated .c files, e.g. if you design a compiler emitting C code).
FYI LTO is practically implemented by recent GCC compilers by embedding some internal compiler representation (some GIMPLE) inside object files, and redoing some "compilation" step - using the lto1 frontend - at "link" time. In practice, the entire program is almost compiled "twice".
(actually, I always wondered why the C standardization committee did not decide instead that all explicitly inline functions are static)
In my project we are heavily using a C header which provides an API to comunicate to an external software. Long story short, in our project's bugs show up more often on the calling of the functions defined in those headers (it is an old and ugly legacy code).
I would like to implement an indirection on the calling of those functions, so I could include some profiling before calling the actual implementation.
Because I'm not the only person working on this project, I would like to make those wrappers in a such way that if someone uses the original implementations directly it should cause a compile error.
If those headers were C++ sources, I would be able to simply make a namespace, wrap the included files in it, and implement my functions using it (the other developers would be able to use the original implementation using the :: operator, but just not being able to call it directly is enough encapsulation to me). However the headers are C sources (which I have to include with extern "C" directive to include), so namespaces won't help me AFAIK.
I tried to play around with defines, but with no luck, like this:
#define my_func api_func
#define api_func NULL
What I wanted with the above code is to make my_func to be translated to api_func during the preprocessing, while making a direct call to api_func give a compile error, but that won't work because it will actually make my_func to be translated to NULL too.
So, basically, I would like to make a wrapper, and make sure the only way to access the API is through this wrapper (unless the other developers make some workaround, but this is inevitable).
Please note that I need to wrap hundreds of functions, which show up spread in the whole code several times.
My wrapper necessarily will have to include those C headers, but I would like to make them leave scope outside the file of my wrapper, and make them to be unavailable to every other file who includes my wrapper, but I guess this is not possible in C/C++.
You have several options, none of them wonderful.
if you have the sources of the legacy software, so that you can recompile it, you can just change the names of the API functions to make room for the wrapper functions. If you additionally make the original functions static and put the wrappers in the same source files, then you can ensure that the originals are called only via the wrappers. Example:
static int api_func_real(int arg);
int api_func(int arg) {
// ... instrumentation ...
int result = api_func_real(arg);
// ... instrumentation ...
return result;
}
static int api_func_real(int arg) {
// ...
}
The preprocessor can help you with that, but I hesitate to recommend specifics without any details to work with.
if you do not have sources for the legacy software, or if otherwise you are unwilling to modify it, then you need to make all the callers call your wrappers instead of the original functions. In this case you can modify the headers or include an additional header before that uses #define to change each of the original function names. That header must not be included in the source files containing the API function implementations, nor in those providing the wrapper function implementations. Each define would be of the form:
#define api_func api_func_wrapper
You would then implement the various api_func_wrapper() functions.
Among the ways those cases differ is that if you change the legacy function names, then internal calls among those functions will go through the wrappers bearing the original names (unless you change the calls, too), but if you implement wrappers with new names then they will be used only when called explicitly, which will not happen for internal calls within the legacy code (unless, again, you modify those calls).
You can do something like
[your wrapper's include file]
int origFunc1 (int x);
int origFunc2 (int x, int y);
#ifndef WRAPPER_IMPL
#define origFunc1 wrappedFunc1
#define origFunc2 wrappedFunc2
#else
int wrappedFunc1(int x);
int wrappedFunc2(int x, int y);
#endif
[your wrapper implementation]
#define WRAPPER_IMPL
#include "wrapper.h"
int wrapperFunc1 (...) {
printf("Wrapper1 called\n");
origFunc1(...);
}
Your wrapper's C file obviously needs to #define WRAPPER_IMPL before including the header.
That is neither nice nor clean (and if someone wants to cheat, he could simply define WRAPPER_IMPL), but at least some way to go.
There are two ways to wrap or override C functions in Linux:
Using LD_PRELOAD:
There is a shell environment variable in Linux called LD_PRELOAD,
which can be set to a path of a shared library,
and that library will be loaded before any other library (including glibc).
Using ‘ld --wrap=symbol‘:
This can be used to use a wrapper function for symbol.
Any further reference to symbol will be resolved to the wrapper function.
a complete writeup can be found at:
http://samanbarghi.com/blog/2014/09/05/how-to-wrap-a-system-call-libc-function-in-linux/
I am writing a C (shared) library. It started out as a single translation unit, in which I could define a couple of static global variables, to be hidden from external modules.
Now that the library has grown, I want to break the module into a couple of smaller source files. The problem is that now I have two options for the mentioned globals:
Have private copies at each source file and somehow sync their values via function calls - this will get very ugly very fast.
Remove the static definition, so the variables are shared across all translation units using extern - but now application code that is linked against the library can access these globals, if the required declaration is made there.
So, is there a neat way for making private global variable shared across multiple, specific translation units?
You want the visibility attribute extension of GCC.
Practically, something like:
#define MODULE_VISIBILITY __attribute__ ((visibility ("hidden")))
#define PUBLIC_VISIBILITY __attribute__ ((visibility ("default")))
(You probably want to #ifdef the above macros, using some configuration tricks à la autoconfand other autotools; on other systems you would just have empty definitions like #define PUBLIC_VISIBILITY /*empty*/ etc...)
Then, declare a variable:
int module_var MODULE_VISIBILITY;
or a function
void module_function (int) MODULE_VISIBILITY;
Then you can use module_var or call module_function inside your shared library, but not outside.
See also the -fvisibility code generation option of GCC.
BTW, you could also compile your whole library with -Dsomeglobal=alongname3419a6 and use someglobal as usual; to really find it your user would need to pass the same preprocessor definition to the compiler, and you can make the name alongname3419a6 random and improbable enough to make the collision improbable.
PS. This visibility is specific to GCC (and probably to ELF shared libraries such as those on Linux). It won't work without GCC or outside of shared libraries.... so is quite Linux specific (even if some few other systems, perhaps Solaris with GCC, have it). Probably some other compilers (clang from LLVM) might support also that on Linux for shared libraries (not static ones). Actually, the real hiding (to the several compilation units of a single shared library) is done mostly by the linker (because the ELF shared libraries permit that).
The easiest ("old-school") solution is to simply not declare the variable in the intended public header.
Split your libraries header into "header.h" and "header-internal.h", and declare internal stuff in the latter one.
Of course, you should also take care to protect your library-global variable's name so that it doesn't collide with user code; presumably you already have a prefix that you use for the functions for this purpose.
You can also wrap the variable(s) in a struct, to make it cleaner since then only one actual symbol is globally visible.
You can obfuscate things with disguised structs, if you really want to hide the information as best as possible. e.g. in a header file,
struct data_s {
void *v;
};
And somewhere in your source:
struct data_s data;
struct gbs {
// declare all your globals here
} gbss;
and then:
data.v = &gbss;
You can then access all the globals via: ((struct gbs *)data.v)->
I know that this will not be what you literally intended, but you can leave the global variables static and divide them into multiple source files.
Copy the functions that write to the corresponding static variable in the same source file also declared static.
Declare functions that read the static variable so that external source files of the same module can read it's value.
In a way making it less global. If possible, best logic for breaking big files into smaller ones, is to make that decision based on the data.
If it is not possible to do it this way than you can bump all the global variables into one source file as static and access them from the other source files of the module by functions, making it official so if someone is manipulating your global variables at least you know how.
But then it probably is better to use #unwind's method.
I recently got a snippet of code in Linux kernel:
static int
fb_mmap(struct file *file, struct vm_area_struct * vma)
__acquires(&info->lock)
__releases(&info->lock)
{
...
}
What confused me is the two __functions following static int fb_mmap() right before "{",
a).What is the purpose of the two __funtions?
b).Why in that position?
c).Why do they have the prefix "__"?
d).Are there other examples similar to this?
Not everything ending with a pair of parenthesis is a function (call). In this case they are parameterized macro expansions. The macros are defined as
#define __acquires(x) __attribute__((context(x,0,1)))
#define __releases(x) __attribute__((context(x,1,0)))
in file include/linux/compiler.h in the kernel build tree.
The purpose of those macros expanding into attribute definitions is to annotate the function symbols with information about which locking structures the function will acquire (i.e. lock) and release (i.e. unlock). The purpose of those in particular is debugging locking mechanisms (the Linux kernel contains some code that allows it to detect potential deadlock situations and report on this).
https://en.wikipedia.org/wiki/Sparse
__attribute__ is a keyword specific to the GCC compiler, that allows to assign, well, attributes to a given symbol
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#Function-Attributes
Since macros are expanded at the text level, before the compiler is even looking at it, the result for your particular snippet, that the actual compilers sees would be
static int
fb_mmap(struct file *file, struct vm_area_struct * vma)
__attribute__((context(&info->lock,0,1)))
__attribute__((context(&info->lock,1,0)))
{
…
}
Those macros start with a double underscore __ to indicate, that they are part of the compiler environment. All identifiers starting with one or two underscores are reserved for the compiler environment implementation. In the case of the Linux kernel, because Linux is a operating system kernel that does not (because it simply is not availible) use the standard library, it's natural for it, do define it's own compiler environment definitions, private to it. Hence the two underscores to indicate, that this is compiler environment/implementation specific stuff.
They're probably macros defined with #define. You should look for the definition of such macros and see what they expand to. They might expand to some pragma giving hints to the compiler; they might expand to nothing giving hints to the developers or some analysis tool. The meaning might vary
The __attribute__ these macros evaluate to are compiler-specific features. man gcc explains some of the uses.
The prefix __ typically is used to avoid name clashes; double underscore as prefix and postfix mark an identifier as being used by the compiler itself.
More on gcc attributes can be found here.
More on the kernel use of these can be found here.
Those are macro's defined as
# define __acquires(x) __attribute__((context(x,0,1)))
# define __releases(x) __attribute__((context(x,1,0)))
in Linux/include/linux/compiler.h
I have an interface with which I want to be able to statically link modules. For example, I want to be able to call all functions (albeit in seperate files) called FOO or that match a certain prototype, ultimately make a call into a function in the file without a header in the other files. Dont say that it is impossible since I found a hack that can do it, but I want a non hacked method. (The hack is to use nm to get functions and their prototypes then I can dynamically call the function). Also, I know you can do this with dynamic linking, however, I want to statically link the files. Any ideas?
Put a table of all functions into each translation unit:
struct functions MOD1FUNCS[]={
{"FOO", foo},
{"BAR", bar},
{0, 0}
};
Then put a table into the main program listing all these tables:
struct functions* ALLFUNCS[]={
MOD1FUNCS,
MOD2FUNCS,
0
};
Then, at run time, search through the tables, and lookup the corresponding function pointer.
This is somewhat common in writing test code. e.g., you want to call all functions that start with test_. So you have a shell script that grep's through all your .C files and pulls out the function names that match test_.*. Then that script generates a test.c file that contains a function that calls all the test functions.
e.g., generated program would look like:
int main() {
initTestCode();
testA();
testB();
testC();
}
Another way to do it would be to use some linker tricks. This is what the Linux kernel does for its initialization. Functions that are init code are marked with the qualifier __init. This is defined in linux/init.h as follows:
#define __init __section(.init.text) __cold notrace
This causes the linker to put that function in the section .init.text. The kernel will reclaim memory from that section after the system boots.
For calling the functions, each module will declare an initcall function with some other macros core_initcall(func), arch_initcall(func), et cetera (also defined in linux/init.h). These macros put a pointer to the function into a linker section called .initcall.
At boot-time, the kernel will "walk" through the .initcall section calling all of the pointers there. The code that walks through looks like this:
extern initcall_t __initcall_start[], __initcall_end[], __early_initcall_end[];
static void __init do_initcalls(void)
{
initcall_t *fn;
for (fn = __early_initcall_end; fn < __initcall_end; fn++)
do_one_initcall(*fn);
/* Make sure there is no pending stuff from the initcall sequence */
flush_scheduled_work();
}
The symbols __initcall_start, __initcall_end, etc. get defined in the linker script.
In general, the Linux kernel does some of the cleverest tricks with the GCC pre-processor, compiler and linker that are possible. It's always been a great reference for C tricks.
You really need static linking and, at the same time, to select all matching functions at runtime, right? Because the latter is a typical case for dynamic linking, i'd say.
You obviusly need some mechanism to register the available functions. Dynamic linking would provide just this.
I really don't think you can do it. C isn't exactly capable of late-binding or the sort of introspection you seem to be requiring.
Although I don't really understand your question. Do you want the features of dynamically linked libraries while statically linking? Because that doesn't make sense to me... to static link, you need to already have the binary in hand, which would make dynamic loading of functions a waste of time, even if you could easily do it.