changing extern function pointer to extern pointer using preprocessor - c

I am using a library whose files I shouldn't change, and which includes my h file.
The library code looks something like this:
#include "my_file"
extern void (*some_func)();
void foo()
{
(some_func)();
}
My problem is that I want some_func to be an extern function rather than an extern pointer to a function (I am implementing and linking some_func), and that is how main will call it.
That way I will save a little run time and code space, and no one will change this global by mistake.
Is it possible?
I thought about adding something like this to my_file.h:
#define *some_func some_func
but it won't compile, because an asterisk is not allowed in a macro name.
EDIT
The library file has not been compiled yet, so changes to my_file.h will affect the compilation.

First of all, you say that you can't change the source of the library. Well, this is bad, and some "betrayal" is necessary.
My approach is to leave the declaration of the pointer some_func as is, a non-constant writable variable, but to define it as a constant non-writable variable, which is initialized once and for all with the wanted address.
Here comes the minimal, reproducible example.
The library is implemented as you show us:
// lib.c
#include "my_file"
extern void (*some_func)();
void foo()
{
(some_func)();
}
Since you have this include file in the library's source, I provide one. But it is empty.
// my_file
I use a header file that declares the public API of the library. This file still has the writable declaration of the pointer, so that offenders believe they can change it.
// lib.h
extern void (*some_func)();
void foo();
I separated an offending module to try the impossible. It has a header file and an implementation file. In the source the erroneous assignment is marked, already revealing what will happen.
// offender.h
void offend(void);
// offender.c
#include <stdio.h>
#include "lib.h"
#include "offender.h"
static void other_func()
{
puts("other_func");
}
void offend(void)
{
some_func = other_func; // the assignment gives a run-time error
}
The test program consists of this little source. To avoid compiler errors, the declaration seen here has to be const-qualified as well. Since this is where we include the declaring header file, we can use some preprocessor magic.
// main.c
#include <stdio.h>
#define some_func const some_func
#include "lib.h"
#undef some_func
#include "offender.h"
static void my_func()
{
puts("my_func");
}
void (* const some_func)() = my_func;
int main(void)
{
foo();
offend();
foo();
return 0;
}
The trick is that the compiler places the pointer variable in the read-only section of the executable. The const qualifier is only used by the compiler and is not stored in the intermediate object files, so the linker happily resolves all references. Any write access to the variable then generates a run-time error on systems that enforce read-only sections.
Now all of this is compiled into an executable; I used GCC on Windows. I did not bother to create a separate library, because it makes no difference to the effect.
gcc -Wall -Wextra -g main.c offender.c lib.c -o test.exe
If I run the executable in "cmd", it just prints "my_func". Apparently the second call of foo() is never executed. The ERRORLEVEL is -1073741819, which is 0xC0000005. Looking up this code gives the meaning "STATUS_ACCESS_VIOLATION", on other systems known as "segmentation fault".
Because I deliberately compiled with the debugging flag -g, I can use the debugger to examine more deeply.
d:\tmp\StackOverflow\103> gdb -q test.exe
Reading symbols from test.exe...done.
(gdb) r
Starting program: d:\tmp\StackOverflow\103\test.exe
[New Thread 12696.0x1f00]
[New Thread 12696.0x15d8]
my_func
Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000004015c9 in offend () at offender.c:16
16 some_func = other_func;
Alright, as I intended, the assignment is blocked. However, the reaction of the system is quite harsh.
Unfortunately we cannot get a compile-time or link-time error. This is because of the design of the library, which is fixed, as you say.
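The preprocessor step from main.c above can also be tried in isolation. The following single-file sketch is only an illustration: it folds the header line into the same file, writes the parameter list as (void), and the names call_count and call_through_pointer are made up for this example. It shows that the macro turns the library's declaration into a const-qualified one, so the later const definition is accepted.

```c
#include <assert.h>
#include <stdio.h>

/* While the macro is active, every `some_func` token becomes `const some_func`.
   A macro is not re-expanded inside its own expansion, so this does not recurse. */
#define some_func const some_func
extern void (*some_func)(void); /* seen as: extern void (*const some_func)(void); */
#undef some_func

static int call_count; /* hypothetical counter, only for this sketch */

static void my_func(void)
{
    call_count++;
    puts("my_func");
}

/* The const definition matches the macro-adjusted declaration above. */
void (*const some_func)(void) = my_func;

/* Hypothetical stand-in for the library's foo(). */
void call_through_pointer(void)
{
    assert(some_func != NULL); /* sanity check before the indirect call */
    some_func();
}
```

Without the #define, the compiler would reject the const definition as conflicting with the non-const declaration.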

You could look at the ifunc attribute if you are using GCC or related. It should patch a small trampoline at load time. So when calling the function, the trampoline is called with a known static address and then inside the trampoline there is a jump instruction that was patched with the real address. So when running, all jump locations are directly in the code, which should be efficient with the instruction cache. Note that it might even be more efficient than this, but at most as bad as calling the function pointer. Here is how you would implement it:
extern void (*some_func)(void); // defined in the header you do not have control about
void some_func_resolved(void) __attribute__((ifunc("resolve_some_func")));
static void (*resolve_some_func(void)) (void)
{
return some_func;
}
// call some_func_resolved instead now

Related

In gcc is there any way to dynamically add a function call to the start of main()?

I'm dynamically overriding malloc() with a fast_malloc() implementation of mine in a glibc benchmark malloc speed test (glibc/benchtests/bench-malloc-thread.c), by writing these functions in my fast_malloc.c file:
// Override malloc() and free(); see: https://stackoverflow.com/a/262481/4561887
inline void* malloc(size_t num_bytes)
{
static bool first_call = true;
if (first_call)
{
first_call = false;
fast_malloc_error_t error = fast_malloc_init();
assert(error == FAST_MALLOC_ERROR_OK);
}
return fast_malloc(num_bytes);
}
inline void free(void* ptr)
{
fast_free(ptr);
}
Notice that I have this inefficient addition to my malloc() wrapper to ensure fast_malloc_init() gets called first on just the first call, to initialize some memory pools. I'd like to get rid of that and dynamically insert that init call into the start of main(), without modifying the glibc code, if possible. Is this possible?
The downside of how I've written my malloc() wrapper so far is it skews my benchtest results making it look like my fast_malloc() is slower than it really is, because the init func gets timed by glibc/benchtests/bench-malloc-thread.c, and I have this extraneous if (first_call) which gets checked every malloc call.
Currently I dynamically override malloc() and free(), while calling the bench-malloc-thread executable, like this:
LD_PRELOAD='/home/gabriel/GS/dev/fast_malloc/build/libfast_malloc.so' \
glibc-build/benchtests/bench-malloc-thread 1
Plot I will be adding my fast_malloc() speed tests to (using this repo):
LinkedIn post I made about this: https://www.linkedin.com/posts/gabriel-staples_software-engineering-tradeoffs-activity-6815412255325339648-_c8L.
Related:
[my repo fork] https://github.com/ElectricRCAircraftGuy/malloc-benchmarks
[how I learned how to generate *.so dynamic libraries in gcc] https://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html
Create a wrapper function for malloc and free in C
Is this possible?
Yes. You are building and LD_PRELOADing a shared library, and shared libraries can have special initializer and finalizer functions, which are called by the dynamic loader when the library is loaded and unloaded respectively.
As kaylum commented, to create such a constructor, you would use __attribute__((constructor)), like so:
__attribute__((constructor))
void fast_malloc_init_ctor()
{
fast_malloc_error_t error = fast_malloc_init();
assert(error == FAST_MALLOC_ERROR_OK);
}
// ... the rest of implementation here.
P.S.
it skews my benchtest results making it look like my fast_malloc() is slower than it really is, because the init func gets timed
You are comparing with multi-threaded benchmarks. Note that your static bool first_call is not thread-safe. In practice this will not matter, because malloc is normally called long before any threads (other than the main thread) exist.
I doubt that this single comparison actually makes your fast_malloc() slower. It probably is slower even after you remove the comparison -- writing a fast heap allocator takes a lot of effort, and smart people have spent many man-years optimizing GLIBC malloc, TCMalloc and jemalloc.
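If the lazy first-call check were kept rather than replaced by a constructor, it could be made thread-safe. The following is a minimal sketch using C11 atomics, not code from the question: pool_init stands in for fast_malloc_init(), and ensure_initialized, init_state and init_runs are hypothetical names. It assumes a busy-wait is acceptable while another thread finishes the initialization.

```c
#include <assert.h>
#include <stdatomic.h>

enum { NOT_STARTED, RUNNING, DONE };

static atomic_int init_state = NOT_STARTED;
static int init_runs; /* counts how often the stand-in init ran */

/* Stand-in for fast_malloc_init(); the real function is not shown here. */
static void pool_init(void)
{
    init_runs++;
}

void ensure_initialized(void)
{
    if (atomic_load(&init_state) == DONE)
        return; /* fast path once initialization has finished */

    int expected = NOT_STARTED;
    if (atomic_compare_exchange_strong(&init_state, &expected, RUNNING)) {
        pool_init(); /* exactly one thread gets here */
        atomic_store(&init_state, DONE);
    } else {
        while (atomic_load(&init_state) != DONE) {
            /* another thread is initializing; spin until it is done */
        }
    }
    assert(atomic_load(&init_state) == DONE);
}
```

This still costs a load and a branch on every call, which is why the constructor approach above is the better fit for the benchmark.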
How to dynamically inject function calls before and after another executable's main() function.
Here is a full, runnable example for anyone wanting to test this on their own. Tested on Linux Ubuntu 20.04.
This code is all part of my eRCaGuy_hello_world repo.
hello_world_basic.c:
#include <stdbool.h> // For `true` (`1`) and `false` (`0`) macros in C
#include <stdint.h> // For `uint8_t`, `int8_t`, etc.
#include <stdio.h> // For `printf()`
// int main(int argc, char *argv[]) // alternative prototype
int main()
{
printf("This is the start of `main()`.\n");
printf(" Hello world.\n");
printf("This is the end of `main()`.\n");
return 0;
}
dynamic_func_call_before_and_after_main.c:
#include <assert.h>
#include <stdbool.h> // For `true` (`1`) and `false` (`0`) macros in C
#include <stdint.h> // For `uint8_t`, `int8_t`, etc.
#include <stdio.h> // For `printf()`
#include <stdlib.h> // For `atexit()`
/// 3. This function gets attached as a post-main() callback (a sort of program "destructor")
/// via the C <stdlib.h> `atexit()` call below
void also_called_after_main()
{
printf("`atexit()`-registered callback functions are also called AFTER `main()`.\n");
}
/// 1. Functions with gcc function attribute, `constructor`, get automatically called **before**
/// `main()`; see:
/// https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
__attribute__((__constructor__))
void called_before_main()
{
printf("gcc constructors are called BEFORE `main()`.\n");
// 3. Optional way to register a function call for AFTER main(), although
// I prefer the simpler gcc `destructor` attribute technique below, instead.
int retcode = atexit(also_called_after_main);
assert(retcode == 0); // ensure the `atexit()` call to register the callback function succeeds
}
/// 2. Functions with gcc function attribute, `destructor`, get automatically called **after**
/// `main()`; see:
/// https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
__attribute__((__destructor__))
void called_after_main()
{
printf("gcc destructors are called AFTER `main()`.\n");
}
How to build and run the dynamic lib*.so shared-object library and dynamically load it with LD_PRELOAD as you run another program (see "dynamic_func_call_before_and_after_main__build_and_run.sh" from my eRCaGuy_hello_world repo):
# 1. Build the other program (hello_world_basic.c) that has `main()` in it which we want to use
mkdir -p bin && gcc -Wall -Wextra -Werror -O3 -std=c11 -save-temps=obj hello_world_basic.c \
-o bin/hello_world_basic
# 2. Create a .o object file of this program, compiling with Position Independent Code (PIC); see
# here: https://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html
gcc -Wall -Wextra -Werror -O3 -std=c11 -fpic -c dynamic_func_call_before_and_after_main.c \
-o bin/dynamic_func_call_before_and_after_main.o
# 3. Link the above PIC object file into a dynamic shared library (`lib*.so` file); link above shows
# we must use `-shared`
gcc -shared bin/dynamic_func_call_before_and_after_main.o -o \
bin/libdynamic_func_call_before_and_after_main.so
# 4. Call the other program with `main()` in it, dynamically injecting this code into that other
# program via this code's .so shared object file, and via Linux's `LD_PRELOAD` trick
LD_PRELOAD='bin/libdynamic_func_call_before_and_after_main.so' bin/hello_world_basic
Sample output. Notice that we have injected some special function calls both before AND after the main() function found in "hello_world_basic.c":
gcc constructors are called BEFORE `main()`.
This is the start of `main()`.
Hello world.
This is the end of `main()`.
gcc destructors are called AFTER `main()`.
`atexit()`-registered callback functions are also called AFTER `main()`.
References:
How to build dynamic lib*.so libraries in Linux: https://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html
@kaylum's comment
@Employed Russian's answer
@Lundin's comment
gcc constructor and destructor function attributes!:
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
c atexit() func to register functions to be called AFTER main() returns or exits!:
https://en.cppreference.com/w/c/program/atexit

Cannot return correct memory address from a shared lib in C

I have been trying to implement a small simulation to understand memory allocation of malloc(). I created a shared library called mem.c. I am linking the library to the main but cannot pass the correct address of the simulated "heap". Heap is created by a malloc() call in the shared library.
Address in the shared library: 0x55ddaff662a0
Address in the main: 0xffffffffaff662a0
Only last 4 bytes seem to be correct. Rest is set to 0xf.
However, when I #include "mem.c" in the main it works correctly. How can I achieve the same result without including mem.c? I am trying to solve this without including mem.c or mem.h. I create the shared library like this:
gcc -c -fpic mem.c
gcc -shared -o libmem.so mem.o
gcc main.c -lmem -L. -o main
From your comments
I am trying to implement without using #include mem.h or mem.c.
Then you must provide by other means a prototype for the function you're calling. Without an explicit function prototype, following the tradition of K&R and then later ANSI C, undeclared functions are assumed to return an int and take parameters of type int.
EDIT: Essentially you need to write what you'd normally find in a header, somewhere before you make first use of the function. Or, if it's a function pointer, you need an appropriate variable to store the function pointer.
For example, to declare a function that returns an untyped pointer and takes an arbitrary, unspecified number of arguments, you'd write
void *getAddr();
Note that using the extern keyword here is not required, since extern linkage is always implied for non-static function declarations.
In case you want to dynamically link at runtime (using dlopen / LoadLibrary → dlsym / GetProcAddress), you'd define a function pointer variable
void* (*getAddr_fptr)();
You can set it using dlsym with
*(void**)(&getAddr_fptr) = dlsym(…)
This awkward way of writing it is needed because function pointers are allowed to have a different size and alignment than data pointers (see the dlsym manpage for details).
These days int is a 4-byte type on the majority of platforms, and the most common calling conventions pass the first few function arguments and the return value in registers. On x86 (and x86_64) the general-purpose registers such as AX, BX, CX and DX can be accessed at different widths. This explains why only the low 4 bytes survive: the value travels in a register, but the caller, believing the function returns int, uses only the low 4 bytes of it. When that int is then converted to the wider pointer type, it is sign-extended, and because the top bit of 0xaff662a0 is set, the upper bits all become 1.
From the comments:
Do you have a declaration for getAddr in your main code?
No I don't have but I am trying to implement without a declaration, is it possible?
Then that's your problem. Without a declaration, the compiler falls back to a default declaration of int getAddr(). This is incompatible with the actual definition which returns a void *, and calling a function through an incompatible declaration triggers undefined behavior.
What probably happened is that when the return value of the function was actually returned you only got back the 4 low-order bytes. Assuming your system is little-endian, and int is 4 bytes, and a void * is 8 bytes, this would explain the low bits being the same.
You must include a valid declaration before the function is called. It doesn't necessarily have to reside in a header file, but it has to be visible at the point the call happens.
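The truncation and the run of f digits can be reproduced with plain integer arithmetic. This sketch only models the effect under the assumptions stated above (4-byte int, 8-byte pointer); the function names are made up for illustration, and the two addresses are the ones from the question.

```c
#include <assert.h>
#include <stdint.h>

/* Models a 64-bit address that the caller reads back as a 32-bit int
   and then widens to pointer size again. */
uint64_t address_seen_through_int(uint64_t real_address)
{
    uint64_t low = real_address & 0xffffffffULL; /* only the low 4 bytes survive */
    if (low & 0x80000000ULL)                      /* the int would be negative... */
        low |= 0xffffffff00000000ULL;             /* ...so widening sign-extends it */
    return low;
}

/* The addresses reported in the question. */
void check_question_values(void)
{
    assert(address_seen_through_int(0x55ddaff662a0ULL) == 0xffffffffaff662a0ULL);
}
```

An address whose bit 31 happens to be clear would come back with the upper half zeroed instead, which is just as wrong but looks less obviously so.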
I'm assuming you're trying to accomplish something like this? For mem.c
#include <stdlib.h>
#include <stdio.h>
void* getAddr() {
char *heap = (char *)malloc(10);
printf("%p\n", (void*)heap);
return heap;
}
And then without including any headers for the mem.c functions, you'd probably create a library out of mem.c as you've already mentioned in the question and have something as follows in main.c
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
typedef void* (*getAddr)(); //prototype for getAddr() in mem.c
int main() {
void* handle = dlopen("./libmem.so", RTLD_LAZY);
if(handle) {
void* fn = dlsym(handle, "getAddr");
if(fn) {
void* addr = ((getAddr)(fn))();
printf("%p\n", addr);
free(addr);
addr = NULL;
} else {
printf("Failed to dlsym %s\n", dlerror());
}
} else {
printf("Failed to dlopen %s\n", dlerror());
}
}
EDIT: For OP's purpose, as @Zilog80 mentioned, since the library is being linked with the main executable, the dlopen() part can be dropped and main.c can be simplified to
#include <stdio.h>
#include <stdlib.h>
extern void* getAddr(); //prototype for getAddr() in mem.c
int main() {
void* addr = getAddr();
printf("%p\n", addr);
free(addr);
addr = NULL;
}
Build it with compilation commands similar to OP's, i.e.
gcc -shared -o libmem.so -fpic mem.c
gcc main.c -lmem -L . -o main
and run it with
LD_LIBRARY_PATH=. ./main

Redefining main to another name

In C90, can I redefine main and give it another name, and possibly add extra parameters using #define?
Have this in a header file for example:
#include <stdio.h>
#include <stdlib.h>
#define main( void ) new_main( void )
int new_main( void );
The header doesn't show any errors when compiling.
When I try compiling it with the main C file, however, I keep getting an error
In function '_start': Undefined reference to 'main'
No, you cannot do that, because it would be against language and OS standards. The name main and its arguments argc, argv and environ constitute a part of system loader calling conventions.
A somewhat simplified explanation (no ABI level, just API level) follows. When your program has been loaded into memory and is about to start, the loader needs to know which function to call as an entrypoint, and how to pass its environment to it. If it were possible to change the name of main and/or its parameter list, the details of the new calling interface would have to be communicated back to the loader. And there is no convenient way to do it (apart from writing your own executable loader).
In function '_start': Undefined reference to 'main'
Here you can see an implementation detail of the Linux/POSIX ELF loader interface. The compiler adds a function _start to your program behind the scenes, which is the actual program entrypoint. _start is tasked with extra initialization steps common to most programs that use LibC. It is _start that later calls your main. Theoretically, you could write a program that has its own function called _start and no main, and it would be fine. It is not trivial, as you will have to make sure that the default _start code is no longer attached to your program (no double definitions), but it is doable. And no, you cannot choose a name other than _start, for the same reasons.
The presence of #define main new_main within a compilation unit will not affect the name of the function the implementation will call on program startup. The implementation is going to call a function called main regardless of any macros you define.
If you are going to use a #define like that to prevent the primary declaration of main() from producing a function by that name, you'll need to include a definition of main() somewhere else; that alternate version could then invoke the original. For example, if the original definition didn't use its arguments, and if the program exits only by returning from main() [as opposed to using exit()] you might put #define main new_main within a header file used by the primary definition of main, and then in another file do something like:
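#include <stdio.h>
#include <conio.h> // For getch() function.
int new_main(void); // The original main(), renamed by the #define in its own file.
int main(void)
{
int result = new_main(); // Run the original program first.
printf("\nExit code was %d. Strike any key.\n", result);
getch();
return result;
}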
In most cases, it would be better to add any such code within the ordinary "main" function, but this approach can be useful in cases where the file containing main is produced by code generation tools on every build, or for some other reason cannot be modified to include such code.
No, you cannot (as Grigory said).
You can, however, immediately call your proxy main,
int
your_new_main(int argc, char* argv[], char* envp[]) {
... //your stuff goes here
}
//just place this in an include file, and only include in main...
int
main( int argc, char* argv[], char* envp[])
{
int result = your_new_main(argc, argv, envp);
return result;
}
As far as whether envp is supported everywhere?
Is char *envp[] as a third argument to main() portable
Alternatively, assuming you're using gcc, you can pass -nostdlib and set a new entry point with -Wl,-enew_main, which gcc passes on to the linker. Doing this won't give you access to any of the nice things that the C runtime does before calling your main, so you'd have to do them yourself.
You can look at resources about what happens before main is called.
What Happens Before main

Using a file-static function as a callback from a different translation unit

Consider the following code:
a.c
#include <stdio.h>
#include "b.h"
static int a = 41;
static void test(void){
a++;
printf("a: %d\n", a);
}
int main(void){
set_callback(test);
call();
return 0;
}
b.c
static void (*callback)(void);
void set_callback(void (*func)(void)){
callback = func;
}
void call(void){
if (callback){
callback();
}
}
b.h
void set_callback(void (*func)(void));
void call(void);
This compiles without warnings with -Wall and prints out a: 42 as expected.
Now, this might not be the best practice, since the writer of a.c probably doesn't expect test() to be called from another file and the variable a modified, but is this legal C code? Will it work portably on different platforms and compilers?
Yes, this is perfectly fine code and even good code. There is no need for your test callback to be global.
The compiler is responsible for ensuring that the function isn't called from outside the translation unit before doing any optimizations that would break such calls.
If it sees that a pointer to the function is passed to an external function, it has to refrain from doing incompatible optimizations to the function.
Thus, the only effect is that the object file won't export a test symbol (This is termed Internal Linkage).
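A common instance of the same pattern is a file-static comparison function handed to the standard qsort(), which lives in the C library and therefore in a different translation unit. The sketch below is an illustration added here, not part of the question; sort_ints and compare_ints are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Internal linkage: no other translation unit can refer to this function by name... */
static int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b); /* negative, zero or positive, without overflow */
}

/* ...but qsort() can still call it through the pointer it is given. */
void sort_ints(int *values, size_t count)
{
    assert(values != NULL || count == 0);
    qsort(values, count, sizeof values[0], compare_ints);
}
```

Calling sort_ints on an array holding 3, 1, 2 leaves it in ascending order, even though compare_ints never appears in the object file's exported symbols.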
You wrote,
Now, this might not be the best practice, since the writer of a.c probably doesn't expect test() to be called from another file and the variable a modified, but is this legal C code?
If the writer did not want his test() function to be called from another file, he should not have passed a pointer to it to an outside module!
When the writer called set_callback(test); he knew he was passing his static method to an outside module, and giving that outside module permission to call it.
The point is that the author is in charge of the test method and where it goes. He's not prevented from doing anything, but he can control where his data goes; and he chose to pass it to an outsider.

Detect -nostdlib or just detect whether stdlib is available or not

I have a homework assignment that requires us to open, read and write to file using system calls rather than standard libraries. To debug it, I want to use std libraries when test-compiling the project. I did this:
#ifdef HOME
//Home debug printf function
#include <stdio.h>
#else
//Dummy printf function
int printf(const char* ff, ...) {
return 0;
}
#endif
And I compile it like this: gcc -DHOME -m32 -static -O2 -o main.exe main.c
The problem is that with the -nostdlib argument, the standard entry point is void _start, but without the argument, the entry point is int main(const char** args). You'd probably do this:
//Normal entry point
int main(const char** args) {
_start();
}
//-nostdlib entry point
void _start() {
//actual code
}
In that case, this is what you get when you compile without -nostdlib:
/tmp/ccZmQ4cB.o: In function `_start':
main.c:(.text+0x20): multiple definition of `_start'
/usr/lib/gcc/i486-linux-gnu/4.7/../../../i386-linux-gnu/crt1.o:(.text+0x0): first defined here
Therefore I need to detect whether stdlib is included and not define _start in that case.
The low-level entry point is always _start for your system. With -nostdlib, its definition is omitted from linking so you have to provide one. Without -nostdlib, you must not attempt to define it; even if this didn't get a link error from duplicate definition, it would horribly break the startup of the standard library runtime.
Instead, try doing it the other way around:
int main() {
/* your code here */
}
#ifdef NOSTDLIB_BUILD /* you need to define this with -D */
void _start() {
main();
}
#endif
You could optionally add fake arguments to main. It's impossible to get the real ones from a _start written in C though. You'd need to write _start in asm for that.
Note that -nostdlib is a linker option, not a compile-time one, so there's no way to automatically determine at compile time that -nostdlib is going to be used. Instead, just make your own macro and pass it on the command line as -DNOSTDLIB_BUILD or similar.
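The same macro can also select the debug output, so a single switch controls both the entry point and the printing. This is a rough sketch under the assumption that NOSTDLIB_BUILD is passed by hand as described above; HAVE_STDLIB, debug_log and run_program are names invented for this example, and the _start part is left out.

```c
#ifdef NOSTDLIB_BUILD
/* Built with something like: gcc -DNOSTDLIB_BUILD -nostdlib ... */
#define HAVE_STDLIB 0
static int debug_log(const char *text)
{
    (void)text; /* no standard library, so there is nothing to print with */
    return 0;
}
#else
/* Hosted build: the standard library is available. */
#include <assert.h>
#include <stdio.h>
#define HAVE_STDLIB 1
static int debug_log(const char *text)
{
    assert(text != NULL);
    return fputs(text, stderr) != EOF; /* nonzero on success */
}
#endif

/* Hypothetical program body shared by both builds. */
int run_program(void)
{
    debug_log("starting\n");
    return 0;
}
```

The compiler cannot check that the macro and the linker flag agree, so both have to be set together in the build command or makefile.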
