I am building a memory-debugging tool in the form of a shared library that I link against an executable at run time (it includes overridden versions of the malloc family). To handle initialization of my data structures I simply use a flag variable. Every time my malloc is called, I check whether the variable is set, and if not, I call a function responsible for initializing my structures. This works fine for programs running a single thread of execution, but problems arise if a program has more than one thread.
The only way (I can think of) to be sure that initialization happens before the user spawns any threads is to override _init as shown in this link.
Now this small example runs correctly, but when I try to override _init in my own shared library I get this error when linking it:
memory2.o: In function `_init':
memory2.c(.text+0x0): multiple definition of `_init'
/usr/lib/gcc/i686-linux-gnu/4.4.5/../../../../lib/crti.o(.init+0x0):
first defined here
collect2: ld returned 1 exit status
I use exactly the same steps as in the example from the link; it's just that my shared library also includes a set of global variables and overridden versions of malloc/free etc.
Can anyone give me a pointer to what's going wrong? Furthermore, is there anything else to take into consideration when overriding _init (I am guessing it's not a very normal thing to do)?
Thank you
Take a look at the following FAQ page:
http://www.faqs.org/docs/Linux-HOWTO/Program-Library-HOWTO.html#INIT-AND-CLEANUP
It describes _init/_fini as dangerous and obsolete, and recommends that __attribute__ ((constructor)) and __attribute__ ((destructor)) are used instead.
From the gcc manual:
constructor (priority)
destructor (priority)
The constructor attribute causes the function to be called automatically before execution enters main(). Similarly, the destructor attribute causes the function to be called automatically after main() has completed or exit() has been called. Functions with these attributes are useful for initializing data that will be used implicitly during the execution of the program. You may provide an optional integer priority to control the order in which constructor and destructor functions are run. A constructor with a smaller priority number runs before a constructor with a larger priority number; the opposite relationship holds for destructors. So, if you have a constructor that allocates a resource and a destructor that deallocates the same resource, both functions typically have the same priority. The priorities for constructor and destructor functions are the same as those specified for namespace-scope C++ objects (see C++ Attributes).
These attributes are not currently implemented for Objective-C.
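A minimal sketch of how this might look for the memory tool in the question, assuming GCC or Clang on an ELF platform. debug_heap_init, debug_heap_report and the counters are hypothetical names, not part of the asker's library. Priorities 0 to 100 are reserved for the implementation, hence 101.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bookkeeping state for the debugging allocator. */
static int debug_heap_ready = 0;
static size_t live_allocations = 0;

/* Run by the startup code before main() (for a shared library: when it is
 * loaded), so normally before the program has created any threads. */
__attribute__((constructor(101)))
static void debug_heap_init(void)
{
    live_allocations = 0;
    debug_heap_ready = 1;
}

/* Run after main() returns or exit() is called. */
__attribute__((destructor(101)))
static void debug_heap_report(void)
{
    fprintf(stderr, "debug heap: %zu live allocations at exit\n",
            live_allocations);
}
```

Other libraries' constructors, or the dynamic loader itself, may still call malloc before this constructor runs, so keeping the existing lazy check inside malloc as a fallback is probably wise.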
1) You can write your own _init or main:
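The GNU linker lets you intercept an existing symbol. When linking, pass -Xlinker --wrap=<symName> (or -Wl,--wrap=<symName>): undefined references to symName are then resolved to __wrap_symName, and references to __real_symName are resolved to the original symName. Pretending you did this to main, you define __wrap_main and call the real main via __real_main(...):
int __real_main(int argc, char **argv);

int __wrap_main(int argc, char **argv)
{
    /* any code you want here */
    return __real_main(argc, argv);
}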
2) You can write your own dynamic linker. If you do this then set the .interp section to point to the shared object containing your dynamic linker/loader.
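To get past that error, link with -nostartfiles, e.g. gcc -nostartfiles memory2.c -o memory2. This leaves out the standard startup files (among them crti.o, where the conflicting _init is defined), and with them the usual constructor and destructor handling.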
But it is not recommended to override these.
I have (mapped in memory) two object files, "A.o" and "B.o", with the same CPU instruction set (not necessarily Intel; it can be x86, x86_64, MIPS (32/64), ARM (32/64), PowerPC (32/64), ..., but always the same in both object files).
Also, both object files are compiled with the same endianness (both little endian, or both big endian).
However (you knew there was a however, otherwise there wouldn't be any question), "A.o" and "B.o" can have a different function calling convention and, to make things worse, unknown to each other ("A.o" has not even the slightest idea about the calling convention for functions in "B.o", and vice versa).
"A.o" and "B.o" are obviously designed to call functions within their same object file, but there must be a (very) limited interface for communicating between them (otherwise, if execution starts at some function in "A.o", no function from "B.o" would ever be executed if there was no such interface).
The file where execution started (let's suppose it's "A.o") knows the addresses of all static symbols from "B.o" (the addresses of all functions and all global variables). But the opposite is not true (well, the limited interface I'm trying to write would overcome that, but "B.o" doesn't know any address from "A.o" before such interface is established).
Finally the question: How can execution jump from a function in "A.o" to a function in "B.o", and back, while also communicating some data?
I need it to:
Be done in standard C (no assembly).
Be portable C (not compiler-dependent, nor CPU-dependent).
Be thread safe.
Don't make any assumption about the calling conventions involved.
Be able to communicate data between the two object files.
My best idea so far seems to meet all these requirements except thread safety. For example, if I define a struct like this:
struct data_interface {
    int value_in;
    int value_out;
};
I could write a pointer to a struct like this from "A.o" into a global variable of "B.o" (knowing in advance that this global variable in "B.o" has enough space to store a pointer).
Then, the interface function would be a void interface(void) (I'm assuming that calling void(void) functions is safe across different calling conventions... if this is not true, then my idea wouldn't work). Calling such a function from "A.o" to "B.o" would communicate the data to the code in "B.o". And, fingers crossed, when the called function in "B.o" returns, it would travel back nicely (supposing the different calling convention doesn't change the behaviour when returning from void(void) functions).
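To make the proposal concrete, here is a sketch with both sides squeezed into one file; b_mailbox, call_into_b and the doubling "work" are hypothetical. Because everything is compiled together, it only illustrates the data flow, not the mixed-calling-convention part. The marked store into the shared global is the step where two threads calling at once would clobber each other.

```c
#include <stddef.h>

struct data_interface {
    int value_in;
    int value_out;
};

/* --- "B.o" side --- */
/* Global slot that A fills in; B only promises it can hold a pointer. */
void *b_mailbox = NULL;

void interface(void)
{
    struct data_interface *d = b_mailbox;
    d->value_out = d->value_in * 2;   /* hypothetical work done by B */
}

/* --- "A.o" side --- */
int call_into_b(int x)
{
    struct data_interface d = { x, 0 };
    b_mailbox = &d;   /* shared global: not thread safe */
    interface();
    b_mailbox = NULL;
    return d.value_out;
}
```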
However, this is not thread safe, of course.
For it to be thread safe, I guess my only option is to access the stack.
But... can the stack be accessed in a portable way in standard C?
Here are two suggestions.
Data interface
This elaborates on the struct you defined yourself. From what I've seen in the past, compilers typically use a single register (e.g. eax) for their return value (provided the return type fits in a register). My guess is, the following function prototype is likely to be unaffected by differing calling conventions.
struct data_interface *get_empty_data_interface(void);
If so, then you could use that in a way that is similar to the idea you already had about using arrays. Define the following struct and functions in B:
#include <stdlib.h>

struct data_interface {
    int ready;
    int the_real_data;
};

struct data_interface *get_empty_data_interface(void)
{
    struct data_interface *ptr = malloc(sizeof(struct data_interface));
    if (ptr == NULL)
        return NULL;
    /* initialize before the block becomes visible in the list */
    ptr->ready = 0;
    add_to_list_of_data_block_pointers(ptr);
    return ptr;
}
void the_function(void)
{
execute_functionality_for_every_data_block_in_my_list_that_is_flagged_ready_and_remove_from_list();
}
To call the function, do this in A:
struct data_interface *ptr = get_empty_data_interface();
ptr->the_real_data = 12345;
ptr->ready = 1;
the_function();
For thread-safety, make sure the list of data blocks maintained by B is thread-safe.
Simultaneous calls to get_empty_data_interface should not overwrite each other's slot in the list.
Simultaneous calls to the_function should not both pick up the same list element.
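One way to meet those two requirements, sketched under several assumptions: POSIX threads for the list lock, C11 atomics for the ready flag, a hypothetical done flag and next link added to the struct, and doubling the value as stand-in work. call_b is a hypothetical A-side helper. It spins on done because another thread's the_function may already have taken the block; a real design would probably block instead of spinning.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct data_interface {
    atomic_int ready;             /* A stores 1 once the_real_data is filled in */
    int the_real_data;            /* input from A, replaced by B's result */
    atomic_int done;              /* hypothetical: B stores 1 when finished */
    struct data_interface *next;  /* B's private link, guarded by list_lock */
};

static struct data_interface *list_head = NULL;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

struct data_interface *get_empty_data_interface(void)
{
    struct data_interface *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return NULL;
    atomic_init(&ptr->ready, 0);
    atomic_init(&ptr->done, 0);
    ptr->the_real_data = 0;

    /* Each caller gets its own node, so concurrent calls never share a slot. */
    pthread_mutex_lock(&list_lock);
    ptr->next = list_head;
    list_head = ptr;
    pthread_mutex_unlock(&list_lock);
    return ptr;
}

void the_function(void)
{
    struct data_interface *batch = NULL;

    /* Unlink every ready node under the lock, so two concurrent callers
     * can never pick up the same node. */
    pthread_mutex_lock(&list_lock);
    struct data_interface **link = &list_head;
    while (*link != NULL) {
        struct data_interface *cur = *link;
        if (atomic_load_explicit(&cur->ready, memory_order_acquire)) {
            *link = cur->next;
            cur->next = batch;
            batch = cur;
        } else {
            link = &cur->next;
        }
    }
    pthread_mutex_unlock(&list_lock);

    /* Do the (hypothetical) work outside the lock. */
    while (batch != NULL) {
        struct data_interface *cur = batch;
        batch = cur->next;        /* read before handing the node back to A */
        cur->the_real_data *= 2;
        atomic_store_explicit(&cur->done, 1, memory_order_release);
    }
}

/* "A" side: wait for B to mark the node done before reading the result. */
int call_b(int value)
{
    struct data_interface *ptr = get_empty_data_interface();
    if (ptr == NULL)
        return -1;
    ptr->the_real_data = value;
    atomic_store_explicit(&ptr->ready, 1, memory_order_release);
    the_function();
    while (!atomic_load_explicit(&ptr->done, memory_order_acquire))
        ;                         /* spin until some the_function() finishes it */
    int result = ptr->the_real_data;
    free(ptr);
    return result;
}
```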
Wrapper functions
You could try to expose wrapper functions with a well-known calling convention (e.g. cdecl); if necessary defined in a separate object file that is aware of the calling convention of the functions it wraps.
Unfortunately you will probably need non-portable function attributes for this.
You may be able to cheat your way out of it by declaring variadic wrapper functions (with an ellipsis parameter, like printf has); compilers are likely to fall back on cdecl for those. This eliminates non-portable function attributes, but it may be unreliable; you would have to verify my assumption for every compiler you'd like to support. When testing this, keep in mind that compiler options (in particular optimizations) may well play a role. All in all, quite a dirty approach.
The question implies that both object files are compiled differently except for the endianness, and that they are linked together into one executable.
It says that A.o knows all static symbols from B.o, but the opposite is not true.
Don't make any assumption about the calling conventions involved.
So we'll be using only void f(void) type of functions.
You'll declare int X, Y; in B.o and extern int X, Y; in A.o. Before you call a function in B.o, you check the Y flag; if it is raised, wait until it falls. When a B function is called, it raises the Y flag, reads the input from X, does some calculations, writes the result back into X and returns.
Then the calling function in A.o copies the value from X into its own compilation unit and clears the Y flag.
...if calling a void f(void) function just makes a wild jump from one point in the code to another.
Another way to do it would be to declare static int Y = 0; in B.o and omit it entirely in A.o.
Then, when a B.o function gets called, it checks whether Y == 0; if so, it increments Y, reads X, does the calculations, writes X, decrements Y and returns. If not, it waits for Y to become 0, blocking the calling function.
Or maybe even have a static flag in every B.o function, but I don't see the point in this waste, since the communication data is global in B.o.
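Checking Y and then raising it are two separate steps, so two threads can both see the flag down and proceed together. A minimal sketch that closes that window, assuming C11 <stdatomic.h> is acceptable (atomic_flag here is a substitution for the plain int Y): the caller in A takes the flag before writing X and clears it only after copying the result out. b_square and a_call_square are hypothetical names; in the real split, A.o would declare extern int X; and extern atomic_flag Y;.

```c
#include <stdatomic.h>

/* --- "B.o" side --- */
int X;                             /* shared in/out slot, as in the answer */
atomic_flag Y = ATOMIC_FLAG_INIT;  /* raised while X is in use */

void b_square(void)                /* hypothetical B function, void f(void) */
{
    X = X * X;                     /* read input from X, write result to X */
}

/* --- "A.o" side --- */
int a_call_square(int input)
{
    /* test-and-set is a single atomic step: no check-then-raise gap */
    while (atomic_flag_test_and_set_explicit(&Y, memory_order_acquire))
        ;                          /* wait until the flag falls */
    X = input;
    b_square();
    int result = X;                /* copy out before releasing the flag */
    atomic_flag_clear_explicit(&Y, memory_order_release);
    return result;
}
```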
Remember that there are both caller saves and callee saves conventions out there, together with variations on use of registers to pass values, use or not of a frame pointer, and even (in some architectures, in some optimisation levels) the use of the delay slot in a branch to hold the first instruction of the subroutine. You are not going to be able to do this without some knowledge of the calling conventions in play, but fortunately the linker will need that anyway. Presumably there is some higher level entity that is responsible for loading those DLLs and that knows the calling conventions for both of them?
Anything you do here is going to be at best deep in implementation-defined territory, if not technically undefined behaviour, and you will want to make a deep study of the linker and loader. (In particular, the linker must know how to resolve dynamic linkage in your unknown calling convention, or you will not be able to load that shared object in a meaningful way; you may be able to leverage it using libbfd or similar, but that is outside the scope of C.)
The place this sort of thing can go very wrong is if shared resources are allocated in A and freed in B (memory springs to mind), since memory management is usually a library-based wrapper over the operating system's sbrk or similar, and these implementations of memory management are not inherently compatible in memory layout. Other places you may be bitten by this include I/O (see the shenanigans you sometimes get when mixing printf and cout in C++ for a benign example) and locking.
Which functions are called prior to DllMain()? If more than one during the C runtime initialization, then the order is important.
From the source:-
If your DLL is linked with the C run-time library (CRT), the entry point provided by the CRT calls the constructors and destructors for global and static C++ objects. Therefore, these restrictions for DllMain also apply to constructors and destructors and any code that is called from them.
I think only _DllMainCRTStartup() is called, which in turn calls all constructors of global C++ objects (none in the case of C) and (I'm not sure about that last one) calls DllMain().
Of course, it also calls some Kernel32 functions to initialize the CRT (for starters, it needs to allocate some memory and a TLS slot).
This is very compiler dependent.
DllMain() has exactly the same calling convention as the DLL's entry point so for some compilers DllMain() is the entry point of the DLL!
Other compilers use their own entry point where some DLL initializations are done before entering DllMain().
In contrast to this the entry point of an EXE file does not have any arguments and the function must never return. Therefore the WinMain() or main() function cannot be the entry point of an EXE file but there must be some preparation code that is called before WinMain() or main().
I am enhancing a tool.
Please note that this tool will be linked to a test program, which will have a main() function, so my tool can't have main. What this tool has is a number of functions which the test program will use.
Now, additionally, I want to add a timer to this tool. The idea is: when the test program is linked to this tool and starts, the timer should automatically start.
If this were C++, I would have created a class with a constructor, so that whenever the class is loaded, the constructor is called first, and I could initialize my timer inside the constructor.
If this were Java, I would simply have created a global static block and put the timer code inside it.
But my tool is purely in C on Linux, so how can I achieve this goal?
Please help me.
This looks like your case also:
How do I get the GCC __attribute__ ((constructor)) to work under OSX?
From GCC docs:
constructor
destructor
constructor (priority)
destructor (priority)
The constructor attribute causes the function to be called automatically before execution enters main (). Similarly, the destructor attribute causes the function to be called automatically after main () has completed or exit () has been called. Functions with these attributes are useful for initializing data that will be used implicitly during the execution of the program.
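Applied to the timer in the question, a minimal sketch assuming GCC or Clang on Linux; tool_start, tool_timer_start, tool_elapsed_seconds and tool_timer_report are hypothetical names. The test program does not have to call anything for the timer to start: the constructor runs before its main().

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Hypothetical timer state kept inside the tool. */
static struct timespec tool_start;

/* Runs automatically before the test program's main(). */
__attribute__((constructor))
static void tool_timer_start(void)
{
    clock_gettime(CLOCK_MONOTONIC, &tool_start);
}

/* Seconds elapsed since the constructor ran (hypothetical helper). */
double tool_elapsed_seconds(void)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - tool_start.tv_sec)
         + (double)(now.tv_nsec - tool_start.tv_nsec) / 1e9;
}

/* Runs after main() returns or exit() is called. */
__attribute__((destructor))
static void tool_timer_report(void)
{
    fprintf(stderr, "tool: program ran for %.3f s\n", tool_elapsed_seconds());
}
```

If the tool is shipped as a static archive (.a), this only works when the object file containing the constructor is actually pulled into the link, i.e. the test program references some symbol defined in that object file.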
Write your own replacement for the crt*.o object file that calls main(), and link to it when building.
I have an interface with which I want to be able to statically link modules. For example, I want to be able to call all functions (albeit in separate files) named FOO, or that match a certain prototype, ultimately calling into a function in a file without declaring it in a header in the other files. Don't say that it is impossible, since I found a hack that can do it, but I want a non-hacky method. (The hack is to use nm to get the functions and their prototypes, after which I can call the function dynamically.) Also, I know you can do this with dynamic linking; however, I want to statically link the files. Any ideas?
Put a table of all functions into each translation unit:
struct functions MOD1FUNCS[]={
{"FOO", foo},
{"BAR", bar},
{0, 0}
};
Then put a table into the main program listing all these tables:
struct functions* ALLFUNCS[]={
MOD1FUNCS,
MOD2FUNCS,
0
};
Then, at run time, search through the tables, and lookup the corresponding function pointer.
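A self-contained version of that scheme, assuming every registered function shares one signature (int (*)(int) here); foo, bar, baz, func_ptr and lookup_function are hypothetical. In a real build each MODnFUNCS table lives in its own .c file, and the main program declares them with extern struct functions MOD1FUNCS[]; and so on.

```c
#include <stddef.h>
#include <string.h>

/* Common signature assumed for all registered functions. */
typedef int (*func_ptr)(int);

struct functions {
    const char *name;
    func_ptr fn;
};

/* --- module 1 (its own .c file in practice) --- */
static int foo(int x) { return x + 1; }
static int bar(int x) { return x * 2; }

struct functions MOD1FUNCS[] = {
    {"FOO", foo},
    {"BAR", bar},
    {0, 0}
};

/* --- module 2 --- */
static int baz(int x) { return x - 3; }

struct functions MOD2FUNCS[] = {
    {"BAZ", baz},
    {0, 0}
};

/* --- main program: the only place that lists every module --- */
struct functions *ALLFUNCS[] = {
    MOD1FUNCS,
    MOD2FUNCS,
    0
};

/* Linear search through every module's table; NULL if not found. */
func_ptr lookup_function(const char *name)
{
    for (struct functions **table = ALLFUNCS; *table != NULL; table++)
        for (struct functions *entry = *table; entry->name != NULL; entry++)
            if (strcmp(entry->name, name) == 0)
                return entry->fn;
    return NULL;
}
```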
This is somewhat common in writing test code. For example, you want to call all functions that start with test_. So you have a shell script that greps through all your .c files and pulls out the function names that match test_.*. That script then generates a test.c file containing a function that calls all the test functions.
For example, the generated program would look like:
int main() {
initTestCode();
testA();
testB();
testC();
}
Another way to do it would be to use some linker tricks. This is what the Linux kernel does for its initialization. Functions that are init code are marked with the qualifier __init. This is defined in linux/init.h as follows:
#define __init __section(.init.text) __cold notrace
This causes the linker to put that function in the section .init.text. The kernel will reclaim memory from that section after the system boots.
For calling the functions, each module will declare an initcall function with some other macros core_initcall(func), arch_initcall(func), et cetera (also defined in linux/init.h). These macros put a pointer to the function into a linker section called .initcall.
At boot-time, the kernel will "walk" through the .initcall section calling all of the pointers there. The code that walks through looks like this:
extern initcall_t __initcall_start[], __initcall_end[], __early_initcall_end[];
static void __init do_initcalls(void)
{
initcall_t *fn;
for (fn = __early_initcall_end; fn < __initcall_end; fn++)
do_one_initcall(*fn);
/* Make sure there is no pending stuff from the initcall sequence */
flush_scheduled_work();
}
The symbols __initcall_start, __initcall_end, etc. get defined in the linker script.
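The same idea works in ordinary user-space programs without a custom linker script: GNU ld (and LLVM lld) automatically define __start_SECNAME and __stop_SECNAME for any output section whose name is a valid C identifier. A sketch assuming GCC or Clang on an ELF target; MY_INITCALL, my_initcalls, init_a, init_b and run_my_initcalls are hypothetical names. Unlike the kernel's initcall levels, the order of entries coming from different translation units is not something to rely on.

```c
typedef void (*initcall_t)(void);

/* Place a pointer to fn in the section "my_initcalls". The section name
 * must be a valid C identifier for the start/stop symbols to exist. */
#define MY_INITCALL(fn) \
    static initcall_t my_initcall_##fn \
    __attribute__((used, section("my_initcalls"))) = fn

/* Provided by the linker, bracketing all entries of the section. */
extern initcall_t __start_my_initcalls[];
extern initcall_t __stop_my_initcalls[];

static int init_count = 0;

static void init_a(void) { init_count++; }
static void init_b(void) { init_count++; }

MY_INITCALL(init_a);
MY_INITCALL(init_b);

/* Walk the section and call every registered pointer. */
void run_my_initcalls(void)
{
    for (initcall_t *fn = __start_my_initcalls; fn < __stop_my_initcalls; fn++)
        (*fn)();
}
```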
In general, the Linux kernel does some of the cleverest tricks with the GCC pre-processor, compiler and linker that are possible. It's always been a great reference for C tricks.
You really need static linking and, at the same time, to select all matching functions at runtime, right? Because the latter is a typical case for dynamic linking, I'd say.
You obviously need some mechanism to register the available functions. Dynamic linking would provide just this.
I really don't think you can do it. C isn't exactly capable of late-binding or the sort of introspection you seem to be requiring.
Although I don't really understand your question. Do you want the features of dynamically linked libraries while statically linking? Because that doesn't make sense to me... to static link, you need to already have the binary in hand, which would make dynamic loading of functions a waste of time, even if you could easily do it.
I want to know how a static variable or function is protected so that it can be used only in the file where it is defined. I know that such variables and functions are placed in the data section (heap area to be precise), but are they tagged with the file name? Suppose I fool the compiler by assigning such a static function (defined in foo.c) to a global function pointer, and call that function pointer in some other file (bar.c). My code doesn't give any compilation warning, but it happens to give a segmentation fault. Obviously, it is a protection fault, but I am interested in knowing how it is implemented inside the system.
Thanks. MS
The linker takes care of restricting the scope of mapping the function name to the function.
There is no protection for static functions called by function pointer - it's not that uncommon an idiom. For example, the recommended way of implementing GObject methods is to expose a pointer to a static function (see the virtual public methods section in this GObject how-to)
It is 'protected' simply by not having its symbol/location made known to the linker. So you cannot write code in another module that explicitly references the static object by its symbol name, because the linker has no such symbol. There is no run-time protection.
If you pass the address of a static object to some other module at runtime, that module will be able to access the object through the pointer. That is not "making a fool of the compiler" (or of the linker, in fact); such an action may be entirely legitimate.
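A short illustration of that point, with the two files collapsed into one for brevity; secret_add, exported_add and call_through_pointer are hypothetical names. static only keeps the name out of the linker's symbol table; the call through the pointer is an ordinary indirect call with no run-time check.

```c
/* --- would be foo.c --- */
/* Internal linkage: no other translation unit can refer to this by name... */
static int secret_add(int a, int b)
{
    return a + b;
}

/* ...but foo.c is free to hand out its address. */
int (*exported_add)(int, int) = secret_add;

/* --- would be bar.c, after: extern int (*exported_add)(int, int); --- */
int call_through_pointer(void)
{
    return exported_add(2, 3);   /* plain indirect call, nothing checks it */
}
```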
The fact that you got a segfault is probably due to an entirely different reason (an invalid pointer, for example). The compiler may choose to inline the code, in which case a pointer to it would not be possible, but if you explicitly take the address of an object, the compiler should instantiate it, so this seems unlikely.
The purpose of static is not to 'protect' the variable/function but to protect the namespace and protect the rest of your program from having its behavior messed up by symbols with conflicting names. It also allows a good bit more optimization in that the compiler knows it doesn't have to facilitate access to the symbol name by outside modules.
You may run into problems if foo.c and bar.c are compiled into different dynamically loaded libraries.