How to create static linked shared libraries

How to create static linked shared libraries - c

For my master's thesis i'm trying to adapt a shared library approach for an ARM Cortex-M3 embedded system. As our targeted board has no MMU I think that it would make no sense to use "normal" dynamic shared libraries. Because .text is executed directly from flash and .data is copied to RAM at boot time I can't address .data relative to the code thus GOT too. GOT would have to be accessed through an absolute address which has to be defined at link time. So why not assigning fixed absolute addresses to all symbols at link time...?
From the book "Linkers and Loaders" I got aware of "static linked shared libraries, that is, libraries where program and data addresses in libraries are bound to executables at link time". The linked chapter describes how such libraries could be created in general and gives references to Unix System V, BSD/OS; but also mentions Linux and it's uselib() system call. Unfortunately the book gives no information how to actually create such libraries such as tools and/or compiler/linker switches. Apart from that book I hardly found any other information about such libraries "in the wild". The only thing I found in this regard was prelink for Linux. But as this operates on "normal" dynamic libraries thats not really what I'm searching for.
I fear that the use of these kind of libaries is very specific, so that no common tools exists to create them. Although the mentioned uselib() syscall in this context makes me wondering. But I wanted to make sure that I haven't overlooked anything before starting to hack my own linker... ;) So could anyone give me more information about such libraries?
Furthermore I'm wondering if there is any gcc/ld switch which links and relocates a file but keeps the relocation entries in the file - so that it could be re-relocated? I found the "-r" option, but that completely skips the relocation process. Does anyone have an idea?
edit:
Yes, I'm also aware of linker scripts. With gcc libfoo.c -o libfoo -nostdlib -e initLib -Ttext 0xdeadc0de I managed to get some sort of linked & relocated object file. But so far I haven't found any possibility to link a main program against this and use it as shared library. (The "normal way" of linking a dynamic shared library will be refused by the linker.)

Concepts
Minimum concept of what such a shared library maybe about.
same code
different data
There are variations on this. Do you support linking between libraries. Are the references a DAG structure or fully cyclic? Do you want to put the code in ROM, or support code updates? Do you wish to load libraries after a process is initially run? The last one is generally the difference between static shared libraries and dynamic shared libraries. Although many people will forbid references between libraries as well.
Facilities
Eventually, everything will come down to the addressing modes of the processor. In this case, the ARM thumb. The loader is generally coupled to the OS and the binary format in use. Your tool chain (compiler and linker) must also support the binary format and can generate the needed code.
Support for accessing data via a register is intrinsic in the APCS (the ARM Procedure calling standard). In this case, the data is accessed via the sb (for static base) which is register R9. The static base and stack checking are optional features. I believe you may need to configure/compile GCC to enable or disable these options.
The options -msingle-pic-base and -mpic-register are in the GCC manual. The idea is that an OS will initially allocate separate data for each library user and then load/reload the sb on a context switch. When code runs to a library, the data is accessed via the sb for that instances data.
Gcc's arm.c code has the require_pic_register() which does code generation for data references in a shared library. It may correspond to the ARM ATPCS shared library mechanics.See Sec 5.5
You may circumvent the tool chain by using macros and inline assembler and possibly function annotations, like naked and section. However, the library and possibly the process need code modification in this case; Ie, non-standard macros like EXPORT(myFunction), etc.
One possibility
If the system is fully specified (a ROM image), you can make the offsets you can pre-generate data offsets that are unique for each library in the system. This is done fairly easily with a linker script. Use the NOLOAD and put the library data in some phony section. It is even possible to make the main program a static shared library. For instance, you are making a network device with four Ethernet ports. The main application handles traffic on one port. You can spawn four instances of the application with different data to indicate which port is being handled.
If you have a large mix/match of library types, the foot print for the library data may become large. In this case you need to re-adjust the sb when calls are made through a wrapper function on the external API to the library.
void *__wrap_malloc(size_t size) /* Wrapped version. */
{
/* Locals on stack */
unsigned int new_sb = glob_libc; /* accessed via current sb. */
void * rval;
unsigned int old_sb;
volatile asm(" mov %0, sb\n" : "=r" (old_sb);
volatile asm(" mov sb, %0\n" :: "r" (new_sb);
rval = __real_malloc(size);
volatile asm(" mov sb, %0\n" :: "r" (old_sb);
return rval;
}
See the GNU ld --wrap option. This complexity is needed if you have a larger homogenous set of libraries. If your libraries consists of only 'libc/libsupc++', then you may not need to wrap anything.
The ARM ATPCS has veneers inserted by the compiler that do the equivalent,
LDR a4, [PC, #4] ; data address
MOV SB, a4
LDR a4, [PC, #4] ; function-entry
BX a4
DCD data-address
DCD function-entry
The size of the library data using this technique is 4k (possibly 8k, but that might need compiler modification). The limit is via ldr rN, [sb, #offset], were ARM limits offset to 12bits. Using the wrapping, each library has a 4k limit.
If you have multiple libraries that are not known when the original application builds, then you need to wrap each one and place a GOT type table via the OS loader at a fixed location in the main applications static base. Each application will require space for a pointer for each library. If the library is not used by the application, then the OS does not need to allocate the space and that pointer can be NULL.
The library table can be accessed via known locations in .text, via the original processes sb or via a mask of the stack. For instance, if all processes get a 2K stack, you can reserve the lower 16 words for a library table. sp & ~0x7ff will give an implicit anchor for all tasks. The OS will need to allocate task stacks as well.
Note, this mechanism is different than the ATPCS, which uses sb as a table to get offsets to the actual library data. As the memory is rather limited for the Cortex-M3 described it is unlikely that each individual library will need to use more than 4k of data. If the system supports an allocator this is a work around to this limitation.
References
Xflat technical overview - Technical discussion from the Xflat authors; Xflat is a uCLinux binary format that supports shared libraries. A very good read.
Linkage table and GOT - SO on PLT and GOT.
ARM EABI - The normal ARM binary format.
Assemblers and Loader, by David Solomon. Especially, pg262 A.3 Base Registers
ARM ATPCS, especially Section 5.5, Shared Libraries, pg18.
bFLT is another uCLinux binary format that supports shared libraries.

How much RAM do you have attached? Cortex-M systems have only a few dozen kiB on-chip and for the rest they require external SRAM.
I can't address .data relative to the code
You don't have to. You can place the library symbol jump table in the .data segment (or a segment that behaves similarly) at a fixed position.
thus GOT too. GOT would have to be accessed through an absolute address which has to be defined at link time. So why not assigning fixed absolute addresses to all symbols at link time...?
Nothing prevents you from having a second GOT placed at a fixed location, that's writable. You have to instruct your linker where and how to create it. For this you give the linker a so called "linker script", which is kind of a template-blueprint for the memory layout of the final program.

I'll try to answer your question before commenting about your intentions.
To compile a file in linux/solaris/any platform that uses ELF binaries:
gcc -o libFoo.so.1.0.0 -shared -fPIC foo1.c foo2.c foo3.c ... -Wl,-soname=libFoo.so.1
I'll explain all the options next:
-o libFoo.so.1.0.0
is the name we are going to give to the shared library file, once linked.
-shared
means that you have a shared object file at end, so there can be unsolved references after compilation and linked, that would be solved in late binding.
-fPIC
instructs the compiler to generate position independent code, so the library can be linked in a relocatable fashion.
-Wl,-soname=libFoo.so.1
has two parts: first, -Wl instructs the compiler to pass the next option (separated by comma) to the linker. The option is -soname=libFoo.so.1. This option, tells the linker the soname used for this library. The exact value of the soname is free style string, but there's a convenience custom to use the name of the library and the major version number. This is important, as when you do static linking of a shared library, the soname of the library gets stuck to the executable, so only a library with that soname can be loaded to assist this executable. Traditionally, when only the implementation of a library changes, we change only the name of the library, without changing the soname part, as the interface of the library doesn't change. But when you change the interface, you are building a new, incompatible one, so you must change the soname part, as it doesn't get in conflict with other 'versions' of it.
To link to a shared library is the same than to link to a static one (one that has .a as extension) Just put it on the command file, as in:
gcc -o bar bar.c libFoo.so.1.0.0
Normally, when you get some library in the system, you get one file and one or two symbolic links to it in /usr/lib directory:
/usr/lib/libFoo.so.1.0.0
/usr/lib/libFoo.so.1 --> /usr/lib/libFoo.so.1.0.0
/usr/lib/libFoo.so --> /usr/lib/libFoo.so.1
The first is the actual library called on executing your program. The second is a link with the soname as the name of the file, just to be able to do the late binding. The third is the one you must have to make
gcc -o bar bar.c -lFoo
work. (gcc and other ELF compilers search for libFoo.so, then for libFoo.a, in /usr/lib directory)
After all, there's an explanation of the concept of shared libraries, that perhaps will make you to change your image about statically linked shared code.
Dynamic libraries are a way for several programs to share the functionalities of them (that means the code, perhaps the data also). I think you are a little disoriented, as I feel you have someway misinterpreted what a statically linked shared library means.
static linking refers to the association of a program to the shared libraries it's going to use before even launching it, so there's a hardwired link between the program and all the symbols the library has. Once you launch the program, the linking process begins and you get a program running with all of its statically linked shared libraries. The references to the shared library are resolved, as the shared library is given a fixed place in the virtual memory map of the process. That's the reason the library has to be compiled with the -fPIC option (relocatable code) as it can be placed differently in the virtual space of each program.
On the opposite, dynamic linking of shared libraries refers to the use of a library (libdl.so) that allows you to load (once the program is executing) a shared library (even one that has not been known about before), search for its public symbols, solve references, load more libraries related to this one (and solve recursively as the linker could have done) and allow the program to make calls to symbols on it. The program doesn't even need to know the library was there on compiling or linking time.
Shared libraries is a concept related to the sharing of code. A long time ago, there was UNIX, and it made a great advance to share the text segment (whit the penalty of not being able for a program to modify its own code) of a program by all instances of it, so you have to wait for it to load just the first time. Nowadays, the concept of code sharing has extended to the library concept, and you can have several programs making use of the same library (perhaps libc, libdl or libm) The kernel makes a count reference of all the programs that are using it, and it just gets unloaded when no other program is using it.
using shared libraries has only one drawback: the compiler must create relocatable code to generate a shared library as the space used by one program for it can be used for another library when we try to link it to another program. This imposes normally a restriction in the set of op codes to be generated or imposes the use of one/several registers to cope with the mobility of code (there's no mobility but several linkings can make it to be situated at different places)
Believe me, using static code just derives you to making bigger executables, as you cannot share effectively the code, but with a shared library.

Related

how to make shared library an executable

I was searching for asked question. i saw this link https://hev.cc/2512.html which is doing exactly the same thing which I want. But there is no explanation of whats going on. I am also confused whether shared library with out main() can be made executable if yes how? I can guess i have to give global main() but know no details. Any further easy reference and guidance is much appreciated
I am working on x86-64 64 bit Ubuntu with kernel 3.13

This is fundamentally not sensible.
A shared library generally has no task it performs that can be used as it's equivalent of a main() function. The primary goal is to allow separate management and implementation of common code operations, and on systems that operate that way to allow a single code file to be loaded and shared, thereby reducing memory overhead for application code that uses it.
An executable file is designed to have a single point of entry from which it performs all the operations related to completing a well defined task. Different OSes have different requirements for that entry point. A shared library normally has no similar underlying function.
So in order to (usefully) convert a shared library to an executable you must also define ( and generate code for ) a task which can be started from a single entry point.
The code you linked to is starting with the source code to the library and explicitly codes a main() which it invokes via the entry point function. If you did not have the source code for a library you could, in theory, hack a new file from a shared library ( in the absence of security features to prevent this in any given OS ), but it would be an odd thing to do.
But in practical terms you would not deploy code in this manner. Instead you would code a shared library as a shared library. If you wanted to perform some task you would code a separate executable that linked to that library and code. Trying to tie the two together defeats the purpose of writing the library and distorts the structure, implementation and maintenance of that library and the application. Keep the application and the library apart.

I don't see how this is useful for anything. You could always achieve the same functionality from having a main in a separate binary that links against that library. Making a single file that works as both is solidly in the realm of "silly computer tricks". There's no benefit I can see to having a main embedded in the library, even if it's a test harness or something.
There might possible be some performance reasons, like not having function calls go through the indirection of the PLT.
In that example, the shared library is also a valid ELF executable, because it has a quick-and-dirty entry-point that grabs the args for main from where the ABI says they go (i.e. copies them from the stack into registers). It also arranges for the ELF interpreter to be set correctly. It will only work on x86-64, because no definition is provided for init_args for other platforms.
I'm surprised it actually works; I thought all the crap the usual CRT (startup) code does was actually needed for stdio to work properly. It looks like it doesn't initialize extern char **environ;, since it only gets argc and argv from the stack, not envp.
Anyway, when run as an executable, it has everything needed to be a valid dynamically-linked executable: an entry-point which runs some code and exits, an interpreter, and a dependency on libc. (ELF shared libraries can depend on (i.e. link against) other ELF shared libraries, in the same way that executables can).
When used as a library, it just works as a normal library containing some function definitions. None of the stuff that lets it work as an executable (entry point and interpreter) is even looked at.
I'm not sure why you don't get an error for multiple definitions of main, since it isn't declared as a "weak" symbol. I guess shared-lib definitions are only looked for when there's a reference to an undefined symbol. So main() from call.c is used instead of main() from libtest.so because main already has a definition before the linker looks at libtest.

To create shared Dynamic Library with Example.
Suppose with there are three files are : sum.o mul.o and print.o
Shared library name " libmno.so "
cc -shared -o libmno.so sum.o mul.o print.o
and compile with
cc main.c ./libmno.so

How can I "dump" a Function to a file?

For example, I have a function func():
int func (int a, int b) {return a + b;}
Now I want write it to a file, so that I can use the system-call mmap to load it with PROT_EXEC and I can call it from another program.What should I do for it?

If you know what signature you need and a static library or the location of a shared library at compile time, you probably just want to include the header and link against the output library. If you want to invoke a function dynamically, you probably want dlopen / dlsym (UNIX) or LoadLibrary / GetProcAddress (Windows) for loading the libary dynamically and retrieving the address of the function by name.
Note that the cases where you actually need to load a library dynamically (at least explicitly) are pretty rare. This is often used for modular architectures (e.g. "plugins" or "extensions") where individual pieces of the application are distributed separately (which can be achieved more securely using IPC rather than dynamic loading... see my note below). Or for cases where your application is not allowed to include dependencies statically and needs to conditionally supply behavior based on the existence of certain library dependencies in the environment in which it happens to be executing. In most cases, though, you'll simply want to include a header that declares the symbols you need and compile for each target platform (possibly using #if...#else macros if there are symbols that vary across OSes or OS versions).
From a stability, security, and code complexity standpoint, I personally recommend that you avoid dynamic library loading. For core system functionality, it's reasonable to link against a dynamic library, but you'll want to do it in a way where the burden of dynamic loading is entirely on your toolchain (i.e. you shouldn't need to call dlopen or LoadLibrary explicitly). For other functionality, it is almost always better to statically link (assuming you distribute updates when there are security fixes for your dependencies), since this will avoid you getting broken by incompatible version updates and also prevent your users from experiencing dependency hell (you require version A but some other application requires version B); modular architectures are often better (and more securely) achieved through inter-process communication (IPC), since dynamically loaded libraries live in the process of the program that loads them (thereby giving them access to the entire process's virtual memory space), whereas with interprocess-communication, each component would be a separate process, and individual components would only have access to information that was given to it explicitly by the calling process, which would make it more difficult for a malicious component to steal data from the caller or other components or to produce instability.

The sanest thing if you want this to actually be used in the real world is probably to just compile the source as part of your program on each platform, like a regular function.
Next best is probably a separate process that you talk to rather than merge with.
Semi-sane (but still not a great choice, see our discussion in the other answer) would be making the shared library, like Michael Aaron Safyan said.
But if you want to know how it works just because - say, you want to write your own dynamic linker, or are doing some kind of runtime code generation like a JIT compiler, or if you just wanna know - you can make a raw code file.
To use it, what we'd have to do is similar to what the linker does - load the code at a particular address that it is made to work on and run it. There is position independent code that can run at any address, too.
Let's first get our function compiled and linked, then output into a raw image for a certain address. Assume the function is func in the file func.c and we're using gcc on Linux. (A Windows compiler would have similar options - gcc on Windows is exactly the same, I believe, but something like Digital Mars's C compiler does it differently with the linker command being /BINARY for instance)
Anyway, here's what I ran:
gcc -c func.c # makes func.o
ld func.o --oformat=binary -e func -o func.binary
This generates a file called func.binary. You can disassemble it most easily with ndisasm -b 64 func.binary (or -b 32 if you compiled the C in 32 bit mode) to confirm it looks right - I see an add instruction there, so looks good to me.
If you loaded that and mmaped then called it... it should work.
Problems will be quick to come up though:
If there's more than one function in that file, they'll all be squished together.
The addresses they try to use to call each other may be totally wrong.
Global variables and other static data will be messed up.
And there's more. The operating system uses more complex file formats for executables and libraries for a reason!
To go to the next step, you could consider writing an ELF or PE loader which reads that metadata off a standard file. Of course, once you get into much of this, you'll be doing exactly what the OS provides with dlopen and LoadLibrary.... so unless the goal is to just learn about the guts, just call those functions and call it done!

Visibility, Fortran common variables, runtime loading of shared libraries

Environment: Intel Linux, Red Hat 5.
Compiler: gcc 3.4.6
(old stuff, legacy environment with serious infrastructure, sorry)
I have multiple versions of a particular shared library (call it something like "shared_lib.so") derived from Fortran which contains a COMMON block and various computations with references to variables in that COMMON.
I need to be able to (from C code elsewhere in the end-product executable) use dlclose() and dlopen() to switch between versions of this library (within which all versions of the COMMON contents are identical) while running. In some cases the same COMMON also appears in code which is part of a static library (call it "static_lib.a") that is also linked into the executable, and is separately maintained from my project but which has functionality which interacts with that in my shared library.
I appear to be seeing that multiple instances of the COMMON wind up in the executable, and (more importantly) that there is no linkage between the values of variables in the instance from the static library, and the values of the “same” variables in the instance from a shared library pulled in with dlopen().
What I need, in summary, is (within the overall executable) for a dlopen()-loaded shared_lib.so to be able to set/use variable XYZ in COMMON ABC, and for code in static_lib.a to set/use XYZ, and have it in effect be the same instance of XYZ, or at least for the two to be kept in synch. Is this possible?
My compilation commands for sources in shared_lib.so are of the form:
g77 –c –g –m32 -fPIC –o shared_src.o shared_src.f
My command for building shared_lib.so is of the form:
gcc -g -m32 -fPIC -shared -o shared_lib.so *.o
My command for building the executable is of the form:
gcc –g -m32 –rdynamic –o exec exec.o static_lib.a shared_lib.so –lm –ldl –lg2c
My need is to do something from the C code of the form:
handle1 = dlopen ("shared_lib.so", RTLD_NOLOAD);
dlclose (handle1);
handle2 = dlopen ("shared_lib2.so", RTLD_NOW | RTLD_GLOBAL);
...
The initial startup configuration does appear to function correctly with respect to the needed variables, but the result of subsequent dlclose() and dlopen() sequences do not. Perhaps the underlying issue is that dlopen() lacks some intelligence that gcc possesses when it is linking.

Short answer
Did/can you recompile the executable with the -fPIC? I found that it was necessary to compile both the shared library AND the executable with the -fPIC to get the COMMON blocks to be recognized properly.
Long answer
I ran into a slightly similar problem recently with COMMON blocks shared between an executable and a FORTRAN shared library. However, I'm using Intel compilers NOT the GNU compilers. The executable is mixed C/C++ and FORTRAN.
The existing (working) Windows version of the code works by sharing the common blocks between executable and DLL through DLLEXPORT/DLLIMPORT ATTRIBUTE directives. According to the Intel compiler documentation, these attribute directives are not recognized in Linux. Indeed, the Linux Intel compiler just produces warnings for these directives.
The main changes in converting the code from Windows to Linux were replacing the Windows LoadLibrary and GetProcAddress with Linux's dlopen and dlsym routines, respectively, using #ifdef sections. The shared library was compiled using -fpic and linked with -shared.
While the shared library was compiled with -fpic, the executable was NOT. When running the code compiled in this manner, variables passed to the shared library through subroutine calls were passed properly, however, the COMMON block variables were not set correctly (or were uninitialized).
In desperation, I finally tried compiling the executable itself with the -fpic compiler option, and then the COMMON blocks were recognized properly in the shared library.

This isn't really an answer, but might help you.
Here's what the 2008 standard says about COMMON:
5.7.2.4 Common association
1 Within a program, the common block storage sequences of all nonzero-sized common blocks with the same
name have the same first storage unit, and the common block storage
sequences of all zero-sized common blocks with the same name are
storage associated with one another. Within a program, the common
block storage sequences of all nonzero-sized blank common blocks have
the same first storage unit and the storage sequences of all
zero-sized blank common blocks are associated with one another and
with the first storage unit of any nonzero-sized blank common blocks.
This results in the association of objects in different scoping units.
Use or host association may cause these associated objects to be
accessible in the same scoping unit.
In short, COMMON sections with the same name in the same program occupy the same storage.
A program is defined as follows.
2.2.2 Program
1 A program shall consist of exactly one main program, any number (including zero) of other kinds of program units, any
number (including zero) of external procedures, and any number
(including zero) of other entities defined by means other than
Fortran. The main program shall be defined by a Fortran main-program
program-unit or by means other than Fortran, but not both.
The standard doesn't say anything about static vs dynamic linking and it doesn't restrict the previous statements to static linking. Therefore, it seems the dynamically loaded library should share the COMMON block with the main program (which I'm not sure is even technically possible) and thus the GNU implementation is incorrect.
On the other hand, the standard also doesn't say anything about being able to load libraries dynamically. Program units "defined by means other than Fortran" should include C libraries, but that doesn't tell us how these program units are connected to the main program. Fortran, in general, is not a very dynamic language.
Of course, you can work around all this by simply not using COMMON blocks. If a procedure needs to read/write some data, just pass it as a parameter with intent in/out. You can also group data together in a derived type and pass it around together as a unit. Nowadays (Fortran 2003+), you can even use object oriented programming, so there is really no need for global variables anymore.

Convert a statically linked elf binary to dynamically linked

I have a elf binary which has been statically linked to libc.
I do not have access to its C code.
I would like to use OpenOnload library, which has implementation of sockets in user-space and therefore provides lower latency compared to standard libc versions.
OpenOnload implements standard socket api, and overrides libc version using LD_PRELOAD.
But, since this elf binary is statically linked, it cannot use the OpenOnload version of the socket API.
I believe that converting this binary to dynamically link with OpenOnload is possible, with the following steps:
Add new Program headers: PT_INTERP, PT_DYNAMIC and PT_LOAD.
Add entries in PT_DYNAMIC to list dependency with libc.
Add PLT stubs for required libc functions in the new PT_LOAD section.
Modify existing binary code for libc functions to jump to corresponding PLT stubs.
As a first cut, I tried just adding 3 PT_LOAD segments. New segment headers were added after existing PT_LOAD segment headers. Also, vm_addr of existing segments was not modified. File offsets of existing segments were shifted below to next aligned address based on p_align.
New PT_LOAD segments were added in the file at end of the file.
After re-writing the file, when I ran it, it was loaded properly by the kernel, but then it immediately seg-faulted.
My questions are:
If I just shift the file-offsets in the elf binary, without modifying the vm_addresses, can it cause any error while running the binary?
Is it possible to do what I am attempting? Has anybody attempted it?

What you are attempting is not possible in any automated way. At the time of static linking, all relocation information identifying calls to libc as calls to libc has been resolved and removed. If debugging symbols exist in the binary, it's possible to identify "this range of bytes in the text segment corresponds to such-and-such libc function", but there is no way to identify references to the function, which will be embedded in the instruction byte stream with no markup to identify them. You could use heuristics based on disassembly, but they would be incomplete and unreliable (possibility of both false negatives and false positives).
As far as shifting offsets, you absolutely cannot change anything about the load addresses for a static linked binary. If you need to insert headers before the load segments, you'd have to insert a whole page, and update the file offsets in the program header table (adding 1 page to them) while leaving the virtual address load offsets the same. However, since what you're trying to do is not possible overall, the offset-shifting issue is the least of your worries.
Perhaps, if the program doesn't require high performance, you could run it under qemu app-level emulation, with qemu going through the sockets emulation/wrapper.

OpenOnload ... overrides libc version using LD_PRELOAD. But, since this elf binary is statically linked ...
also answered in intercept system calls of statically linked executable

Prototype Kernel and modules

Recently I've picked up one of my old projects and restarted it, pretty much from scratch.
I've been sick for awhile, so I've had time to crack down hard and implement tons of functionality. However one thing that I feel would be a good idea to implement is module loading. I want to do kernel mode dynamic loading of modules.
The word modules is a bit ambiguous, the correct term would just be to load libraries, such as a miniture implementation of the C library for kernel mode drivers or standard things like the PIT and keyboard which are on IRQ 0 & 1. The method I'm trying to achieve is a bit self-sustaining; in the aspect that the modules my kernel will load, will be used in the kernel itself to get into user mode.
As an example, my kernel uses very few functions from the C library, which I've implemented myself. These functions themselfs are used in the setup of my GDTs, IDTs, IRQs, ISRs etc, etc. I would like to abstract these functions to a library that the kernel can load and use. Which means the kernel itself will require module loading at the very first stage, before anything is setup.
Now, I've thought of a few ways to do this myself, such as adding a structure to this library with a table of function pointers that are assigned the address of the functions in the library itself. Compiling the library as an aout-kludge file, loading the library into the kernel as a void * ( which is okay since I have a working allocator ), and then figuring out the offset of the structure, stepping into the void pointer that much, and recreating the structure in the kernel. This does not sound like it would work, since the table of function pointers need to be assigned, which means there needs to be an initialize function in the library itself. How would that be called, even if I knew the address?
I'm clueless as to how I could implement such a loader, and is it even worth it? I want to abstract as much as I possibly can, my kernel has a modular design. I also do expect to load drivers and other things with this method, I'm just unsure how I would implement it. I tried various methods already, and they all failed. What should I do?

I would recommend you first write a dynamic loader in user space. The techniques needed are very similar, and you may be able to adapt much of the code to kernel space later. Also, don't use a.out and don't make up your own 'table of function pointers' - use a more modern format such as ELF. The compile-time tools already exist, so this will save you a lot of effort; you can just write an appropriate linker script and build straight from a Linux GCC.
As it happens, the Windows kernel does something very similar to what you say - the Windows kernel (ntoskrnl.exe) is a PE executable file linking in routines from various DLLs (PSHED.dll, HAL.dll, KDCOM.dll, CLFS.sys, and Cl.dll on my system). In this case, the NTLDR program loads all files required by ntoskrnl.exe into memory, and a boot stub in ntoskrnl.exe then performs dynamic linking. Later the same dynamic linker can be used to load other drivers as well.

Implementing kernel modules is not a trivial job. It is a little complicated and you will need to read to ELF documentation for coding. I will try to provide you some insight here -
In user-space, executable files required shared libraries to implement some of their functionality or code. So, the code in the executable will reference code in the shared library. This leads to the development of symbols. Symbols represent pointers to data/functions/other & have a name.
CHAR VariableName[20];
For example, in the above code a data symbol will be created with the name 'VariableName'. After the 'invention' of dynamic/shared libraries the symbol table (set of symbols in a binary) has to be loaded to resolve the references from executable file in the libraries. But a lot of debugging symbols & useless symbols are present in the symbol table.
Symbol: Main.c
For example, in the symbol table, even a symbol for the C source file will be present for debugging. But that is not required for resolving references at runtime. Here, the concept of dynamic symbols comes in the play.
Dynamic linking refers to the resolving of references between binaries. The dynamic linker will use the dynamic symbol table (which must be loaded & 'normal' symbol table doesn't have to) to resolve the references that the executable makes in the library.
Now, in the kernel core which is the executable file, there are no references in the shared libraries (kernel modules). But the shared libraries reference in the executable. So, the executable must contain dynamic symbols for resolving the references in the kernel modules. This is contrary to the case in user-space. Thus, if you are using ld,
-pie -T LinkerScript.ld
option should be used for create a dynamic symbol table in the kernel executable.
And you should create a LinkerScript.ld file -
/* File: LinkerScript.ld */
PHDRS {
kernel PT_LOAD FILEHDR;/* This declares a segment in which your code/data is.*/
dynamic PT_DYNAMIC;/* Segment containing the dynamic table (not DST). */
}
SECTIONS {
/* text, data, bss sections must be implemented already */
.dynamic ALIGN(0x1000) : AT(ADDR(.dynamic) - KERNEL_OFFSET)
{
*(.dynamic)
} :dynamic/* add :kernel to text, data, bss*/
}
with the above structure. Make sure, your .text, .data & .bss sections are already present & :kernel is added to the end of the section descriptor.
For more information, read the ELF documentation & LD manual (for linker script insight).

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight