Is there a good way to check the OS version (in this case, Windows Vista+ or not) and decide at runtime which version of a function is going to be used?
Concretely, I am talking about implementing pthreads on top of Win32 threads. Ideally, the pthreads library would determine at program startup which OS is running. If it is Vista+, all function calls would be redirected to the cool new, fast functions; otherwise, the old emulation layer would be used.
So in effect, the library will have two versions of each function, one new and one old, and a one-time runtime check, done before the program enters main so to speak, would determine which version is used. I know there are libraries that detect CPU features like SSE at runtime and use the relevant functions, but I think they check on each function call. That would be too expensive in a low-level threading library IMO.
Is this possible? Can function calls be "relinked"/redirected at runtime, so to speak?
EDIT: crazy things like custom crt startup code would be possible for this (I'm talking about winpthreads for mingw-w64, which provides its own startup code)
The simple answer? Define and build a dispatch table/structure for your library. Something like this:
// Define function pointers and dispatch structure.
typedef void( *PFN_pthread_exit )( void *value_ptr );
typedef struct tag_PTHREAD_IMPL
{
PFN_pthread_exit ptr_pthread_exit;
// Add the rest here.
} PTHREAD_IMPL;
// Define your various implementations dispatcher structures.
static PTHREAD_IMPL legacy_impl = {
&legacy_pthread_exit_impl
};
static PTHREAD_IMPL latest_andgreatest_impl = {
&pthread_exit_impl
};
static PTHREAD_IMPL* s_pImpl = NULL;
Next, your library's initialize function should contain something like this:
int StaticInitialize( void )
{
// Initialize dispatcher (IsWindowsVistaOrGreater() is declared in <versionhelpers.h>)
if( IsWindowsVistaOrGreater() )
s_pImpl = &latest_andgreatest_impl;
else
s_pImpl = &legacy_impl;
return 0;
}
Finally, your library's exported functions should look something like this:
#include <assert.h>
void pthread_exit( void *value_ptr )
{
assert( s_pImpl );
assert( s_pImpl->ptr_pthread_exit );
s_pImpl->ptr_pthread_exit( value_ptr );
}
Naturally, your modern implementations must use runtime binding (e.g. GetProcAddress) for any exports that don't exist on legacy platforms; importing them directly would make the DLL fail to load on pre-Vista Windows.
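Since the question mentions custom startup code, the one-time selection can also be made before main() without touching the CRT. The following is a minimal, portable sketch of the same dispatch-table idea, assuming a GCC-compatible compiler (the constructor attribute is a GCC/Clang extension that mingw-w64 GCC should also honour). The OS probe os_has_native_primitives(), the function lib_op() and both implementations are hypothetical placeholders; on Windows the probe is where a check such as IsWindowsVistaOrGreater() would go.

```c
/* Hypothetical OS probe: stubbed so the sketch runs anywhere.
   On Windows, a check like IsWindowsVistaOrGreater() would go here. */
static int os_has_native_primitives(void)
{
    return 1;
}

typedef int (*PFN_lib_op)(int);

static int legacy_lib_op(int x) { return x + 1; }   /* emulation layer (placeholder) */
static int native_lib_op(int x) { return x + 100; } /* new OS primitives (placeholder) */

typedef struct {
    PFN_lib_op lib_op;
    /* ... one pointer per exported function ... */
} LIB_IMPL;

static const LIB_IMPL legacy_impl = { legacy_lib_op };
static const LIB_IMPL native_impl = { native_lib_op };

/* Points at the safe implementation until the constructor has run. */
static const LIB_IMPL *s_pImpl = &legacy_impl;

/* Runs once before main() (or when the DLL/shared object is loaded). */
__attribute__((constructor))
static void lib_select_impl(void)
{
    s_pImpl = os_has_native_primitives() ? &native_impl : &legacy_impl;
}

/* Exported entry point: one indirect call, no per-call OS check. */
int lib_op(int x)
{
    return s_pImpl->lib_op(x);
}
```

The cost per call is a single indirect call through s_pImpl; the OS check itself happens exactly once.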
Have fun!
I am making a low-level library that requires initialization to work properly, which I implemented with an init function. I am wondering if there is a way to have the init function called automatically when the user first calls a library function, ideally without:
any overhead,
repeated calls,
exposed global variables (my current solution has one, which I don't quite like).
My current solution, as requested in the comments:
bool isinit = 0;
void init()
{
isinit = 1;
// init code
}
void lib_function()
{
if(!isinit) init();
// function code
}
The compiler seems to be smart enough (using -Ofast on gcc) not to make that comparison each time lib_function is called, but this still exposes a global variable, which I don't like.
Best way to abstract away an init function?
Surely your library has some state. Typically, a library exposes functions that work on a specific structure. Do not use global variables; do not write spaghetti code. Expose the structure that holds the state of your library, and make all functions of your library take a pointer to that structure as an argument. Use a namespace: prefix all exported symbols. An init function then looks like int lib_init(struct lib_the_struct *t);, and it will be self-evident that users need to initialize the structure with that function before use. For examples, see fopen() or pthread_create().
Write an init function in your library. Write clear documentation stating that the user of your library has to call the function once before calling any other function. For example: https://curl.se/libcurl/c/curl_global_init.html .
If you're happy with a solution that is a common extension (supported by GCC and Clang) rather than part of the C standard, you can mark your init function with the constructor attribute, which ensures it will be called automatically during program initialization (or when the shared library is loaded, if you eventually end up using one).
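The struct-based approach described above can be sketched as follows. lib_the_struct and lib_init come from the answer; the fields and lib_increment() are hypothetical, chosen only to show that every function takes the state pointer and no global is needed.

```c
#include <stddef.h>

/* All library state lives in a caller-owned struct; no globals. */
struct lib_the_struct {
    int initialized;
    int counter;
};

/* Returns 0 on success, -1 on a NULL argument. */
int lib_init(struct lib_the_struct *t)
{
    if (t == NULL)
        return -1;
    t->initialized = 1;
    t->counter = 0;
    return 0;
}

/* Hypothetical example of a library function operating on the state. */
int lib_increment(struct lib_the_struct *t)
{
    return ++t->counter;
}
```

Usage is the familiar pattern: declare a struct lib_the_struct, call lib_init(&s) once, then pass &s to every other call.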
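A minimal sketch of the constructor-attribute approach, assuming GCC or Clang. The state variables have internal linkage (static), so nothing is exposed, and lib_function() needs no init check at all. lib_is_initialized() is a hypothetical helper added only to make the effect observable.

```c
#include <stdbool.h>

static bool lib_initialized; /* internal linkage: invisible to users */
static int lib_calls;

/* GCC/Clang extension: called automatically before main() runs
   (or when the shared library is loaded). */
__attribute__((constructor))
static void lib_init(void)
{
    lib_calls = 0;
    lib_initialized = true;
}

/* Public API: no init check, no extra branch. */
int lib_function(void)
{
    return ++lib_calls;
}

/* Hypothetical helper for demonstration only. */
bool lib_is_initialized(void)
{
    return lib_initialized;
}
```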
I would fix this with assert, so that the check disappears in release builds, while forgetting to call the init function somewhere still gives you an error during development.
Also make isinit static, so every library can have its own variable with the same name.
#include <assert.h>
#ifndef NDEBUG
static int isinit = 0;
#endif
void init(void)
{
#ifndef NDEBUG
isinit = 1;
#endif
// init code
}
void lib_function(void)
{
assert(isinit && "library: init not called");
// function code
}
There will be overhead if you run if(!isinit) init(); each time you call a function: at least an extra compare and branch.
As for removing global variables, do as in your example, but declare it as static bool isinit = 0;. This reduces the scope of the variable to the local translation unit (the .c file and all .h files it includes), so it is no longer "global". Note that this isn't ideal in multi-threaded scenarios; you will then have to protect the variable with a mutex.
Overall though, what you are trying to do isn't a good idea. It is very common convention for C libraries to have an init function and the user of the library is expected to call it before calling anything else or they are to blame, not your library. Naturally you have to make this clear to them with source code documentation. It is common to have a list of prerequisites in source code comments together with every function declaration placed in the header file of the library.
Let's assume I have a C structure, DynApiArg_t.
typedef struct DynApiArg_s {
uint32_t m1;
...
uint32_t mx;
} DynApiArg_t;
A pointer to this struct is passed as an argument to a function, say
void DynLibApi(DynApiArg_t *arg)
{
arg->m1 = 0;
another_fn_in_the_lib(arg->mold); /* May crash here. (1) */
}
which is present in a dynamic library, libdyn.so. This API is invoked from an executable via dlopen/dlsym.
Suppose this dynamic library is updated to version 2, where DynApiArg_t now has new members, say m2 and mnew, as below:
typedef struct DynApiArg_s {
uint32_t m1;
OldMbr_t *mold;
...
uint32_t mx;
uint32_t m2;
NewMbr *mnew;
} DynApiArg_t;
Without a complete rebuild of the executable or of the other libraries that call this API via dlopen/dlsym, the process crashes every time this API is invoked, due to the dereference of some member of the struct. I understand that accessing m2 may be a problem, but even accessing the existing member mold, as at (1) above, causes crashes.
typedef void (*fnPtr_t)(DynApiArg_t*);
void DynApiCaller(DynApiArg_t *arg)
{
void *libhdl = dlopen("libdyn.so", RTLD_LAZY | RTLD_GLOBAL);
if (!libhdl) return; /* dlerror() describes the failure */
fnPtr_t fnptr = (fnPtr_t)dlsym(libhdl, "DynLibApi");
if (!fnptr) return;
fnptr(arg); /* actual call to the dynamically loaded API (2) */
}
In the call through fnptr at the line marked (2), when the old/existing members (those present in v1 of the library, when DynApiCaller was originally compiled) are accessed at (1), they contain garbage values or even NULL at times.
What is the right way to handle such updates without recompiling the executable every time the libraries it depends on are updated?
I've seen libraries named with version-numbered symlinks like libsolid.so.4. Is there something in this versioning system that can help me? If so, can you point me to the right documentation?
There are a number of approaches to solve this problem:
Include the API version in the dynamic library name.
Instead of dlopen("libfoo.so"), you use dlopen("libfoo.so.4"). Different major versions of the library are essentially separate, and can coexist on the same system; so, the package name for that library would be e.g. libfoo-4. You can have libfoo.so.4 and libfoo.so.5 installed at the same time. Minor versions, say libfoo-4.2, install libfoo.so.4.2, and symlink libfoo.so.4 to libfoo.so.4.2.
Initially define the structures with zero padding (required to be zero in earlier versions of the library), and have the later versions reuse the padding fields, but keeping the structures the same size.
Use versioned symbol names. This is a GNU/Linux extension; a specific symbol version can be looked up with dlvsym(). A single shared library binary can implement several versions of the same dynamic symbol.
Use resolver functions (GNU indirect functions, "ifunc") to determine the symbols at load time. This allows e.g. hardware-architecture-optimized variants of functions to be selected at run time, but is less useful with a dlopen()-based approach.
Use a structure to describe the library API, and a versioned function to obtain/initialize that API.
For example, version 4 of your library could implement
struct libfoo_api {
int (*func1)(int arg1, int arg2);
double *data;
void (*func2)(void);
/* ... */
};
and only export one symbol,
int libfoo_init(struct libfoo_api *const api, const int version);
Calling that function would initialize the api structure with the symbols supported, with the assumption that the structure corresponds to the specified version. A single shared library can support multiple versions. If a version is not supported, it can return a failure.
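A sketch of what the library side of libfoo_init could look like, using the struct libfoo_api shown above. The implementations (func1_v4, data_v4, func2_v4) are hypothetical placeholders, and only version 4 is supported here; a real library would add further cases as it learns new API versions.

```c
#include <stddef.h>

/* Version-4 API table, as in the structure above. */
struct libfoo_api {
    int (*func1)(int arg1, int arg2);
    double *data;
    void (*func2)(void);
};

/* Hypothetical internal implementations; static, so not exported. */
static int func1_v4(int arg1, int arg2) { return arg1 * arg2; }
static double data_v4 = 1.5;
static int func2_calls;
static void func2_v4(void) { func2_calls++; }

/* The only exported symbol. Returns 0 on success, -1 if the requested
   API version is not supported by this build of the library. */
int libfoo_init(struct libfoo_api *const api, const int version)
{
    if (api == NULL)
        return -1;
    switch (version) {
    case 4:
        api->func1 = func1_v4;
        api->data  = &data_v4;
        api->func2 = func2_v4;
        return 0;
    default:
        return -1;
    }
}
```

A dlopen() client then needs a single dlsym(handle, "libfoo_init") lookup and calls everything else through the filled-in struct.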
This is especially useful for plugin-type interfaces (although then the _init function is more likely to call application-provided functionality registering functions, rather than fill in a structure), as a single file can contain optimized functionality for a number of versions, optimized for a number of compatible hardware architectures (for example, AMD/Intel architectures with different SSE/AVX/AVX2/AVX512 support).
Note that the above implementation details can be "hidden" in a header file, making actual C code using the shared library much simpler. It also helps make the same API work across a number of OSes, simply by changing the header file to use the approach that works best on that OS, while keeping the actual C interface the same.
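The resolver-function approach listed above can be sketched with GCC's ifunc attribute. This assumes GCC or Clang targeting ELF with glibc (it will not build on macOS or Windows). sum() and its two variants are hypothetical; the "AVX2" variant is plain C standing in for an optimized version. The resolver runs once, while the dynamic loader relocates the binary, so callers pay no per-call feature check.

```c
#include <stddef.h>

static long sum_generic(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if defined(__x86_64__) || defined(__i386__)
/* Placeholder for an AVX2-optimized variant. */
static long sum_avx2(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#endif

/* Resolver: called once at load time, before constructors run. */
static long (*resolve_sum(void))(const int *, size_t)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init(); /* required because resolvers run before constructors */
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_generic;
}

/* Callers just call sum(); the loader binds it to the resolver's choice. */
long sum(const int *v, size_t n) __attribute__((ifunc("resolve_sum")));
```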
I am a little confused about how to best handle calling convention differences in a public API and keep it in sync with its bindings. Let's say I am writing a C API, made available through a shared object library or a DLL. Now assume I have been told I should not use the default calling convention on Windows - that is, on Linux and other Unixes I should use the standard calling convention used by the compiler (probably cdecl) but that on Windows I should force the use of stdcall. So I have some #ifdef logic in the headers that sets the right calling convention as needed. The C headers of the library necessarily take care of that, so the C public API is usable.
Now suppose I want to write bindings for my library in another language. That means I have to rewrite the calling convention logic (depending on the current system) in that language too, for the bindings to correctly map to the library. And so on for all bindings. Some languages may not have good (or any) support for this.
Is there a more elegant way to do this? Should I just use the default calling convention everywhere, and assume that other languages will pick the right one for external/imported functions? Do I even need to worry about this stuff (I think so)? Thanks.
Many languages use a built-in or third-party library to simplify calls into shared libraries. These libraries often support both calling conventions. One example is JNA, for invoking native shared libraries from Java. That being said, if you don't want to rely on other languages supporting a given calling convention, you can implement the shared library with both types of functions included, and provide initializers that return the appropriate bindings for each type. For instance, if your library has 2 functions named function1 and function2, you could implement it like this:
typedef struct
{
int (*function1)(int a, int b);
char* (*function2)(void);
}API;
//stdcall implementation
//these functions are compiled to use the stdcall calling convention (e.g. declared __stdcall)
int stdcall_function1(int a, int b)
{
/*...*/
}
char* stdcall_function2(void)
{
/*...*/
}
API getSTDCallInstance()
{
API api;
api.function1 = &stdcall_function1;
api.function2 = &stdcall_function2;
return api;
}
//cdecl implementation
//these functions are compiled to use the cdecl calling convention (e.g. declared __cdecl)
int cdecl_function1(int a, int b)
{
/*...*/
}
char* cdecl_function2(void)
{
/*...*/
}
API getCDECLInstance()
{
API api;
api.function1 = &cdecl_function1;
api.function2 = &cdecl_function2;
return api;
}
If you implement your library in this manner, then the loading language can use the appropriate initializer to get a handle to the struct containing the correct implementation for them.
Is there any way to programmatically mock a function for an embedded C application running on Linux? In the example below, I want to make main call someBlah instead of someFunc at run time.
#include <stdio.h>
void someFunc( void )
{
printf("%s():%d\n",__func__,__LINE__);
}
void someBlah( void )
{
printf("%s():%d\n",__func__,__LINE__);
}
int main(void)
{
someFunc();
}
The program will be executing from RAM on Linux, so the text segment should be modifiable. I know GDB works on a similar concept, where breakpoint locations are replaced by trap instructions.
Sure, just make a table of function pointers.
#define BLAH 0
#define FUNC 1
void (*table_function[])(void) = {someBlah, someFunc};
If they all have the same interface and return type, you can just switch them by switching table entries.
Then you call a function by performing
table_function[BLAH]();
If you want to swap a function, just say
table_function[BLAH] = otherBlah;
Also: don't do this unless you are writing some kind of JIT-compiling environment or a VM, usually you don't need such constructs and if you need them you are probably having a bad architecture day.
Although if you're experienced in OO design you can design polymorphic constructs in C that way (ignore this if that doesn't make sense).
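Applied to the question's own functions, the function-pointer approach might look like this sketch. someFunc_hook, run_once() and last_called are hypothetical names: the call site in main goes through someFunc_hook, and a test swaps the pointer to someBlah at run time. last_called exists only so the swap is observable.

```c
#include <stdio.h>

static int last_called; /* 1 = someFunc, 2 = someBlah (for observing the swap) */

void someFunc(void)
{
    last_called = 1;
    printf("%s():%d\n", __func__, __LINE__);
}

void someBlah(void)
{
    last_called = 2;
    printf("%s():%d\n", __func__, __LINE__);
}

/* Call sites use this pointer instead of naming someFunc directly. */
void (*someFunc_hook)(void) = someFunc;

/* Stands in for the body of main() in the question. */
void run_once(void)
{
    someFunc_hook();
}
```

In a test, assigning someFunc_hook = someBlah; before calling run_once() redirects the call without modifying the text segment.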
You could always make some part of the text segment modifiable by an appropriate call to mprotect and overwrite some code with your own (e.g. by generating machine code with libjit, GNU lightning, ... or manually).
But using function pointers is a cleaner way of doing that.
If the functions are inside a shared library, you could even overwrite its Procedure Linkage Table (see also the ABI specification, which depends on the architecture, e.g. the one for ARM).
There are a few mocking frameworks for C.
At work, we've had some success with cgreen but we did have to make changes to its internals. Luckily, it's quite small, and so relatively easy to extend. An alternative that looks good, but I haven't worked with, is a combination of Unity and CMock.
On the general topic of unit testing embedded C code, I highly recommend Test Driven Development for Embedded C.
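Another way I have done this is to select the mock at compile time:
#include <stdio.h>
#define DEBUG
void someBlah( void )
{
printf("%s():%d\n",__func__,__LINE__);
}
void someFunc( void )
{
#ifndef DEBUG
printf("%s():%d\n",__func__,__LINE__);
#else
someBlah(); // mock: redirect to someBlah when DEBUG is defined
#endif
}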
int main(void)
{
someFunc();
}
Take a look at CMocka; there is an article about mocking on LWN: "Unit testing with mock objects in C".
Is it possible to avoid the entry point (main) in a C program? In the code below, is it possible to invoke func() without calling it via main()? If yes, how can it be done, when would it be required, and why is such a provision given?
#include <stdio.h>
int func(void)
{
printf("This is func \n");
return 0;
}
int main(void)
{
printf("This is main \n");
return 0;
}
If you're using gcc, I found a thread that said you can use the -e command-line parameter to specify a different entry point; so you could use func as your entry point, which would leave main unused.
Note that this doesn't actually let you call another routine instead of main. Instead, it lets you call another routine instead of _start, which is the libc startup routine -- it does some setup and then it calls main. So if you do this, you'll lose some of the initialization code that's built into your runtime library, which might include things like parsing command-line arguments. Read up on this parameter before using it.
If you're using another compiler, there may or may not be a parameter for this.
When building embedded systems firmware to run directly from ROM, I often will avoid naming the entry point main() to emphasize to a code reviewer the special nature of the code. In these cases, I am supplying a customized version of the C runtime startup module, so it is easy to replace its call to main() with another name such as BootLoader().
I (or my vendor) almost always have to customize the C runtime startup in these systems because it isn't unusual for the RAM to require initialization code for it to begin operating correctly. For instance, typical DRAM chips require a surprising amount of configuration of their controlling hardware, and often require a substantial (thousands of bus clock cycles) delay before they are useful. Until that is complete, there may not even be a place to put the call stack so the startup code may not be able to call any functions. Even if the RAM devices are operational at power on, there is almost always some amount of chip select hardware or an FPGA or two that requires initialization before it is safe to let the C runtime start its initialization.
When a program written in C loads and starts, some component is responsible for creating the environment in which main() is called. In Unix, Linux, Windows, and other interactive environments, much of that effort is a natural consequence of the OS component that loads the program. However, even in these environments there is some amount of initialization work to do before main() can be called. If the code is really C++, there can be a substantial amount of work, including calling the constructors for all global object instances.
The details of all of this are handled by the linker and its configuration and control files. The linker ld(1) has a very elaborate control file that tells it exactly what segments to include in the output, at what addresses, and in what order. Finding the linker control file you are implicitly using for your toolchain and reading it can be instructive, as can the reference manual for the linker itself and the ABI standard your executables must follow in order to run.
Edit: To answer the question as asked more directly, in a more common context: "Can you call foo instead of main?" The answer is "Maybe, but only by being tricky".
On Windows, an executable and a DLL are very nearly the same format of file. It is possible to write a program that loads an arbitrary DLL named at runtime, and locates an arbitrary function within it, and calls it. One such program actually ships as part of a standard Windows distribution: rundll32.exe.
Since a .EXE file can be loaded and inspected by the same APIs that handle .DLL files, in principle, if the .EXE has an EXPORTS section that names the function foo, a similar utility could be written to load and invoke it. You don't need to do anything special with main, of course, since that will be the natural entry point. However, the C runtime that was initialized in your utility might not be the same C runtime that was linked with your executable. (Google for "DLL Hell" for a hint.) In that case, your utility might need to be smarter. For instance, it could act as a debugger: load the EXE with a breakpoint at main, run to that breakpoint, then change the PC to point at or into foo and continue from there.
Some kind of similar trickery might be possible on Linux since .so files are also similar in some respects to true executables. Certainly, the approach of acting like a debugger could be made to work.
A rule of thumb would be that the loader supplied by the system would always run main. With sufficient authority and competence you could theoretically write a different loader that did something else.
Rename main to func and func to main, and call func from main.
If you have access to the source, you can do this and it's easy.
If you are using an open-source compiler such as GCC, or a compiler targeted at embedded systems, you can modify the C runtime startup (CRT) code to start at any entry point you need. In GCC toolchains this code is typically in crt0.s. Generally this code is partially or wholly in assembler; for most embedded-systems compilers, example or default start-up code is provided.
However, a simpler approach is to 'hide' main() in a static library that you link with your code. If that implementation of main() looks like:
int main(void)
{
func() ;
}
Then it will look to all intents and purposes as if the user entry point is func(). This is how many application frameworks with entry points other than main() work. Note that because it is in a static library, any user definition of main() will override that static library version.
The solution depends on the compiler and linker you use. In every case, main is not the real entry point of the application: the real entry point performs some initialization and then calls, for example, main. If you write programs for Windows using Visual Studio, you can use the linker's /ENTRY switch to override the default entry point mainCRTStartup and call func() instead of main():
#include <windows.h>
#ifdef NDEBUG
void mainCRTStartup()
{
ExitProcess (func());
}
#endif
This is standard practice if you want to write the smallest possible application. In that case you are restricted in your use of C runtime functions and should use Windows API functions instead. For example, instead of printf("This is func \n") you should use OutputString(TEXT("This is func \n")), where OutputString is implemented using only WriteFile or WriteConsole:
static HANDLE g_hStdOutput = INVALID_HANDLE_VALUE;
static BOOL g_bConsoleOutput = TRUE;
BOOL InitializeStdOutput()
{
g_hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
if (g_hStdOutput == INVALID_HANDLE_VALUE)
return FALSE;
g_bConsoleOutput = (GetFileType (g_hStdOutput) & ~FILE_TYPE_REMOTE) != FILE_TYPE_DISK;
#ifdef UNICODE
if (!g_bConsoleOutput && GetFileSize (g_hStdOutput, NULL) == 0) {
DWORD n;
WriteFile (g_hStdOutput, "\xFF\xFE", 2, &n, NULL);
}
#endif
return TRUE;
}
void Output (LPCTSTR pszString, UINT uStringLength)
{
DWORD n;
if (g_bConsoleOutput) {
#ifdef UNICODE
WriteConsole (g_hStdOutput, pszString, uStringLength, &n, NULL);
#else
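CHAR szOemString[MAX_PATH];
if (uStringLength > MAX_PATH) uStringLength = MAX_PATH; // truncate to fit the fixed buffer
CharToOemBuff (pszString, szOemString, uStringLength);
WriteConsole (g_hStdOutput, szOemString, uStringLength, &n, NULL);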
#endif
}
else
#ifdef UNICODE
WriteFile (g_hStdOutput, pszString, uStringLength * sizeof (TCHAR), &n, NULL);
#else
{
CHAR szOemString[MAX_PATH];
if (uStringLength > MAX_PATH) uStringLength = MAX_PATH; // truncate to fit the fixed buffer
CharToOemBuff (pszString, szOemString, uStringLength);
WriteFile (g_hStdOutput, szOemString, uStringLength, &n, NULL);
}
#endif
}
void OutputString (LPCTSTR pszString)
{
Output (pszString, lstrlen (pszString));
}
This really depends on how you are invoking the binary, and is going to be reasonably platform- and environment-specific. The most obvious answer is to simply rename the "main" symbol to something else and call "func" "main", but I suspect that's not what you are trying to do.