Are there any pitfalls when passing function pointers between compilation units? - c

I ask because I am using a PIC microcontroller to asynchronously operate hardware, and implementing function pointers as a callback mechanism would be of benefit.
An example would be an I2C library that accepts read and write 'jobs' and sequentially executes each 'job' as the hardware resource becomes available (and as the user ticks the I2C software state machine). Depending on the implementer's use of the I2C library, they may wish to manipulate the data prior to returning it (bitmasking, setting flags, etc.); this is where I'm thinking of adding an I2C callback mechanism.
The user would pass a job, which includes a callback function pointing to the calling compilation unit. Is this allowed? And are there any cases that I need to be careful of if it is allowed?

Passing pointers between compilation units is done all the time. For example, free() in the standard library is certainly compiled separately and yet takes a pointer as its argument.
Within many projects, including the Linux kernel, callbacks between compilation units are used often.
The main key is to use common header files for declaring shared variables, function prototypes, and such. If you define a function as taking a long pointer, but call it through a declaration that specifies a char pointer, you're entering Undefined Behavior territory.
Also watch out for compiler flags that may change variable sizes, default packing, and such.
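As a rough illustration of the shared-header advice, here is a minimal sketch of a job with a callback. All names (i2c_callback_t, i2c_job_t, i2c_complete_job, mask_low_nibble, run_demo_job) are hypothetical and not taken from any real I2C library; the three parts are shown in one file only so the sketch compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- would live in a shared header, e.g. "i2c_job.h" (hypothetical) ---- */
/* One typedef for the callback type, seen by both compilation units. */
typedef void (*i2c_callback_t)(uint8_t *data, size_t len, void *user);

typedef struct {
    uint8_t       *buffer;   /* data read from the bus */
    size_t         length;   /* number of bytes in buffer */
    i2c_callback_t on_done;  /* may be NULL */
    void          *user;     /* caller-owned context, passed back untouched */
} i2c_job_t;

void i2c_complete_job(i2c_job_t *job);

/* ---- library compilation unit ---- */
/* Called by the (hypothetical) state machine once a job has finished. */
void i2c_complete_job(i2c_job_t *job)
{
    if (job->on_done != NULL) {
        job->on_done(job->buffer, job->length, job->user);
    }
}

/* ---- caller's compilation unit ---- */
/* The callback can be static: only its address crosses the unit boundary. */
static void mask_low_nibble(uint8_t *data, size_t len, void *user)
{
    int *done_flag = user;
    for (size_t i = 0; i < len; i++) {
        data[i] &= 0x0F;
    }
    *done_flag = 1;
}

int run_demo_job(uint8_t *buf, size_t len)
{
    int done = 0;
    i2c_job_t job = { buf, len, mask_low_nibble, &done };
    i2c_complete_job(&job);
    return done;
}
```

Because both sides see the same typedef, the compiler can reject a callback whose signature does not match, which is the mismatch the answer warns about.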

Related

Thread-safe init of read-only global data

Let's imagine that I'm writing a library that has a reasonably large amount of read-only global data that needs to be initialized before the library can be used. For example, perhaps the global data consists of lookup tables for various parts of the application logic that won't change during the lifetime of the program.
Now I have a few ways to initialize this data:
(1) I may require that the user call some kind of init() function before the library is used.
(2) I may lazily construct the data the first time a function is called on my library.
(3) I may include the data in an initializer statement in the source, such that variables are statically initialized to their final value.
Now if my data is read-only and should be the same for every environment the library runs in, then (3) is fairly appealing. Even in that case it has some downsides: if the data is very large (but easy to generate procedurally), the size of the library will bloat up a lot (e.g., a library with 50K of code but 8MB of lookup tables would end up around 8050K). Similarly, the source itself may be very large, or the build system needs to handle the generation of the source at compile time.
The main reason you might not be able to use (3) is that the tables might be fixed (read-only), but require generation at runtime because they embed some information about the environment (e.g., the value of an environment variable, a configuration setting read from a file, information about the machine architecture, whatever). This data can't be embedded in the source since it depends on the runtime environment.
So we have methods (1) and (2) at least - but I can't see how to make these thread-safe in a simple way. The rest of the library can be thread-safe simply by not mutating any global state - just like the vast majority of C functions can be written in a thread-safe way w/o any explicit use of threading primitives.
I can't figure out a similar alternative for this global init, however:
(1) Is undesirable because we prefer not to require the user to call this method, and in any case it simply moves the problem up to the calling code: the calling code then needs to organize to call this init() method exactly once across all threads using the library, and before any thread uses the library.
(2) Fails since concurrent calls to the library might do a double init.
In C++ you can just initialize globals with a method call, like int data[] = loadData(). Is there any equivalent in C? Or am I stuck using threading primitives (which vary by platform, e.g., pthread_once, call_once and whatever Windows has) just to get my thread-safe init?
I don't know of any platform-independent way of initializing a library in a thread-safe manner. That's not surprising since there's no platform-independent threading model in C.
So your solution is going to be platform-specific.
@ThingyWotsit mentions in the comments using C++ to initialize your library, and that will be thread-safe. But it may very well lock you into a specific C++ run-time, so it may not be a useful solution for your C shared object/library. You may not be willing or able to add a dependency on C++ and you may especially not be willing or able to be locked into a specific C++ run-time.
For GCC, you can use __attribute__((constructor)) to have your initialization function called when the shared object is loaded:
constructor
destructor
constructor (priority)
destructor (priority)
The constructor attribute causes the function to be called automatically before execution enters main (). Similarly, the destructor attribute causes the function to be called automatically after main () has completed or exit () has been called. Functions with these attributes are useful for initializing data that will be used implicitly during the execution of the program.
You may provide an optional integer priority to control the order in which constructor and destructor functions are run. A constructor with a smaller priority number runs before a constructor with a larger priority number; the opposite relationship holds for destructors. So, if you have a constructor that allocates a resource and a destructor that deallocates the same resource, both functions typically have the same priority. The priorities for constructor and destructor functions are the same as those specified for namespace-scope C++ objects (see C++ Attributes).
For example:
static __attribute__((constructor)) void my_lib_init_func( void )
{
...
}
Your code will run before main() is called.
If your library is dynamically loaded (an explicit call to dlopen(), for example), your init function will be called when your library is loaded, and your library won't be considered loaded until it returns.
Other compilers provide the functionally-identical #pragma init():
#pragma init(my_lib_init_func)
static void my_lib_init_func( void )
{
...
}
See #pragma init and #pragma fini using gcc compiler on linux
For Windows? The Windows C++ run-time is pretty stable and ubiquitous. I'd just use a C++ solution on Windows, especially if you're compiling with MSVC. (But see the comments...)
Option 3 is always preferable when possible. Your reasoning about the cons is wrong. If you have an 8MB constant table in the executable file, it's directly mapped and shared by all instances of the program or users of the shared library on any remotely modern operating system. If you generate it at runtime, each process will have its own copy of the table.
When option 3 is not available you must use pthread_once or equivalent or implement your own version of the same (much less efficiently) using a lock. There is little reason to use weird OS-specific replacements for it; all major platforms either support POSIX threads API natively or have existing libraries which provide it on top of the platform's low-level primitives.
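A minimal sketch of the pthread_once approach for lazy initialization, assuming a POSIX threads implementation is available. The table contents and the names build_table, lib_square and lib_init_calls are made up for illustration; the call counter exists only to make the "exactly once" behavior observable.

```c
#include <pthread.h>
#include <stddef.h>

#define TABLE_SIZE 256

/* Read-only after initialization; generated at runtime in this sketch. */
static unsigned table[TABLE_SIZE];
static pthread_once_t table_once = PTHREAD_ONCE_INIT;
static int init_calls;  /* demo only: counts how often the init ran */

static void build_table(void)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        table[i] = (unsigned)(i * i);
    }
    init_calls++;
}

/* Hypothetical public entry point: every entry point does the same
 * pthread_once call before touching the table. */
unsigned lib_square(unsigned char x)
{
    pthread_once(&table_once, build_table);
    return table[x];
}

int lib_init_calls(void)
{
    return init_calls;
}
```

However many threads call lib_square() concurrently, pthread_once runs build_table() a single time and makes the other callers wait until it has returned, so no caller sees a half-built table.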

How to use 'flag' variables other than using as global?

I'm using a number of flags in a small embedded project. I would like to know the best way to handle these variables, which indicate certain states in the project. I'm using the C18 compiler and a PIC18F controller.
flag1, flag2, flag3, ... denote state1, state2, state3, ... and the corresponding actions are performed.
Making them global is an option.
But what is the preferred method for handling the 'flags' well in an embedded system?
First of all, whenever you have lots of "flags", tread carefully. In embedded systems, it is easy to get "flag spaghetti", which consists of a lot of complex dependencies. So examine what flags you have: are they related, can they co-exist etc. If so, it is usually better to merge them into an enum. If the flags specify states, then for better program stability, consider writing your whole program as a state machine. And set the flags in a consistent manner, at specific places in the program, rather than doing so all over the place.
As for how to store them: there is never a reason in a C program to use a global variable, where the definition of a global variable is a variable declared at file scope, which is visible to the whole program.
If you are using a single-threaded/single-process program, then declaring a variable at file scope is fine. But you must declare it as static, so that it is a private file scope variable rather than a global one.
volatile has nothing to do with scope or program design. To prevent incorrect compiler optimizations, you should always declare a variable volatile if it is shared between the main program and the ISR.
(Please note that volatile does not guarantee any atomic access, it does not protect against race conditions between the ISR and the main program.)
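The points above might look roughly like this in one source file. This is only a sketch with hypothetical names (app_state_t, rx_isr, app_poll); on a real PIC18 target the ISR would be declared with the compiler's own interrupt syntax, which is left out here.

```c
#include <stdbool.h>

/* Related flags merged into one enum instead of flag1, flag2, flag3. */
typedef enum { STATE_IDLE, STATE_BUSY, STATE_DONE } app_state_t;

/* File scope but static: private to this file, not a true global. */
static volatile app_state_t app_state = STATE_IDLE;

/* Shared between the ISR and the main program, hence volatile. */
static volatile bool data_ready = false;

/* Other files go through these accessors instead of touching the variable. */
void app_set_state(app_state_t s) { app_state = s; }
app_state_t app_get_state(void)   { return app_state; }

/* Hypothetical interrupt handler: it only sets the flag. */
void rx_isr(void)
{
    data_ready = true;
}

/* Called from the main loop. The test-and-clear below is not atomic;
 * a real target may need interrupts disabled around it. */
bool app_poll(void)
{
    if (data_ready) {
        data_ready = false;
        app_state = STATE_DONE;
        return true;
    }
    return false;
}
```

The state changes in one place (app_poll and app_set_state), so it is easy to see where each transition comes from.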

When do I need a function to run before or after main()?

GCC supports constructor/destructor functions, which allow running a function before or after main():
The constructor attribute causes the function to be called automatically before execution enters main(). Similarly, the destructor attribute causes the function to be called automatically after main() completes or exit() is called. Functions with these attributes are useful for initializing data that is used implicitly during the execution of the program.
Here is an example from GeeksforGeeks.
When is the proper scenario of using this feature? Especially a function to be called before main(), what is the difference if we just place it in start of main()?
Such constructor and destructor functions are mainly useful when writing libraries.
If you are writing a library which needs to be initialised, then you would have to provide an initialisation function. But how would you ensure that it is run before any other of your library's functions? The user of the library would have to remember to call it, which they could easily forget.
One way to get the initialisation done automatically is to mark the function as a constructor.
See also: How to initialize a shared library on Linux
For the majority of scenarios there will be no difference. Everything that you want to do with global variables, singletons, memory, etc, you could theoretically do in main() and with plain static initializers.
The main scenario where this is marginally applicable is cross platform projects, where you would like to keep most of your common code in main, however on some platforms, mainly embedded ones, you would like to duplicate what the other OSes are doing before main - setting up environment variables, wiring standard file descriptors (stdin/stdout/stderr) to custom descriptors on your system, allocate your own custom memory manager - e.g., allocate your own stack for running main(), and so on.
From my point of view, module constructors have their use when making shared modules.
Shared modules don't have a specific initialization routine (there is DllMain on Windows, but it has its limitations).
For example, Asterisk PBX makes heavy use of constructors because it is strongly based on modules; it injects a constructor into each module at compilation time.
This constructor gets called on dlopen() and tells the Asterisk core whether the module has been loaded properly or not, allowing it to call specific functions in the module.
Suppose you have a global structure and you want to allocate memory for it before starting your program; you can do that inside the constructor, since it is called before main().
Similarly, if you want to free any allocated memory before the end of the program, you can do so in the destructor.
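A small sketch of that pattern, assuming GCC or a compatible compiler such as Clang. The structure and the names (g_config, config_init, config_fini, config_get) are invented for the example.

```c
#include <stdlib.h>

/* Hypothetical global structure whose storage is set up before main(). */
struct config {
    int *values;
    int  count;
};

static struct config g_config;

/* Runs automatically before main() (GCC/Clang extension). */
__attribute__((constructor))
static void config_init(void)
{
    g_config.count = 4;
    g_config.values = malloc((size_t)g_config.count * sizeof *g_config.values);
    if (g_config.values == NULL) {
        g_config.count = 0;  /* leave the structure empty on failure */
        return;
    }
    for (int i = 0; i < g_config.count; i++) {
        g_config.values[i] = i * 10;
    }
}

/* Runs automatically after main() returns or exit() is called. */
__attribute__((destructor))
static void config_fini(void)
{
    free(g_config.values);
    g_config.values = NULL;
    g_config.count = 0;
}

/* Returns the stored value, or -1 for an out-of-range index. */
int config_get(int index)
{
    if (index < 0 || index >= g_config.count) {
        return -1;
    }
    return g_config.values[index];
}
```

Code in main() can call config_get() straight away without any explicit init call, which is the convenience the answers describe for library users.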

Is it possible to LD_PRELOAD a function with different parameters?

Say I replace a function by creating a shared object and using LD_PRELOAD to load it first. Is it possible to have parameters to that function different from the one in original library?
For example, if I replace pthread_mutex_lock, such that instead of parameter pthread_mutex_t it takes pthread_my_mutex_t. Is it possible?
Secondly, besides function, is it possible to change structure declarations using LD_PRELOAD? For example, one may add one more field to a structure.
Although you can arrange to provide your modified pthread_mutex_lock() function, the code will have been compiled to call the standard function. This will lead to problems when the replacement is called with the parameters passed to the standard function. This is a polite way of saying:
Expect it to crash and burn
Any pre-loaded function must implement the same interface — same name, same arguments in, same values out — as the function it replaces. The internals can be implemented as differently as you need, but the interface must be the same.
Similarly with structures. The existing code was compiled to expect one size for the structure, with one specific layout. You might get away with adding an extra field at the end, but the non-substituted code will probably not work correctly. It will allocate space for the original size of structure, not the enhanced structure, etc. It will never access the extra element itself. It probably isn't quite impossible, but you must have designed the program to handle dynamically changing structure sizes, which places severe enough constraints on when you can do it that the answer "you can't" is probably apposite (and is certainly much simpler).
IMNSHO, the LD_PRELOAD mechanism is for dire emergencies (and is a temporary band-aid for a given problem). It is not a mechanism you should plan to use on anything remotely resembling a regular basis.
LD_PRELOAD does one thing, and one thing only. It arranges for a particular DSO file to be at the front of the list that ld.so uses to look up symbols. It has nothing to do with how the code uses a function or data item once found.
Anything you can do with LD_PRELOAD, you can simulate by just linking the replacement library with -l at the front of the list. If, on the other hand, you can't accomplish a task with that -l, you can't do it with LD_PRELOAD.
The effects of what you're describing are conceptually the same as the effects of providing a mismatching external function at normal link time: undefined behavior.
If you want to do this, rather than playing with fire, why don't you make your replacement function also take pthread_mutex_t * as its argument type, and then just convert the pointer to pthread_my_mutex_t * in the function body? Normally this conversion will take place only at the source level anyway; no code should be generated for it.
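The pointer-conversion idea can be sketched as follows. To keep the example compilable on its own it uses stand-in types rather than real pthread ones: base_mutex_t plays the role of pthread_mutex_t, my_mutex_t the role of pthread_my_mutex_t, and base_mutex_lock the role of the replaced function. All of these names are hypothetical.

```c
#include <stddef.h>

/* Stand-in for the library's own type (the role of pthread_mutex_t). */
typedef struct {
    int locked;
} base_mutex_t;

/* Hypothetical extended type. The original type must be the FIRST member
 * so that a pointer to it can be converted to a pointer to the whole. */
typedef struct {
    base_mutex_t  base;
    unsigned long lock_count;  /* the extra field */
} my_mutex_t;

/* The replacement keeps exactly the signature callers were compiled for. */
int base_mutex_lock(base_mutex_t *m)
{
    /* Valid only if m really points at the base member of a my_mutex_t.
     * Code compiled against the original type allocates just a
     * base_mutex_t, in which case touching lock_count is out of bounds. */
    my_mutex_t *mine = (my_mutex_t *)m;

    mine->base.locked = 1;
    mine->lock_count++;
    return 0;
}
```

The conversion costs nothing at run time, but it is only safe when every object passed in was created as the extended type, which is the same constraint on structure layout discussed in the earlier answer.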

Sending arguments to ftw()

Is there a way to send arguments to ftw() to be used when processing each file/directory on the path? It's a bit difficult to keep the argument concerned in a global variable due to multithreading issues, i.e., a global value would be visible to all threads, and that would be wrong.
A properly designed C callback interface has a void* argument that you can use to pass arbitrary data from the surrounding code into the callback. [n]ftw does not have such an argument, so you're kinda up a creek.
If your compiler supports thread-local variables (the __thread storage specifier) you can use them instead of globals; this will work but is not really that much tidier than globals.
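A minimal sketch of the thread-local workaround, assuming a compiler that accepts __thread (GCC and Clang do). The context structure and the names walk_ctx, count_entry and count_entries are hypothetical.

```c
#include <ftw.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical per-call context that ftw() itself has no way to carry. */
struct walk_ctx {
    long entries;
};

/* One pointer per thread, so concurrent walks do not interfere. */
static __thread struct walk_ctx *current_ctx;

/* ftw() callback: it reaches its "argument" through the thread-local. */
static int count_entry(const char *path, const struct stat *sb, int typeflag)
{
    (void)path;
    (void)sb;
    (void)typeflag;
    current_ctx->entries++;
    return 0;  /* 0 tells ftw() to keep walking */
}

/* Counts every entry under root (including root); returns -1 on error. */
long count_entries(const char *root)
{
    struct walk_ctx ctx = { 0 };
    int rc;

    current_ctx = &ctx;
    rc = ftw(root, count_entry, 16);  /* use at most 16 file descriptors */
    current_ctx = NULL;

    return rc == 0 ? ctx.entries : -1;
}
```

The wrapper sets the pointer just before the walk and clears it afterwards, so the thread-local is only meaningful for the duration of one ftw() call on the calling thread.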
If your C library has the fts family of functions, use those instead. They are available on most modern Unixes (including Linux, OSX, and recent *BSD) and gnulib has a fallback implementation.

Resources