One can design plugins in C using 2 techniques (AFAIK):
Use dlopen() all the way: The core code expects all functions in library to have a known name and prototype. It dlopen()s the library and gets all the function pointers via dlsym()
Keep one exposed known function which takes in a structure that is filled with implemented functions by plugin. This one function is got via dlsym() and called once in beginning.
Which technique do you think is better and why? Please mention any other ways of doing this if any.
I prefer the second way, since it will be a lot easier:
to load your plugin: it needs just a single call to dlsym, instead of tens
to handle your plugin: you can pass around the structure with the function pointers. Instead that passing tens of functions or building such a structure in the framework in order to pass it around.
Remember that easier means less error-prone.
Related
If a library function like e.g. malloc is reimplemented, then there are two symbols with that name, one in a local object file and one in the system library. Which of the two is used when a function from e.g. stdio is used, which calls malloc (and why)?
The link behaviour is, in a general way:
Include all symbols defined in the object files.
Then resolve the undefined ones using the objects in the libs.
So, if malloc is reimplemented and linked as object file, the version in the object file will override the version in the standard libraries. However, if the new malloc is linked as a library, it depends on the libraries linking order.
Another way, considering the gnu binutils as scope, to override library functions is to wrap the function using the --wrap parameter: https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_3.html
By using the --wrap ld option we can get both functions linked and the wrapper function would be able to call the wrapped one.
The linking order also depends on the command line parameters order. So I'm considering here that libs are listed after objects because, in general, doesn't make sense to put libraries before objects as their objective is to supply the missing symbols demanded by those.
One answer is that you are getting yourself in an awful lot of trouble by trying to replace malloc. Don't go there. Don't even think about it. Especially if you need to ask questions on stackoverflow, don't even think about it.
Another answer is that you are invoking undefined behaviour, and the likely result is that the malloc function will be called that will hurt you most. If you are lucky, while developing. If you are less lucky, once your code is in the hand of customers. Don't do it.
Writing wrappers around malloc with new names is bad enough. Trying to replace malloc is madness.
Say I replace a function by creating a shared object and using LD_PRELOAD to load it first. Is it possible to have parameters to that function different from the one in original library?
For example, if I replace pthread_mutex_lock, such that instead of parameter pthread_mutex_t it takes pthread_my_mutex_t. Is it possible?
Secondly, besides function, is it possible to change structure declarations using LD_PRELOAD? For example, one may add one more field to a structure.
Although you can arrange to provide your modified pthread_mutex_lock() function, the code will have been compiled to call the standard function. This will lead to problems when the replacement is called with the parameters passed to the standard function. This is a polite way of saying:
Expect it to crash and burn
Any pre-loaded function must implement the same interface — same name, same arguments in, same values out — as the function it replaces. The internals can be implemented as differently as you need, but the interface must be the same.
Similarly with structures. The existing code was compiled to expect one size for the structure, with one specific layout. You might get away with adding an extra field at the end, but the non-substituted code will probably not work correctly. It will allocate space for the original size of structure, not the enhanced structure, etc. It will never access the extra element itself. It probably isn't quite impossible, but you must have designed the program to handle dynamically changing structure sizes, which places severe enough constraints on when you can do it that the answer "you can't" is probably apposite (and is certainly much simpler).
IMNSHO, the LD_PRELOAD mechanism is for dire emergencies (and is a temporary band-aid for a given problem). It is not a mechanism you should plan to use on anything remotely resembling a regular basis.
LD_PRELOAD does one thing, and one thing only. It arranges for a particular DSO file to be at the front of the list that ld.so uses to look up symbols. It has nothing to do with how the code uses a function or data item once found.
Anything you can do with LD_PRELOAD, you can simulate by just linking the replacement library with -l at the front of the list. If, on the other hand, you can't accomplish a task with that -l, you can't do it with LD_PRELOAD.
The effects of what you're describing are conceptually the same as the effects of providing a mismatching external function at normal link time: undefined behavior.
If you want to do this, rather than playing with fire, why don't you make your replacement function also take pthread_mutex_t * as its argument type, and then just convert the pointer to pthread_my_mutex_t * in the function body? Normally this conversion will take place only at the source level anyway; no code should be generated for it.
I have written a custom library which implements malloc/calloc/realloc/free using the standard C prototypes, and I figured out how to compile it to an so. I want to test the library by linking a standard application against it? What would be a good way to do this? Once I have a working library I assume I can just load it with LD_PRELOAD, but how do I get my functions to co-exist with but take precedence over the system library ones? My functions need to make a call to malloc in order to get memory to run, so I can't just completely ditch stdlib... Help?
Functions that you are trying to replace are standard C functions, not macros, not system calls. So you have to simply give your functions the same names and compile them into a shared library.
Then, use LD_PRELOAD to pre-load your library before binary starts. Since all addresses are resolved once, linker will figure out addresses of your functions and remember their names and will not look for them in standard library later.
This approach might not work if your program is linked with the standard runtime statically. Also, it will not work on Mac OS X as there is another API for interpolation.
In Linux, for example, in order for your functions to co-exist (i.e. if you want to use system malloc in your own implementation of malloc), you have to open the standard library manually using dlopen, look up functions you need there using dlsym and call them later by address.
Don't write your malloc() in terms of malloc() -- write it using sbrk, which gets memory directly from the OS.
If you have control of the source code that is to use this library, here is one possibility. Use different function names: Rather than malloc, for example, call it newCoolMalloc. This method is sometimes simpler and doesn't depend on special linker options.
Then in your code, use #define to cause the code to call the desired set of functions. You can #define malloc to be something different. For example:
#define malloc newCoolMalloc
#define free newCoolFree
If you do that, though, you have to be very very careful to include that consistently. Otherwise you run the risk of using stdlib malloc in one place and then your own free in another leading to messy bugs. One way to help mitigate that situation is to (if possible) in your own code use custom names for the allocation and free functions. Then it is easier to ensure that the correct one is being called. You can define the various custom names to your own malloc functions or even the original stdlib malloc functions.
For example, you might use mallocPlaceHolder as the actual name in the code:
someThing = mallocPlaceHolder( nbytes );
Then your defines would look more like this:
#define mallocPlaceHolder myCoolMalloc
If no function of the form mallocPlaceHolder (and associated free) actually exist, it avoids mixing different libraries.
I have in some documentation for a plugin for Dreamweaver I am making that says the following:
void **connectionData
• The connectionData argument is a
handle to the data that the agent
wants Dreamweaver to pass to it when
calling other API functions.
I have no other information than this from the manual in regard to connectionData. Thinking literally, I figured handle refered to a generic handle,however I am not able to find documentation on working with generic handles in regard to C.
HANDLE h = connectionData;
Does compile in my code. How exactly do I get the "secrets" inside this data structure/can someone explain how generic handles for C work?
Well, usually you are not supposed to get the secrets of handles; they are usually just a pointer to some internal structure inside the lib/API you are using and only the lib will know how to use it.
There is no generic rules or anything about handles, you'll have to use them as indicated by your lib's docs.
The way that this is defined, connectionData is a pointer to a pointer to something. Without knowing what is assigned to connectionData, you can't know anything else. The reason why your other statement worked is that HANDLE is probably a macro that expands to void*
To know the "Secrets," you would need to find out what struct (this is a guess - it could actually be any data type) connectionData points to, then look at the definition of that struct. I don't know how familiar you are with programming in general but a debugger allows you to easily look at the struct's fields while paused at a breakpoint.
However, as other people have said, you probably don't want to muck with the internals of whatever this points to, and only use API calls.
C developers use a "handle" data type when they specifically want to hide the internal data and keep API users from monkeying around with the implementation. A handle is sometimes just a pointer, but can also be an index into an internal lookup table.
In general, you should only use provided API functions with a handle, and learn the proper way to get the handle, understand its life cycle and how to properly dispose of it when you're done.
What methods, practices and conventions do you know of to modularize C code as a project grows in size?
Create header files which contain ONLY what is necessary to use a module. In the corresponding .c file(s), make anything not meant to be visible outside (e.g. helper functions) static. Use prefixes on the names of everything externally visible to help avoid namespace collisions. (If a module spans multiple files, things become harder., as you may need to expose internal things and not be able hide them with "static")
(If I were to try to improve C, one thing I would do is make "static" the default scoping of functions. If you wanted something visible outside, you'd have to mark it with "export" or "global" or something similar.)
OO techniques can be applied to C code, they just require more discipline.
Use opaque handles to operate on objects. One good example of how this is done is the stdio library -- everything is organised around the opaque FILE* handle. Many successful libraries are organised around this principle (e.g. zlib, apr)
Because all members of structs are implicitly public in C, you need a convention + programmer discipline to enforce the useful technique of information hiding. Pick a simple, automatically checkable convention such as "private members end with '_'".
Interfaces can be implemented using arrays of pointers to functions. Certainly this requires more work than in languages like C++ that provide in-language support, but it can nevertheless be done in C.
The High and Low-Level C article contains a lot of good tips. Especially, take a look at the "Classes and objects" section.
Standards and Style for Coding in ANSI C also contains good advice of which you can pick and choose.
Don't define variables in header files; instead, define the variable in the source file and add an extern statement (declaration) in the header. This will tie into #2 and #3.
Use an include guard on every header. This will save so many headaches.
Assuming you've done #1 and #2, include everything you need (but only what you need) for a certain file in that file. Don't depend on the order of how the compiler expands your include directives.
The approach that Pidgin (formerly Gaim) uses is they created a Plugin struct. Each plugin populates a struct with callbacks for initialization and teardown, along with a bunch of other descriptive information. Pretty much everything except the struct is declared as static, so only the Plugin struct is exposed for linking.
Then, to handle loose coupling of the plugin communicating with the rest of the app (since it'd be nice if it did something between setup and teardown), they have a signaling system. Plugins can register callbacks to be called when specific signals (not standard C signals, but a custom extensible kind [identified by string, rather than set codes]) are issued by any part of the app (including another plugin). They can also issue signals themselves.
This seems to work well in practice - different plugins can build upon each other, but the coupling is fairly loose - no direct invocation of functions, everything's through the signaling stystem.
A function should do one thing and do this one thing well.
Lots of little function used by bigger wrapper functions help to structure code from small, easy to understand (and test!) building blocks.
Create small modules with a couple of functions each. Only expose what you must, keep anything else static inside of the module. Link small modules together with their .h interface files.
Provide Getter and Setter functions for access to static file scope variables in your module. That way, the variables are only actually written to in one place. This helps also tracing access to these static variables using a breakpoint in the function and the call stack.
One important rule when designing modular code is: Don't try to optimize unless you have to. Lots of small functions usually yield cleaner, well structured code and the additional function call overhead might be worth it.
I always try to keep variables at their narrowest scope, also within functions. For example, indices of for loops usually can be kept at block scope and don't need to be exposed at the entire function level. C is not as flexible as C++ with the "define it where you use it" but it's workable.
Breaking the code up into libraries of related functions is one way of keeping things organized. To avoid name conflicts you can also use prefixes to allow you to reuse function names, though with good names I've never really found this to be much of a problem. For example, if you wanted to develop your own math routines but still use some from the standard math library, you could prefix yours with some string: xyz_sin(), xyz_cos().
Generally I prefer the one function (or set of closely related functions) per file and one header file per source file convention. Breaking files into directories, where each directory represents a separate library is also a good idea. You'd generally have a system of makefiles or build files that would allow you to build all or part of the entire system following the hierarchy representing the various libraries/programs.
There are directories and files, but no namespaces or encapsulation. You can compile each module to a separate obj file, and link them together (as libraries).