Best way to implement plugin framework - are DLLs the only way (C/C++ project)? - c

Introduction:
I am currently developing document classifier software in C/C++, and I will be using a naive Bayes model for classification. However, I want users to be able to use any algorithm they want (or that I may want in the future), so I separated the algorithm part of the architecture into a plugin that is attached to the main app at start-up. That way, any user can write their own algorithm as a plugin and use it with my app.
Problem Statement:
The way I intend to develop this is to have each algorithm the user wants to use built as a DLL and placed in a specific directory. At start-up, my app will search for all the DLLs in that directory and load them.
My Questions:
(1) What if malicious code is built as a DLL (exporting the same functions mandated by the plugin framework) and put into my plugins directory? In that case, my app will think that it is a plugin, pick it up, and call its functions, so the malicious code can easily bring down my entire app (in the worst case, it could turn my app into a malicious code launcher!).
(2) Is using DLLs the only way to implement the plugin design pattern? (I ask not only out of fear of malicious plugins; it's also a generic question out of curiosity :) )
(3) I think a lot of software is written with a plugin model for extensibility. If so, how does it defend against such attacks?
(4) In general, what do you think about my decision to use a plugin model for extensibility? (Do you think I should look at any other alternatives?)
Thank you
-MicroKernel :)

Do not worry about malicious plugins. If somebody managed to sneak a malicious DLL into that folder, they probably also have the power to execute stuff directly.
As an alternative to DLLs, you could hook up a scripting language like Python or Lua, and allow scripted plugins. But maybe in this case you need the speed of compiled code?
For embedding Python, see here. The process is not very difficult. You can link statically to the interpreter, so users won't need to install Python on their system. However, any non-builtin modules will need to be shipped with your application.
However, if the language does not matter much to you, embedding Lua is probably easier because it was specifically designed for that task. See this section of its manual.
Regarding (3): see the first point above. They don't.
Using a plugin model sounds like a fine solution, provided that a lack of extensibility really is a problem at this point. It might be easier to hard-code your current model and add the plugin interface later, if it turns out that there is actually a demand for it. It is easy to add, but hard to remove once people have started using it.

Malicious code is not the only problem with DLLs. Even a well-meaning DLL might contain a bug that could crash your whole application or gradually leak memory.
Loading a module in a high-level language somewhat reduces the risk. If you want to learn about embedding Python for example, the documentation is here.
Another approach would be to launch the plugin in a separate process. It does require a bit more effort on your part to implement, but it's much safer. The separate-process approach is used by Google's Chrome web browser, and they have a document describing the architecture.
The basic idea is to provide a library for plugin writers that includes all the logic for communicating with the main app. That way, the plugin author has an API that they use, just as if they were writing a DLL. Wikipedia has a good list of ways for inter-process communication (IPC).
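As a rough illustration of what such a communication library might contain, here is a minimal sketch of length-prefixed messages over a pipe, assuming a POSIX system and that both processes run on the same machine (so the header is sent in host byte order). The names msg_send and msg_recv are hypothetical, not taken from Chrome or any real library, and interrupted system calls are simply treated as errors.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes. Returns 0 on success, -1 on error or early EOF. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one message: a 4-byte length header followed by the payload. */
int msg_send(int fd, const void *payload, uint32_t len)
{
    assert(payload != NULL || len == 0);
    if (write_all(fd, &len, sizeof len) != 0)
        return -1;
    return write_all(fd, payload, len);
}

/* Receive one message into buf (capacity cap).
 * Returns the payload length, or -1 on error. */
long msg_recv(int fd, void *buf, uint32_t cap)
{
    uint32_t len;
    if (read_all(fd, &len, sizeof len) != 0)
        return -1;
    if (len > cap)
        return -1;   /* refuse oversized messages from an untrusted plugin */
    if (read_all(fd, buf, len) != 0)
        return -1;
    return (long)len;
}
```

The plugin author would only ever call functions like these (or higher-level wrappers around them), while the main app holds the other end of the pipe. The explicit capacity check matters because the peer is, by assumption, not fully trusted.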

1) If there is a malicious dll in your plugin folder, you are probably already compromised.
2) No, you can load assembly code dynamically from a file, but this would just be reinventing the wheel; just use a DLL.
3) Firefox extensions don't, not even with its JavaScript plugins. Everything else I know of uses native code from dynamic libraries, so it is impossible to guarantee safety. Then again, Chrome has NaCl, which does extensive analysis on the binary code and rejects it if it can't be 100% sure that it doesn't violate bounds and whatnot, although I'm sure more and more vulnerabilities will be found in it as time passes.
4) Plugins are fine; just restrict them to trusted people. Alternatively, you could use a safe language like Lua, Python, Java, etc., and load a file into that language, but restrict it to a subset of the API that won't harm your program or environment.

(1) Can you use OS security facilities to prevent unauthorized access to the folder where the DLLs are searched for or loaded from? That should be your first approach.
Otherwise: run a threat analysis - what's the risk, what are known attack vectors, etc.
(2) Not necessarily. It is the most straightforward if you want compiled plugins - which is mostly a question of performance, access to OS functions, etc. As mentioned already, consider scripting languages.
(3) Usually by writing "to prevent malicious code execution, restrict access to the plugin folder".
(4) There is quite some additional cost, even when using a plugin framework you are not yet familiar with. It increases the cost of:
the core application (plugin functionality)
the plugins (much higher isolation)
installation
debugging + diagnostics (bugs that occur only with a certain combination of plugins)
administration (users must know of, and manage plugins)
That pays only if
installing/updating the main software is much more complex than updating the plugins
individual components need to be updated individually (e.g. a user may combine different versions of plugins)
other people develop plugins for your main application
(There are other benefits of moving code into DLLs, but they don't pertain to plugins as such)
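For point (1) above, the application itself can also refuse to load plugins from a folder that is obviously unprotected. The following is a minimal sketch, assuming a POSIX system (on Windows the equivalent check would inspect the folder's ACLs instead); plugin_dir_is_safe is a hypothetical helper name. It only checks the permission bits and does not verify who owns the directory, which a real check would probably also want to do.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if path is a directory that neither group nor others can
 * write to, 0 otherwise (including when the path cannot be inspected). */
int plugin_dir_is_safe(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0) {
        perror(path);
        return 0;
    }
    if (!S_ISDIR(st.st_mode))
        return 0;
    if (st.st_mode & (S_IWGRP | S_IWOTH))
        return 0;   /* someone other than the owner could drop a file here */
    return 1;
}
```

The app would call this once at start-up, before scanning the folder, and skip plugin loading entirely when it returns 0.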

What if a malicious code is made as a DLL
Generally, if you do not trust a DLL, you cannot load it safely, one way or another.
This holds for almost any other language, even an interpreted one.
Java and some other languages work very hard to limit what the user's code can do, and this works only because they run in a virtual machine.
So no: DLL-based plug-ins can come from trusted sources only.
Is using DLLs the only way available to implement plugin design pattern?
You may also embed an interpreter in your code; for example, GIMP allows writing plugins in Python.
But be aware that this will be much slower because of the nature of any interpreted language.

We have a product very similar in that it uses modules to extend functionality.
We do two things:
We use BPL files, which are DLLs under the covers. This is a specific technology from Borland/CodeGear/Embarcadero within C++ Builder. We take advantage of some RTTI-type features to publish a simple API similar to main(argv[]), so any number of parameters can be pushed onto the stack and popped off by the DLL.
We also embed Perl into our application for things that are more business logic in nature.
Our software is an accounting/ERP suite.

Have a look at existing plugin architectures and see if there is anything that you can reuse. http://git.dronelabs.com/ethos/about/ is one link I came across while googling glib + plugin. glib itself might make it easier to develop a plugin architecture. GStreamer uses glib and has a very nice plugin architecture that may give you some ideas.

Related

Failsafe way to load Shared Object

I am currently using the GLib g_module functions to load some shared objects at runtime.
The basic way I use is the following:
Call g_module_open to get the module
After that, call g_module_make_resident
Load Symbols by using g_module_symbol
As I am using this as a basic way to add plugin compatibility, I would like to know whether there is a good way to make sure that, even if the loaded module has a bug (such as memory corruption via malloc/free), the main application can just 'catch' the error without everything crashing.
I really do not want you to write any code; I am just interested in whether there is a good way to achieve this...
As Severin mentioned, there isn't really anything you can do easily. That said, you do have a few options:
The first thing you might want to consider is using something like libpeas, which allows you to load plugins in languages with non-C linkage (JavaScript, Python, etc.). Many of these languages provide much more safety than C, so if you're trying to protect against programmer error (as opposed to malicious modules) this could be a good way to go.
The other relatively straightforward way to accomplish this would be to run each plugin in a separate process. You can communicate over D-Bus, pipes, etc. One advantage of this approach is that modules can run with different permissions: a module that interacts with hardware may need root permissions, while your UI can still run as an unprivileged user. Telepathy is an example of this sort of architecture.
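To make the separate-process idea concrete, here is a minimal sketch, assuming a POSIX system. It runs a plugin entry point in a forked child so that a crash (for example from heap corruption) kills only the child, and the parent sees it as a failed call. The names run_isolated, plugin_fn, good_plugin and buggy_plugin are hypothetical; a real design would exchange data over a pipe or D-Bus rather than squeezing the result into an exit code.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*plugin_fn)(void);   /* hypothetical plugin entry point */

/* Run fn in a child process. Returns its exit status (0..255) on a normal
 * exit, or -1 if the child crashed (was killed by a signal) or could not
 * be started. */
int run_isolated(plugin_fn fn)
{
    int status;
    pid_t pid;

    assert(fn != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fn() & 0xff);   /* child: run the plugin, report via exit code */

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                /* killed by a signal, e.g. SIGSEGV or SIGABRT */
}

/* Two stand-in plugins for demonstration only. */
int good_plugin(void)  { return 7; }
int buggy_plugin(void) { abort(); }
```

Calling run_isolated(buggy_plugin) returns -1 while the main application keeps running, which is the 'catch' behaviour asked for in the question, at the price of a process per call.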

What is the common code reuse strategy in C

Context: C language, 8 bit microprocessor
We have identified components which can be reused between projects (products). But I can not find which is the best infrastructure to handle the reusable components.
Two possibilities I found up to now:
Static libraries
Shared files in subversion
Both shared libraries and shared source let you share the common code among projects. Libraries are the better of the two alternatives, so you should use them if they are available on your platform. This lets you guard the source of the library from inadvertent modifications, which could happen if the code from source control is changed locally.
The only problem with sharing code through libraries may be lack of support for source-level debugging of library code by some of the tools in your embedded tool chain (e.g. debuggers attached to in-circuit emulators). In this case reusing code through the source may be acceptable. If possible, you should guard the source from modification through the file system access controls.
If you have reusable components, libraries are the way to go.
It's easier to maintain and you have a clear interface. It's also easier to incorporate into new projects.
You can easily do individual unit tests on library code
Less risk of copy-and-paste code.
Programmers are more aware that this code is shared when they have to use it from a library.
Several good arguments have been made for the library approach.
However, there's at least one good argument for re-building (perhaps from the same source repository) each time you build a dependent project, and that would be the ability to apply target-, project-, or development-stage-specific compile settings to all of the code, including the shared portion.
At my company, we used both approaches at the same time:
We do two checkouts: one for the project, the other for the library.
When the project needs to be compiled (via Makefile), we compile the library first.
The library is then linked as if it was a binary-only library.
When we release a project, we check whether the other projects still compile against the new library.
When we release a project, we tag the library along with the project.
This way you get the best of both worlds:
common code is shared: all projects benefit from bug fixes and improvements
source code is always fully available for understanding and debugging
source code availability encourages library maintenance (fixes, improvements, and experiments)
the library boundaries impose a more API-like approach: clearer interface and project embedding
you can pass compile-time flags to the library to build different flavors
you can always go back in time if needed without library-vs-project mismatching hassles
if you are in a hurry, you can put off the library check.
The only drawback to this approach is that developers have to know what they are doing. If they modify the library, they should know that the change will affect all projects. But you are already using a version control system and, if you use branches and the communication within your team is good, there should be no problem at all.
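The point about passing compile-time flags to build different flavors of the library can be sketched as follows. This is only an illustration under assumed names: lib_crc8, LIB_CRC8_POLY and LIB_DEBUG_CHECKS are hypothetical and not from any answer above. Each project builds the same shared source with its own -D settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default polynomial; a project can override it, e.g. -DLIB_CRC8_POLY=0x31u. */
#ifndef LIB_CRC8_POLY
#define LIB_CRC8_POLY 0x07u
#endif

/* Bitwise CRC-8, MSB first, initial value 0. Small code size, no table. */
uint8_t lib_crc8(const uint8_t *data, size_t len)
{
#ifdef LIB_DEBUG_CHECKS
    assert(data != NULL || len == 0);   /* only in debug-flavored builds */
#endif
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80u)
                crc = (uint8_t)((crc << 1) ^ LIB_CRC8_POLY);
            else
                crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}
```

A release build for a small target would leave LIB_DEBUG_CHECKS undefined, while a development build of the same checkout adds -DLIB_DEBUG_CHECKS. This flexibility is lost once the library is distributed only as a prebuilt binary.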

Hooking in C and windows

I'm looking for a quick guide to basic DLL hooking on Windows with C, but all the guides I can find are either not C, or not Windows.
(The DLL is not part of Windows, but of a third-party program)
I understand the principle, but I don't know how to go about it.
I have pre-existing source code in C++ that shows what I need to hook into, but I don't have any libraries for C, or know how to hook from scratch.
The Detours license terms are quite restrictive.
If you merely want to hook certain functions of a DLL it is often cheaper to use a DLL-placement attack on the application whose DLL you want to hook. In order to do this, provide a DLL with the same set of exports and forward those that you don't care about and intercept the rest. Whether that's C or C++ doesn't really matter. This is often technically feasible even with a large number of exports but has its limitations with exported data and if you don't know or can't discern the calling convention used.
If you must use hooking there are numerous ways including to write a launcher and rewrite the prepopulated (by the loader) IAT to point to your code while the main thread of the launched application is still suspended (see the respective CreateProcess flag). Otherwise you are likely going to need at least a little assembly knowledge to get the jumps correct. There are plenty of liberally licensed disassembler engines out there that will allow you to calculate the proper offsets for patching (because you don't want to patch the middle of a multi-byte opcode, for example).
You may want to edit your question again to include what you wrote in the comments (keyword: "DLL hooking").
loading DLLs by LoadLibrary()
This is well known bad practice.
You might want to look up "witch" or "hctiw", the infamous malware dev. There's a reason he's so infamous - he loaded DLLs with LoadLibrary(). Try to refrain from bad practice like that.

Is Extendible program in C possible?

I am looking into making a C program which is divided into a Core and Extensions. These extensions should allow the program to be extended by adding new functions. So far I have found C-Pluff, a plugin framework which claims to do the same. If anybody has any other ideas or references I can check out, please let me know.
You're not mentioning a platform, and this is outside the support of the language itself.
For POSIX/Unix/Linux, look into dlopen() and friends.
In Windows, use LoadLibrary().
Basically, these will allow you to load code from a platform-specific file (.so and .dll, respectively), look up addresses to named symbols/functions in the loaded file, and access/run them.
I tried to limit myself to the low-level stuff, but if you want to have a wrapper for both of the above, look at glib's module API.
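For the POSIX side, the load-and-look-up sequence described above might look like the following minimal sketch. The helper plugin_load, the plugin_init_fn signature and the symbol name passed in are assumptions made for illustration; only dlopen, dlsym, dlerror and dlclose are the real API. On older systems you may need to link with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

typedef int (*plugin_init_fn)(void);   /* hypothetical entry-point signature */

/* Load the shared object at path and look up the function named symbol.
 * On success stores the library handle in *handle_out and returns the
 * function pointer; on failure prints the reason and returns NULL. */
plugin_init_fn plugin_load(const char *path, const char *symbol,
                           void **handle_out)
{
    void *handle, *sym;
    const char *err;

    assert(path != NULL && symbol != NULL && handle_out != NULL);
    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        err = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return NULL;
    }
    dlerror();                          /* clear any stale error state */
    sym = dlsym(handle, symbol);
    if (sym == NULL) {
        err = dlerror();
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return NULL;
    }
    *handle_out = handle;
    /* POSIX requires this object-to-function pointer conversion to work. */
    return (plugin_init_fn)sym;
}
```

A caller would invoke the returned function and later pass the stored handle to dlclose(). The Windows equivalents of the three steps are LoadLibrary(), GetProcAddress() and FreeLibrary().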
The traditional way on Windows is with DLLs, but this is kind of obsolete. If you want users to actually extend your program (as opposed to your developer team releasing official plugins) you will want to embed a scripting language like Python or Lua, because they are easier to code in.
You can extend your core C/C++ program using some script language, for example - Lua
There are several C/C++ - Lua integration tools (toLua, toLua++, etc.)
Do you need to be able to add these extensions to the running program, or at least after the executable file is created? If you can re-link (or even re-compile) the program after having added an extension, perhaps simple callbacks would be enough?
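If re-linking is acceptable, the callback idea can be as simple as a table of named function pointers that each extension adds itself to. This is a minimal sketch; every name in it (extension_register, extension_find, the extension_fn signature, the two example extensions) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*extension_fn)(int input);   /* hypothetical extension signature */

struct extension {
    const char  *name;
    extension_fn run;
};

#define MAX_EXTENSIONS 16
static struct extension registry[MAX_EXTENSIONS];
static size_t registry_count;

/* Register a callback under a name. Returns 0 on success, -1 if full. */
int extension_register(const char *name, extension_fn run)
{
    assert(name != NULL && run != NULL);
    if (registry_count == MAX_EXTENSIONS)
        return -1;
    registry[registry_count].name = name;
    registry[registry_count].run = run;
    registry_count++;
    return 0;
}

/* Look up a registered callback by name; NULL if not found. */
extension_fn extension_find(const char *name)
{
    for (size_t i = 0; i < registry_count; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].run;
    return NULL;
}

/* Two example extensions, linked into the program at build time. */
int ext_double(int x) { return 2 * x; }
int ext_negate(int x) { return -x; }
```

The core calls extension_register() for each compiled-in extension during start-up and later dispatches through extension_find(). The same table can be filled from dynamically loaded modules if that need arises later.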
If you're using Windows you could try using COM. It requires a lot of attention to detail, and is kind of painful to use from C, but it would allow you to build extension points with well-defined interfaces and an object-oriented structure.
In this use case, extensions label themselves with a 'Component Category' defined by your app, which allows the Core to find and load them without having to know where their DLLs are. The extensions also implement interfaces that are specified using IDL and are consumed by the core.
This is old tech now, but it does work.

Why do you obfuscate your code?

Have you ever obfuscated your code before? Are there ever legitimate reasons to do so?
I have obfuscated my JavaScript. It made it smaller, thus reducing download times. In addition, since the code is handed to the client, my company didn't want them to be able to read it.
Yes, to make it harder to reverse engineer.
To ensure a job for life, of course (kidding).
This is pretty hilarious and educational: How to Write Unmaintainable Code.
It's called "Job Security". This is also the reason to use Perl -- no need to do obfuscation as separate task, hence higher productivity, without loss of job security.
Call it "security through obsfuscability" if you will.
I don't believe making reverse engineering harder is a valid reason.
A good reason to obfuscate your code is to reduce the compiled footprint. For instance, J2ME applications need to be as small as possible. If you run your app through an obfuscator (and optimiser), you can reduce the JAR from a couple of MB to a few hundred KB.
The other point, nestled above, is that most obfuscators are also optimisers which can improve your application's performance.
Isn't this also used as security through obscurity? When your source code is publicly available (JavaScript etc.) you might want to at least make it somewhat harder to understand what is actually occurring on the client side.
Security is always full of compromises, but I think that security by obscurity is one of the least effective methods.
I believe all TV cable boxes will have the java code obfuscated. This does make things harder to hack, and since the cable boxes will be in your home, they are theoretically hackable.
I'm not sure how much it will matter since the cable card will still control signal encryption and gets its authorization straight from the video source rather than the java code guide or java apps, but they are pretty dedicated to the concept.
By the way, it is not easy to trace exceptions thrown from an obfuscated stack! I actually memorized at one point that aH meant "Null Pointer Exception" for a particular build.
I remember creating a Windows Service for Online Backup application that was built in .NET. I could easily use either Visual Studio or tools like .NET Reflector to see the classes and the source code inside it.
I created a new Visual Studio Test application and added the Windows Service reference to it. Double clicked on the reference and I can see all the classes, namespaces everything (not the source code though). Anybody can figure out the internal working of your modules by looking at the class names. In my case, one such class was FTPHandler that clearly tells where the backups are going.
.NET Reflector goes beyond that by showing the actual code. It even has an option to Export the whole project so you get a VS project with all the classes and source code similar to what the developer had.
I think it makes sense to obfuscate, to make it at least harder, if not impossible, for someone to disassemble. Also, I think it makes sense for products with a large customer base, where you do not want your competitors to know much about your products.
Looking at some of the code I wrote for my disk driver project makes me question what it means to be obfuscated.
((int8_t (*)( int32_t, void * )) hdd->_ctrl)( DISK_CMD_REQUEST, (void *) dr );
Or is that just system programming in C? Or should that line be written differently? Questions...
Yes and no. I haven't delivered apps with a tool that was easily decompilable.
I did run something like obfuscators for old Basic and UCSD Pascal interpreters, but that was for a different reason, optimizing run time.
If I am delivering Java Swing apps to clients, I always obfuscate the class files before distribution.
You can never be too careful - I once pointed a decent Java decompiler (I used the JD Java Decompiler - http://www.djjavadecompiler.com/ ) at my class files and was rewarded with an almost perfect reproduction of the original code. That was rather unnerving, so I have been obfuscating my production code ever since. I use Klassmaster myself (http://www.zelix.com/klassmaster/).
I have mostly obfuscated the code of my Android applications. I used the ProGuard tool to do it.
When I worked on a C# project, our team used ArmDot. It's a licensing and obfuscation system.
Modern obfuscators are used not only to make the hacking process difficult. They are able to protect programs and games from cheating, check licenses/keys and even optimize code.
But I don't think it is necessary to use obfuscator in every project.
It's most commonly done when you need to provide something in source (usually due to the environment it's being built in, such as systems without shared libraries, especially if you as the seller don't have the exact system being built for), but you don't want the person you're giving it to to be able to modify or extend it significantly (or at all).
This used to be far more common than today. It also led to the (defunct?) Obfuscated C Contest.
A legal (though arguably not "legitimate") use might be to release "source" for an app you're linking with GPL code in obfuscated fashion. It's source, it can be modified, it's just very hard. That would be a more extreme version of releasing it without comments, or releasing with all whitespace trimmed, or (and this would be pushing the legal grounds probably) releasing assembler source generated from C (and perhaps hand-tweaked so you can say it's not just intermediate code).
