Pharo 4 FFI current state and the future

I would like to know which FFI interfaces are supported and functional in Pharo 4, which ones are recommended (if any), and whether any big changes are planned for future Pharo versions.
In particular, I would like to know which stable FFI callback mechanisms are available, and what their restrictions and limitations are.

Sorry for taking so long.
Currently, in Pharo you have two possibilities (and three packages, or projects, to tackle them).
Out of the box in Pharo 4 you have NativeBoost-FFI, which uses AsmJit as a backend to generate native calls. That means very fast callouts, but not-so-fast callbacks (because it relies on a fairly hard trick to make the VM call back properly).
You can find examples of it all around the source code; take a look especially at the NBBasicExamples class.
You also have the FFI plugin, which implements a more traditional approach. There are two packages to handle it:
FFI, which implements callouts using pragmas (no callbacks). You can install it from the configurations browser, and it comes with a set of examples.
AlienFFI, another package to talk to the same libraries FFI handles, but implementing a more "alien" approach (each function is an object, not a method). It implements callbacks properly and with good performance.
Its installation is a bit trickier because it has not (yet) been tested in Pharo 4 (so it is not in the configurations browser), but it should load fine.
You can find it here: http://catalog.pharo.org, along with instructions on how to install it (the name is OldAlien, for historical reasons).
All of them are very stable, but we will make some changes in the near future:
We will add a backend for NativeBoost-FFI that uses the FFI plugin. The reason is the difficulty of maintaining the current version, and the fact that we can find more maintainers willing to work in C than in assembler :)
This change should be backward compatible, so you are safe to go with NativeBoost.
Hope this info works for you.

Related

pyOCCT vs PythonOCC for new project (2020)

I am starting a new project in which some 3D CAD objects are to be generated from domain-specific data. I could code it in C++ using OpenCascade, but I would prefer to use Python if possible. There are two popular OCC Python bindings: pyOCCT and PythonOCC. Both projects are active and up to date with OCC 7.4, but it would be great to have advice from someone who knows both. As pyOCCT is the newer project, I suppose it solves something that PythonOCC does not, but it is not clear what the motivation for creating a new binding was. I will need some web rendering support; apparently PythonOCC already supports web rendering.
A little background: I was working on a project and using pythonocc. It's a great project, but at the time it was stuck on OCE (OpenCASCADE Community Edition) v6, while OpenCASCADE (official) had since released v7+ with a lot of performance improvements. I attempted to update the pythonocc wrappers to v7+, but v7+ makes much heavier use of C++ templates and I couldn't get a handle on how to deal with that in SWIG. When I tried doing it with pybind11, it seemed like a more natural fit (and I was able to get it working). So I started pyOCCT to wrap OCCT 7+ using pybind11.
Since then, pythonocc has updated its wrapper process and now targets OCCT, so you can't go wrong with either if you are just looking for access to OCCT in Python. Pythonocc has a larger user base, so you will likely find more people to collaborate with.
I've tried to keep up with SMESH and recently started a pySMESH project for CAE applications, if that is relevant to your work. It is only compatible with pyOCCT. Then again, with some extra effort you could probably write SWIG wrappers for SMESH instead of pybind11 ones if you really wanted to, making it compatible with pythonocc.
pyOCCT is based on the pybind11 template library, which is simple and powerful. PythonOCC is based on SWIG, which is rather complicated. I have tried both; pyOCCT looks more attractive and promising, especially if you need to add your own wrappers for some purpose. However, I do not use web rendering at all.
Neither. Use CadQuery's OCP instead, because OCP is the only OCCT Python wrapper to internally use a sane clang-based binding generator. OCP is thus roughly analogous to PySide2, which also internally uses a sane clang-based binding generator to generate its Python bindings.
Meanwhile, pyOCCT uses a hand-rolled pybind11 binding generator that requires C[++] headers to be modified with pybind11-specific macros. Since pybind11 does not leverage clang, neither does pyOCCT. Bizarrely, there is actually a downstream binding generator, ambiguously named "Binder", that does leverage both pybind11 and clang. Of course, pyOCCT doesn't use Binder.
All else being equal, what you're usually looking for when choosing between higher-level language bindings to lower-level language frameworks is whether those bindings use clang or not. Prefer clang-based bindings to ad-hoc bindings that try (and usually fail) to parse C[++] with hand-rolled lexers and parsers. C[++] is a dark pit of lexical, syntactic, and semantic edge cases you never want to parse with non-standard toolchains.
Trust clang. Distrust everything not clang. This is the way.

"Soft" internal vs public API boundary with warnings rather than link errors

I'm looking for a way to introduce a formal public API into a program (PostgreSQL) that presently lacks any formal boundary between extension-accessible interfaces and those for internal use only.
It has a few headers that say they're for internal use only, and it uses a lot of statics, but there's still a great deal that extensions can just reach into ... but shouldn't. This makes it impractical to offer binary compatibility guarantees across even patch releases. The current approach boils down to "it should work but we don't promise anything," which is not ideal.
It's a large piece of software, and it's not going to be practical to introduce a hard boundary all at once, since there's such a lively extension community outside the core codebase. But we'd need a way to tell people "this interface is private; don't use it, or speak up and ask for it to be declared public if you need it and can justify it".
Portably.
All I've been able to come up with so far is to add macros around gcc's __attribute__((deprecated)) and MSVC's __declspec(deprecated) that are empty when building the core server, but defined normally when building extensions. That'll work, but "deprecated" isn't quite right; it's more a case of "use of non-public internal API".
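Roughly what I have in mind (a sketch only; the macro name PG_INTERNAL and the BUILDING_CORE guard are invented for illustration, not existing PostgreSQL symbols):

    /* Empty when building the core server, a warning otherwise. */
    #if defined(BUILDING_CORE)
    #  define PG_INTERNAL                 /* core build: no warning */
    #elif defined(__GNUC__) || defined(__clang__)
    #  define PG_INTERNAL __attribute__((deprecated("non-public internal API")))
    #elif defined(_MSC_VER)
    #  define PG_INTERNAL __declspec(deprecated("non-public internal API"))
    #else
    #  define PG_INTERNAL                 /* unknown compiler: no-op */
    #endif

    /* In a header an extension might include: */
    PG_INTERNAL extern void some_internal_function(void);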
Is there any better way than using deprecated annotations? I'd quite like to be able to use them for deprecating in-core use of functionality too, as the server grows and it becomes impractical to always change everything in one sweep.

What is the common code reuse strategy in C

Context: C language, 8 bit microprocessor
We have identified components that can be reused between projects (products), but I cannot figure out the best infrastructure for handling the reusable components.
Two possibilities I have found so far:
Static libraries
Shared files in subversion
Both static libraries and shared source files let you share the common code among projects. Libraries are the better of the two alternatives, so you should use them if they are available on your platform. They let you guard the source of the library from inadvertent modifications, which could happen if the code from source control were changed locally.
The only problem with sharing code through libraries may be a lack of support for source-level debugging of library code by some of the tools in your embedded toolchain (e.g. debuggers attached to in-circuit emulators). In that case, reusing code at the source level may be acceptable. If possible, you should then guard the shared source from modification through file-system access controls.
If you have reusable components, libraries are the way to go.
It's easier to maintain and you have a clear interface. It's also easier to incorporate into new projects.
You can easily do individual unit tests on library code
Less risk of copy-and-paste code.
Programmers are more aware that this code is shared when they have to use it from a library.
Several good arguments have been made for the library approach.
However, there's at least one good argument for re-building (perhaps from the same source repository) each time you build a dependent project, and that is the ability to apply target-, project-, or development-stage-specific compile settings to all of the code, including the shared portion.
At my company, we used both approaches at the same time:
We do two checkouts: one for the project, the other for the library.
When the project needs to be compiled (via Makefile), we compile the library first.
The library is then linked as if it was a binary-only library.
When we release a project, we check whether the other projects still compile against the new library.
When we release a project, we tag the library along with the project.
This way you get the best of both worlds:
common code is shared: all projects benefit from bug fixes and improvements
source code is always fully available for understanding and debugging
source code availability encourages library maintenance (fixes, improvements, and experiments)
the library boundaries impose a more API-like approach: a clearer interface and easier embedding into projects
you can pass compile-time flags to the library to build different flavors (see the sketch after this list)
you can always go back in time if needed without library-vs-project mismatching hassles
if you are in a hurry, you can put off the library check.
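For instance, a flavor switch inside the library source could look like this (UART_BUFFER_SIZE is an invented example name; each project would set it from its own Makefile, e.g. CFLAGS += -DUART_BUFFER_SIZE=64):

    /* uart.c (library code) */
    #ifndef UART_BUFFER_SIZE
    #define UART_BUFFER_SIZE 16   /* conservative default for small targets */
    #endif

    static unsigned char rx_buffer[UART_BUFFER_SIZE];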
The only drawback to this approach is that developers have to know what they are doing. If they modify the library, they should know that the change will impact all projects. But you are already using a version control system, and if you use branches and communication within your team is good, there should be no problem at all.

Dynamic Interface for functions in C

Assume you have a function read_key that normally does some stuff. Someone should be able to replace it with his own function read_key_myfunction, which does other stuff.
The general approach would of course be to build an array and register function pointers, or to use simple switch statements (or both).
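For illustration, such a registry could look roughly like this (read_key_fn, register_read_key, and lookup_read_key are names I made up):

    #include <stdio.h>
    #include <string.h>

    typedef int (*read_key_fn)(void);

    static int read_key_default(void)    { return 42; }  /* built-in */
    static int read_key_myfunction(void) { return 7;  }  /* override */

    /* Fixed-size registry mapping names to implementations. */
    static struct {
        const char *name;
        read_key_fn fn;
    } registry[8] = { { "default", read_key_default } };
    static int registry_count = 1;

    static void register_read_key(const char *name, read_key_fn fn)
    {
        if (registry_count < 8) {
            registry[registry_count].name = name;
            registry[registry_count].fn   = fn;
            registry_count++;
        }
    }

    static read_key_fn lookup_read_key(const char *name)
    {
        for (int i = 0; i < registry_count; i++)
            if (strcmp(registry[i].name, name) == 0)
                return registry[i].fn;
        return read_key_default;          /* fall back to the built-in */
    }

    int main(void)
    {
        register_read_key("mine", read_key_myfunction);
        printf("%d\n", lookup_read_key("mine")());  /* prints 7 */
        return 0;
    }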
But my target is a bit broader: people should be able to write their C code and NOT interfere with my code, and it should still register. Of course, I tell them which interface to implement.
What they basically do is program a library for my software, which I dynamically load based on a configuration option. Think of it like OpenSSL's engines: anyone can write their own engine, compile it as a dll/so, and distribute it. They don't need to modify (or know) OpenSSL's code, as long as they stick to the defined interface.
I just want the same thing (in the end it will be a wrapper for OpenSSL's engine functions) for my program.
A colleague suggested I should export the same function from every library and load the libraries dynamically. This sounds like a good solution to me, but I am not quite satisfied, since I don't see OpenSSL using any non-engine-specific function in its engine code.
If some things are unclear, here is my specific example:
I am extending a program called sscep, which implements a protocol for automatic certificate renewal. A lot of the cryptography should take place in HSMs in the future (and right now it should take place within Windows key management, which is accessed through OpenSSL's capi engine).
While OpenSSL already provides a generic interface, there is some stuff I need to do beforehand, and it depends on the engine used. I also want to open up the possibility for everyone else to extend it quickly, without having to dig into my code (as I had to with the code of the person before me).
If anyone has any idea, some kind of guideline would be greatly appreciated. Thanks in advance.
What you are describing is commonly called a plugin architecture or plugin framework. You need to combine cross-platform dlopen/LoadLibrary functionality with some logic for registering and looking up exported functions. You should be able to find examples of how to do this on the internet.
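To give you a starting point, here is a minimal POSIX sketch (the library path and the exported symbol name read_key are invented; on Windows you would use LoadLibrary/GetProcAddress instead, and on Linux you link the host with -ldl):

    #include <stdio.h>
    #include <dlfcn.h>

    typedef int (*read_key_fn)(void);

    int main(void)
    {
        /* The path would normally come from a configuration option. */
        void *handle = dlopen("./libmyplugin.so", RTLD_NOW);
        if (!handle) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return 1;
        }

        /* Every plugin exports the same well-known entry point. */
        read_key_fn read_key = (read_key_fn)dlsym(handle, "read_key");
        if (!read_key) {
            fprintf(stderr, "dlsym: %s\n", dlerror());
            dlclose(handle);
            return 1;
        }

        printf("plugin returned %d\n", read_key());
        dlclose(handle);
        return 0;
    }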

Best way to implement plugin framework - are DLLs the only way (C/C++ project)?

Introduction:
I am currently developing document classifier software in C/C++, and I will be using a Naive Bayes model for classification. But I want users to be able to use any algorithm they want (or that I want in the future), hence I decided to separate the algorithm part of the architecture into a plugin that is attached to the main app at start-up. Hence any user can write his own algorithm as a plugin and use it with my app.
Problem Statement:
The way I intend to develop this is to have each algorithm the user wants to use be built as a DLL file and put into a specific directory. At start-up, my app will search for all the DLLs in that directory and load them.
My Questions:
(1) What if malicious code is built as a DLL (one that exports the same functions mandated by the plugin framework) and put into my plugins directory? In that case, my app will think it's a plugin, pick it up, and call its functions, so the malicious code could easily bring my entire app down (in the worst case, it could turn my app into a malicious-code launcher!).
(2) Are DLLs the only way available to implement the plugin design pattern? (Not only out of fear of malicious plugins; it's also a generic question out of curiosity :) )
(3) I think a lot of software is written with a plugin model for extensibility; if so, how does it defend against such attacks?
(4) In general, what do you think about my decision to use a plugin model for extensibility (do you think I should look at any other alternatives)?
Thank you
-MicroKernel :)
Do not worry about malicious plugins. If somebody managed to sneak a malicious DLL into that folder, they probably also have the power to execute stuff directly.
As an alternative to DLLs, you could hook up a scripting language like Python or Lua, and allow scripted plugins. But maybe in this case you need the speed of compiled code?
For embedding Python, see here. The process is not very difficult. You can link statically to the interpreter, so users won't need to install Python on their system. However, any non-builtin modules will need to be shipped with your application.
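To give a rough idea, a minimal embedding host in C looks something like this (build flags vary by platform and Python version; python3-config can report them):

    #include <Python.h>

    int main(void)
    {
        Py_Initialize();                  /* start the interpreter */
        PyRun_SimpleString(
            "print('hello from an embedded Python plugin')");
        Py_Finalize();                    /* shut it down cleanly  */
        return 0;
    }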
However, if the language does not matter much to you, embedding Lua is probably easier because it was specifically designed for that task. See this section of its manual.
See 1. They don't.
Using a plugin model sounds like a fine solution, provided that a lack of extensibility really is a problem at this point. It might be easier to hard-code your current model and add the plugin interface later, if it turns out there is actually demand for it. A plugin interface is easy to add, but hard to remove once people have started using it.
Malicious code is not the only problem with DLLs. Even a well-meaning DLL might contain a bug that could crash your whole application or gradually leak memory.
Loading a module in a high-level language somewhat reduces the risk. If you want to learn about embedding Python for example, the documentation is here.
Another approach would be to launch the plugin in a separate process. It does require a bit more effort on your part to implement, but it's much safer. The separate-process approach is used by Google's Chrome web browser, and they have a document describing the architecture.
The basic idea is to provide a library for plugin writers that includes all the logic for communicating with the main app. That way, the plugin author has an API to program against, just as if they were writing a DLL. Wikipedia has a good list of inter-process communication (IPC) mechanisms.
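As a rough illustration of the idea on POSIX (the ./plugin executable and the line-based request/reply exchange are invented for this example):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int to_plugin[2], from_plugin[2];
        if (pipe(to_plugin) < 0 || pipe(from_plugin) < 0)
            return 1;

        pid_t pid = fork();
        if (pid == 0) {                          /* child: become the plugin */
            dup2(to_plugin[0], STDIN_FILENO);    /* requests arrive on stdin */
            dup2(from_plugin[1], STDOUT_FILENO); /* replies go to stdout     */
            close(to_plugin[1]);
            close(from_plugin[0]);
            execl("./plugin", "plugin", (char *)NULL);
            _exit(127);                          /* exec failed              */
        }

        close(to_plugin[0]);                     /* parent keeps the other   */
        close(from_plugin[1]);                   /* end of each pipe         */

        const char *req = "classify: some document text\n";
        write(to_plugin[1], req, strlen(req));
        close(to_plugin[1]);                     /* signal end of request    */

        char reply[256];
        ssize_t n = read(from_plugin[0], reply, sizeof reply - 1);
        if (n > 0) {
            reply[n] = '\0';
            printf("plugin replied: %s", reply);
        }

        close(from_plugin[0]);
        waitpid(pid, NULL, 0);    /* a plugin crash only kills the child */
        return 0;
    }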
1) If there is a malicious DLL in your plugin folder, you are probably already compromised.
2) No. You could load assembly code dynamically from a file, but that would just be reinventing the wheel; just use a DLL.
3) Firefox extensions don't, not even its JavaScript-based ones. Everything else I know of uses native code from dynamic libraries, so it is impossible to guarantee safety. Then again, Chrome has NaCl, which does extensive analysis of the binary code and rejects it if it can't be 100% sure it doesn't violate bounds and the like, although I'm sure more and more vulnerabilities will be found there as time passes.
4) Plugins are fine; just restrict them to trusted people. Alternatively, you could use a safe language like Lua, Python, or Java, and load files in that language, but restrict them to a subset of the API that won't harm your program or environment.
(1) Can you use OS security facilities to prevent unauthorized access to the folder the DLLs are searched for and loaded from? That should be your first approach.
Otherwise, run a threat analysis: what's the risk, what are the known attack vectors, etc.
(2) Not necessarily. It is the most straightforward option if you want compiled plugins, which is mostly a question of performance, access to OS functions, etc. As mentioned already, consider scripting languages.
(3) Usually by writing "to prevent malicious code execution, restrict access to the plugin folder".
(4) There's quite some additional cost, especially when using a plugin framework you are not yet familiar with. It increases the cost of:
the core application (plugin functionality)
the plugins (much higher isolation)
installation
debugging + diagnostics (bugs that occur only with a certain combination of plugins)
administration (users must know of, and manage plugins)
That pays off only if:
installing/updating the main software is much more complex than updating the plugins
individual components need to be updated individually (e.g. a user may combine different versions of plugins)
other people develop plugins for your main application
(There are other benefits of moving code into DLLs, but they don't pertain to plugins as such.)
What if malicious code is made as a DLL?
Generally, if you do not trust a DLL, you can't safely load it one way or another.
This is true for almost any other language, even an interpreted one.
Java and some other languages work very hard to limit what user code can do, and that works only because they run in a virtual machine.
So no: DLL-loaded plug-ins can come from trusted sources only.
Is using DLLs the only way available to implement the plugin design pattern?
You may also embed an interpreter in your code; for example, GIMP allows writing plugins in Python.
But be aware that this will be much slower, because of the nature of any interpreted language.
We have a product very similar in that it uses modules to extend functionality.
We do two things:
We use BPL files, which are DLLs under the covers. This is a specific technology from Borland/CodeGear/Embarcadero in C++Builder. We take advantage of some RTTI-type features to publish a simple API, similar to main(argv[]), so any number of parameters can be pushed onto the stack and popped off by the DLL.
We also embed Perl into our application for things that are more business logic in nature.
Our software is an accounting/ERP suite.
Have a look at existing plugin architectures and see if there is anything you can reuse. http://git.dronelabs.com/ethos/about/ is one link I came across while googling glib + plugin. GLib itself might make it easier to develop a plugin architecture. GStreamer uses GLib and has a very nice plugin architecture that may give you some ideas.
