Override Ruby's base C code from a Gem

For the sake of experimentation, I am looking for a way to modify some of Ruby's base code, specifically the parser. I was wondering if this was possible to do at all, let alone using a Gem.
I have narrowed the code I need to change to static int yylex() within parser.c. I was going to try to use an alias, but that seems to require that I change parser.h, which cannot be done within a Gem, as I understand.
Can this be done from a Gem?

No.
The only base C code that gems have access to is that exposed by the Ruby headers.
The parsing/lexing code is not exposed there.
If you want to define custom syntax, I would try (in order):
Loosen your requirements a bit and define a DSL. Ruby has insanely powerful metaprogramming features that can take anything you might do statically in a script and instead do it dynamically at runtime.
Write your custom parser in Ruby and emit valid Ruby which you then eval. Ugly, and probably a little slow, but it will allow you to do anything you want.
Modify the mruby parser instead. mruby is designed for embedded applications where you want to be able to highly customize the capabilities of the VM. I doubt that they had the parser in mind, but still it might be more feasible than messing around with MRI.

Related

pyOCCT vs PythonOCC for new project (2020)

I am starting a new project in which 3D CAD objects will be generated from domain-specific data. I could code it in C++ using OpenCascade, but I would prefer to use Python if possible. There are two popular OCC Python bindings: pyOCCT and PythonOCC. Both projects are active and up to date with OCC 7.4, but it would be great to have advice from someone who knows both. As pyOCCT is the newer project, I suppose it solves something that PythonOCC does not, but it is not clear what the motivation to create a new binding was. I will also need some web rendering support; apparently PythonOCC already supports web rendering.
A little background: I was working on a project using pythonocc. It's a great project, but at the time it was stuck on OCE (OpenCASCADE Community Edition) v6, while the official OpenCASCADE had since released v7+ with a lot of performance improvements. I attempted to update the pythonocc wrappers to v7+, but 7+ makes much heavier use of C++ templates and I couldn't get a handle on how to do this in SWIG. When I tried doing it with pybind11 it seemed like a more natural fit (and I was able to get it working). So, I started pyOCCT to wrap OCCT 7+ using pybind11.
Since then, pythonocc has updated its wrapper process and now targets OCCT, so you can't go wrong with either if you are just looking for access to OCCT in Python. Pythonocc has a larger user base so you will likely find more people to collaborate with.
I've tried to keep up with SMESH and recently started up a pySMESH project for CAE applications if that is relevant to your work. That is only compatible with pyOCCT. Though again, with some extra effort you could probably write SWIG wrappers for SMESH instead of pybind11 ones if you really wanted to, making it compatible with pythonocc.
pyOCCT is based on the pybind11 template library, which is simple and powerful. PythonOCC is based on SWIG, which is rather complicated. I have tried both; pyOCCT looks more attractive and promising, especially if you need to add your own wrappers for some purpose. However, I do not use web rendering at all.
Neither. Use CadQuery's OCP instead, because OCP is the only OCCT Python wrapper to internally use a sane clang-based binding generator. OCP is thus roughly analogous to PySide2, which also internally uses a sane clang-based binding generator to generate its Python bindings.
Meanwhile, pyOCCT uses the hand-rolled pybind11 binding generator that requires C[++] headers to be modified with pybind11-specific macros. Since pybind11 does not leverage clang, neither does pyOCCT. Bizarrely, there's actually a downstream binding generator ambiguously named "Binder" that does leverage both pybind11 and clang. Of course, pyOCCT doesn't use Binder.
All else being equal, what you're usually looking for when choosing between higher-level language bindings to lower-level language frameworks is whether those bindings use clang or not. Prefer clang-based bindings to ad-hoc bindings that try (and usually fail) to parse C[++] with hand-rolled lexers and parsers. C[++] is a dark pit of lexical, syntactic, and semantic edge cases you never want to parse with non-standard toolchains.
Trust clang. Distrust everything not clang. This is the way.

Dynamic Interface for functions in C

Assume you have a function read_key, and normally it does some stuff. Someone should be able to replace it with his own function read_key_myfunction that does other stuff.
The general approach would of course be to build an array and register function pointers, or to use simple switch statements (or both).
But my target is a bit broader: people should be able to write their C stuff and NOT interfere with my code, and it should still register. Of course, I tell them which interface to implement.
What they basically do is program a library for my software, which I dynamically load based on a configuration option. Think of it like OpenSSL's engines: anyone can write their own engine, compile it as a dll/so and distribute it. They don't need to modify (or know) OpenSSL's code, as long as they stick to the defined interface.
I just want the same (it will in the end be a wrapper for OpenSSL engine functions) for my program.
A colleague suggested I should use the same function in every file and load the libraries dynamically. This sounds like a good solution to me, but I am not quite satisfied since I don't see OpenSSL using any non-engine-specific function in their engine-code.
If some things are unclear here is my specific example:
I am extending a program called sscep which implements a protocol for automatic certificate renewal. A lot of cryptography should take place in HSMs in the future (and right now it should take place within the Windows Key Management (which is accessed by the capi-engine from OpenSSL)).
While OpenSSL already serves a generic interface, there is some stuff I need to do beforehand and it depends on the engine used. I also want to open the possibility for everyone else to extend it quickly without having to dig into my code (like I had from the person before me).
If anyone has any idea, it would be greatly appreciated to see some kind of guideline. Thanks in advance.
What you are describing is commonly called a plugin architecture or plugin framework. You need to combine cross-platform dlopen/LoadLibrary functionality with some logic for registering and looking up exported functions. You should be able to find examples of how to do this on the internet.
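A minimal sketch of the registration half of such a framework might look like the following. All the names here (key_plugin, register_key_plugin, plugin_init) are hypothetical; in a real build, the host would dlopen/LoadLibrary each plugin library, look up its exported init function with dlsym/GetProcAddress, and let that function call the registration routine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical plugin interface: every plugin fills in one of these.
 * In a real framework each plugin .so/.dll exports an init function
 * (looked up via dlsym/GetProcAddress) that supplies its descriptor. */
typedef struct {
    const char *name;
    int (*read_key)(unsigned char *buf, size_t len);
} key_plugin;

#define MAX_PLUGINS 16
static key_plugin registry[MAX_PLUGINS];
static size_t plugin_count = 0;

/* Called by the host after loading each plugin library. */
int register_key_plugin(const key_plugin *p) {
    if (plugin_count >= MAX_PLUGINS)
        return -1;
    registry[plugin_count++] = *p;
    return 0;
}

/* Lookup by name, e.g. driven by a configuration option. */
const key_plugin *find_key_plugin(const char *name) {
    for (size_t i = 0; i < plugin_count; i++)
        if (strcmp(registry[i].name, name) == 0)
            return &registry[i];
    return NULL;
}
```

On POSIX the loading side is dlopen(path, RTLD_NOW) followed by dlsym(handle, "plugin_init"); on Windows it is LoadLibrary and GetProcAddress. The key design point is that plugin authors only ever see the header declaring key_plugin and register_key_plugin, never your internals.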

Code refactoring tools for C, usable on GNU/Linux? FOSS preferable

Variations of this question have been asked, but not specific to GNU/Linux and C. I use Komodo Edit as my usual Editor, but I'd actually prefer something that can be used from CLI.
I don't need C++ support; it's fine if the tool can only handle plain C.
I really appreciate any direction, as I was unable to find anything.
I hope I'm not forced to 'roll' something myself.
NOTE: Please refrain from mentioning vim; I know it exists and what its capabilities are. I purposely choose to avoid vim, which is why I use Komodo (or nano on the servers).
I don't think that a pure console refactoring tool would be nice to use.
I use Eclipse CDT on Linux to write and refactor C code.
There is also Xrefactory for Emacs (http://www.xref.sk/xrefactory/main.html), if a non-console refactoring tool is OK for you as well.
C-xrefactory was an open source version of xrefactory, covering C and Java, made available on SourceForge by Marián Vittek under GPLv2.
For those interested, there's an actively maintained c-xrefactory fork on GitHub:
https://github.com/thoni56/c-xrefactory
The goal of the GitHub fork is to refactor c-xrefactory itself, add a test suite, and try to document the original source code (which is rather obscure). Maybe, in the future, also convert it into an LSP C language server and refactoring tool.
C-xrefactory works on Emacs; setup scripts and instructions can be found at the repository. Windows users can run it via WSL/WSL2.
You could consider coding a GCC plugin or a MELT extension (MELT is a domain specific language to extend GCC) for your needs.
However, such approach would take you some time, because you'll need to understand some of GCC internals.
For Windows only, and not FOSS but you said "any direction..."
Our DMS Software Reengineering Toolkit with its C Front End can apply transformations to C source code. DMS can be configured to carry out custom, complex, reliable transformations, although the configuration isn't as easy as typing just a command like "refactor frazzle by doobaz".
One of the principal stumbling blocks is still the preprocessor. DMS can transform code that has preprocessor directives in typical places (around statements, expressions, if/for/while loop heads, declarations, etc.), but other "unstructured conditionals" give it trouble. You can run DMS by expanding the preprocessor directives out of existence, or more importantly, expanding out only the ones that give it trouble, but mostly people don't like this because they prefer to keep their preprocessor directives. So it isn't perfect.
[Another answer suggested Coccinelle, which looks pretty good from my point of view. As far as I know, it doesn't handle preprocessor directives at all; I could be wrong and it might handle some cases as DMS does, but I'm sure it can't handle all the cases.]
You don't want to consider rolling your own. Building a transformation/refactoring tool is much harder than you might guess having never tried it. You need full, accurate parsers for the (C) dialect of interest and just that is pretty hard to get right. You need a preprocessor, symbol tables, flow analysis, transformation, code regeneration machinery, ... this stuff takes years of effort to build and get right. Trust me, been there, done that.

How can I unit test a managed wrapper around C code?

I will be creating a Managed-C++ wrapper around some C functions to allow its use in other .NET solutions. I'm looking at providing a very minimalist wrapper, something like:
Signature in C header:
void DOSTH(const char*, short, long*);
Exposed managed interface:
public void doSomething(String^ input, short param, [Out] long %result);
To do so my solution will have the C headers and will reference the .dll that contains the compiled C API that I am building against.
As a Visual Studio newbie I'm unsure how I would unit test this. Is it possible to mock out the .dll to provide a mock implementation? Is there a library that would make this kind of task easy? Is there a particular solution structure I should aim for to make this easier?
Any guidance in this area would be great. Google searches have left me wanting for more info on unit testing a managed wrapper.
In some cases (tool limitations and/or dependency complexity come to mind), mocking a dependency using external frameworks is out of the question. There is, however, a totally legitimate technique of writing mocks manually (I think that was the way to do this before mocking frameworks grew in popularity).
And that's basically what you want to do: fake out the dependency, which in your case happens to be a C library. Frameworks can't help, so you might want to try the manual approach.
Create some simple, faked implementation (pretty much like a stub, e.g. only returning fixed values regardless of input params; naturally, it might be more sophisticated than that), compile it, let it expose exactly the same headers/functions, and reference it in your test project. That's the essential idea behind faking (stubbing/mocking): one object pretending to be another.
As simple as it sounds, I haven't actually tried that, so take it with a grain of salt and more as a suggestion of which way you could go. A limitation of this approach (apart from whether it is actually technically possible) is the very poor or nonexistent configuration options, since the extra faked DLL would act like a hardcoded stub (configuration files could help, but that feels like... too much work?).
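As a concrete sketch of that idea, a manual fake of the DOSTH function from the question could be as small as the following; the canned behavior (result = param * 2) is invented purely so that tests have something deterministic to assert against:

```c
/* fake_dosth.c -- stand-in for the real native DLL, compiled into a
 * separate DLL that exposes exactly the same header as the real API.
 * It ignores its input and returns a deterministic canned value. */
void DOSTH(const char *input, short param, long *result) {
    (void)input;                 /* input ignored by the fake */
    *result = (long)param * 2;   /* canned, predictable output */
}
```

Compile this into, say, fake.dll, point the test configuration of your solution at it instead of the real DLL, and the managed wrapper can be exercised end to end without the real native code ever being loaded.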
Do you only need to be able to stub/mock out your wrapper so that your tests don't rely on the native dll?
Then you can declare an abstract base class for your wrapper, write one implementation that calls the native dll and another one for testing purposes that returns canned values. Or you can use a framework like Moq or Rhino.Mocks to mock your wrapper.

What language should we use to let people extend our terminal/sniffer program?

We have a very versatile terminal/sniffer application which can do all sorts of things with TCP, UDP and serial connections.
We are looking to make it extensible -- i.e, allow people to write their own protocol parsers, highlighters, etc.
We created a C-like language for extending the product, and then discovered that for some coders, this presents a steep learning curve.
We are now pondering the question: Should we stick to C or go with something like Ruby or Lua?
C is beautiful for low-level stuff (like parsing binary data), because it supports pointers. But for exactly that reason, it can be tough to learn.
Ruby (etc.) is easy to learn, but doesn't have pointers, so anything that has to do with parsing binary data gets ugly very fast.
What do you think? For extending a product that parses binary data -- Ruby/Lua or C/C++?
Would be great if you could give some background when you respond -- especially if you've done something similar.
Wireshark, the "world's foremost network protocol analyzer", is also a packet sniffer/analyzer, formerly also called Ethereal. It uses Lua to enable writing custom dissectors and taps, see the manual.
However, note that I have not used it, so I cannot tell how nice/effective/easy to learn the API is.
Like Tcl, Lua was designed to be tightly integrated with an application. Personally, I find Lua's syntax and idioms much easier to deal with than Tcl's.
Lua is easy to integrate with an existing system, and easy to extend. It is also fairly easy to create safe sandboxes in which user-supplied code can run without full access to the innards of your product.
If you have an API written, does it make a difference? The person using the C-like API would only have to understand the difference between passing by value and passing by reference.
Your core does one thing very well, so fine, let it be that way. I think you should create an API based on std in/out, in the spirit of good Unix design. Then anyone can extend it in any language of their choice.
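A minimal sketch of what such a std in/out contract could look like: the host writes one request per line and reads one response per line. The command names and reply format here ("READKEY", "OK"/"ERR") are invented for illustration; keeping the per-request handling in a pure string-to-string function makes the extension trivially testable.

```c
#include <stdio.h>

/* Hypothetical line protocol: the host sends "READKEY <id>\n" and the
 * extension answers "OK <hex>\n" or "ERR <message>\n". */
int handle_request(const char *line, char *reply, size_t reply_size) {
    char id[64];
    if (sscanf(line, "READKEY %63s", id) == 1) {
        /* A real extension would fetch the key; this returns a canned value. */
        snprintf(reply, reply_size, "OK deadbeef\n");
        return 0;
    }
    snprintf(reply, reply_size, "ERR unknown command\n");
    return -1;
}

/* The extension itself is then just a stdin/stdout loop: */
int run_extension(FILE *in, FILE *out) {
    char line[256], reply[256];
    while (fgets(line, sizeof line, in)) {
        handle_request(line, reply, sizeof reply);
        fputs(reply, out);
        fflush(out);  /* flush so the host sees each reply immediately */
    }
    return 0;
}
```

An extension written this way can be implemented in any language that can read stdin and write stdout, which is the whole point of the design.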
Tcl was designed with the goal to allow scripting for C programs, so it would be much easier to implement.
http://en.wikipedia.org/wiki/Tcl#Interfacing_with_other_languages
I second Johan's idea. In the past, when I had to do something like this, I stuck to C language APIs and people were restricted to using C only. But now that I look back at it, I realize it would have been more efficient to do it the way Johan describes.
PS: And by coincidence it was a protocol testing app using a packet sniffer.
perl, sed, awk, lex, ANTLR, ... These are languages I'm somewhat familiar with that I'd like to write something like this in. It depends on the data flow, though.
It's really hard to say what the correct thing to use is. Something that I don't think anyone else has mentioned is to keep in mind that the scripts will have bugs. It's very easy to design something like this in such a way that bugs in the scripts (especially runtime errors) either just show up as "error in script" or kill the whole system.
You should also keep in mind that the scripts should be unit testable and that failures should be reproducible.
I don't think it matters what you do, as long as you do one thing: drop the in-house language. It sounds like you chose to make C into a scripting language. One issue I see with this is that it will look familiar to C programmers but not be the same. I can't imagine you have mimicked the semantics of C closely enough to make existing C programmers comfortable. And as you have mentioned, others will find it hard to learn.
The company I am working at has developed its own language. It uses XML for structure, so parsing is easy. The language grows "as needed", meaning that if a feature is missing, it gets added. I'm pretty sure it went from an XML database to something that needed control flow. But my point is that if you aren't thinking about building it as a language, then you'll unintentionally limit what users can do with it.
Personally, I've been looking at how I can get the company to start taking advantage of Lua, and specifically Lua for several reasons. Lua was developed as a general-purpose extension language. It interfaces easily with other languages, including Python and Ruby. It is small and simple enough for use by non-programmers (not really needed in your case). It is simple enough to replace XML or INI files for configuration settings, and powerful enough to replace the need for another programming language.
http://www.lua.org/spe.html
