How to identify language using ML.NET? - sql-server

Is it possible to identify the language of a text with ML.NET, like fastText does? fastText can do it, but from Python:
https://fasttext.cc/docs/en/language-identification.html
But I'd like to do it in a SQLCLR function and in a .NET Core application.

SQLCLR in SQL Server should be .NET Framework only, so I don't see any .NET Core library being an option.
Also, the ReadMe for the main ML.NET repository does state:
ML.NET also works on the .NET Framework 4.6.1 or later, but 4.7.2 or later is recommended.
However, a white paper on ML.NET stated that a portion of it is written in C++, which could mean that one or more of its DLLs are mixed-mode (not pure MSIL). A mixed-mode assembly will not load into SQL Server under any circumstances (not even when marked as UNSAFE). You are certainly welcome to try loading the ML.NET libraries into SQL Server to see whether it works, but even if it does, you likely have a lot of work ahead of you to re-create what they did with fastText.
You might be able to make use of this C# wrapper for fastText:
https://github.com/rafael-aero/fastText/tree/master/vs2015
You will still need the main fastText library, fastText.dll, as the wrapper code will call it. The wrapper code will need to be loaded as UNSAFE due to the calls to unmanaged code.
If you do try this and it does work, please let us know.

Related

Porting a UNIX daemon to a Windows Service

I wrote a UNIX daemon, in C, which I want to port to Windows.
My target is Windows 10.
When I search for how to create a Windows service, I find approaches using .NET and C#, both of which I want to avoid at all costs.
How can I make a simple, straightforward service in C, without the kitchen sinks that Microsoft tries to unload on me? If I really have to, I would consider C++, but C# and .NET are simply taking it too far.
I'm fine with switching to a different compiler too, if this is easier outside of Visual Studio. (Currently I am using Visual Studio 2019, latest update.)
NOTE: My Linux daemon just has one dependency: libhidapi which is available for Windows.
UPDATE
There are no C++ project templates available for a Windows service.
This comes down to creating two different applications (.exe):
One to run the service, using StartServiceCtrlDispatcher(), where the dispatched service main function calls RegisterServiceCtrlHandlerEx() to register a control handler.
One to install the service, using CreateService().
It is possible to skip the installer and instead use the sc command-line utility, which is part of the OS.
See full example.

Static / Dynamic source code analysis

I'm taking a class named "Secure Code", and for our next assignment we are supposed to do static/dynamic analysis of some C files and of a Java EE web project.
I checked out "Source Monitor" and ran it on the C files, but (unless I didn't get how to use it!) it doesn't seem to do what I'm looking for.
Considering the topic, I'd be interested in knowing whether there are tools for detecting "insecure" code, i.e. code that is potentially vulnerable to buffer overflows, SQL injection, XSS, and so on. I'd like such a tool to point out which functions should be "upgraded" (e.g. fgets instead of gets, or a PreparedStatement instead of a plain SQL statement).
Note: I'd prefer open-source software, ideally for Windows (I have Ubuntu on a VM, but I'm not very good with it; I generally spend more time figuring out how to configure the tools than running them).
Thank you for your tips!
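To make the gets to fgets upgrade concrete, here is a small C sketch of a bounded line reader. The helper name read_line and its return convention are made up for this example; the point is that fgets never writes past the buffer size it is given, unlike gets, which was removed from the language in C11.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Bounded replacement for gets(): reads one line from `in` into buf without
   ever writing more than `size` bytes, and strips the trailing newline.
   Returns 1 if the whole line fit, 0 if it was truncated (the rest of the
   line is consumed and discarded), and -1 on EOF or error. */
int read_line(char *buf, size_t size, FILE *in)
{
    if (size == 0 || size > INT_MAX)
        return -1;
    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {          /* complete line: drop the newline */
        buf[len] = '\0';
        return 1;
    }

    /* No newline in buf: either the line was longer than the buffer, or it
       is a final line without a trailing newline. Drain what is left. */
    int c, truncated = 0;
    while ((c = fgetc(in)) != EOF && c != '\n')
        truncated = 1;
    return truncated ? 0 : 1;
}
```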
Frama-C's value analysis is open-source, available pre-compiled for Windows, and was used to find such security bugs as this one in the QuickLZ C library or this one in PolarSSL.
That said, you may find that it is a lot to get used to for just a school assignment; then again, are you actually expected to find security bugs in a school assignment?
For the Java EE web project, use the Java Persistence API with parameterized queries instead of SQL built by string concatenation; values passed as bound parameters cannot inject SQL. The best open-source implementation is Hibernate. It's easy to use and very flexible.

Best way to implement plugin framework - are DLLs the only way (C/C++ project)?

Introduction:
I am currently developing document classifier software in C/C++, and I will be using a Naive Bayes model for classification. But I want users to be able to use any algorithm they want (or that I want in the future), so I decided to separate the algorithm part of the architecture into a plugin that is attached to the main app at start-up. That way, any user can write their own algorithm as a plugin and use it with my app.
Problem Statement:
The way I intend to develop this is to have each algorithm that a user wants to use built as a DLL and put into a specific directory. At start-up, my app will search for all the DLLs in that directory and load them.
My Questions:
(1) What if malicious code is built as a DLL (exporting the same functions mandated by the plugin framework) and put into my plugins directory? In that case, my app will think it's a plugin, pick it up and call its functions, so the malicious code could easily bring my entire app down (in the worst case, it could turn my app into a malicious-code launcher!).
(2) Is using DLLs the only way to implement the plugin design pattern? (Not only out of fear of malicious plugins; it's a general question out of curiosity :) )
(3) I think a lot of software is written with a plugin model for extensibility. If so, how do they defend against such attacks?
(4) In general, what do you think about my decision to use a plugin model for extensibility? (Should I look at any other alternatives?)
Thank you
-MicroKernel :)
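One way to pin down the contract described in the problem statement is a versioned table of function pointers that each plugin exports through a single entry point. The sketch below is hypothetical (struct classifier_plugin, CLASSIFIER_ABI_VERSION and the dummy plugin are illustrative names, not an existing API). Note what the validity check can and cannot do: it catches stale or incomplete plugins, but by the time it runs the library has already been loaded, and its initialization code has already run, so it offers no protection against the malicious DLL of question (1).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical contract: every plugin exports one function of type
   get_plugin_fn (e.g. named get_classifier_plugin) returning this table. */
#define CLASSIFIER_ABI_VERSION 1

struct classifier_plugin {
    int abi_version;                 /* must equal CLASSIFIER_ABI_VERSION */
    const char *name;                /* e.g. "naive-bayes" */
    void *(*create)(void);           /* allocate per-instance state */
    int (*classify)(void *state, const char *text, const char **label_out);
    void (*destroy)(void *state);
};

typedef const struct classifier_plugin *(*get_plugin_fn)(void);

/* Host-side sanity check, run after loading and before calling into a plugin.
   Rejects stale or incomplete plugins; does NOT make untrusted code safe. */
int classifier_plugin_is_valid(const struct classifier_plugin *p)
{
    return p != NULL
        && p->abi_version == CLASSIFIER_ABI_VERSION
        && p->name != NULL && p->name[0] != '\0'
        && p->create != NULL && p->classify != NULL && p->destroy != NULL;
}

/* A trivial built-in plugin used to exercise the contract (hypothetical). */
static void *dummy_create(void) { static int state; return &state; }

static int dummy_classify(void *state, const char *text, const char **label_out)
{
    (void)state;
    *label_out = (strstr(text, "free money") != NULL) ? "spam" : "ham";
    return 0;
}

static void dummy_destroy(void *state) { (void)state; }

const struct classifier_plugin dummy_plugin = {
    CLASSIFIER_ABI_VERSION, "dummy", dummy_create, dummy_classify, dummy_destroy
};
```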
Do not worry about malicious plugins. If somebody managed to sneak a malicious DLL into that folder, they probably also have the power to execute stuff directly.
As an alternative to DLLs, you could hook up a scripting language like Python or Lua, and allow scripted plugins. But maybe in this case you need the speed of compiled code?
For embedding Python, see here. The process is not very difficult. You can link statically to the interpreter, so users won't need to install Python on their system. However, any non-builtin modules will need to be shipped with your application.
However, if the language does not matter much to you, embedding Lua is probably easier because it was specifically designed for that task. See this section of its manual.
See 1. They don't.
Using a plugin model sounds like a fine solution, provided that a lack of extensibility really is a problem at this point. It might be easier to hard-code your current model, and add the plugin interface later, if it turns out that there is actually a demand for it. It is easy to add, but hard to remove once people have started using it.
Malicious code is not the only problem with DLLs. Even a well-meaning DLL might contain a bug that could crash your whole application or gradually leak memory.
Loading a module in a high-level language somewhat reduces the risk. If you want to learn about embedding Python for example, the documentation is here.
Another approach would be to launch the plugin in a separate process. It does require a bit more effort on your part to implement, but it's much safer. The separate-process approach is used by Google's Chrome web browser, and they have a document describing the architecture.
The basic idea is to provide a library for plugin writers that includes all the logic for communicating with the main app. That way, the plugin author has an API that they use, just as if they were writing a DLL. Wikipedia has a good list of ways for inter-process communication (IPC).
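The core of the separate-process idea can be sketched in a few lines of POSIX C: run the plugin function in a forked child and send the result back over a pipe, so that a crash kills only the child. This is a toy under stated assumptions (plugin_fn, run_isolated and the two demo plugins are hypothetical); Chrome's real design uses long-lived sandboxed processes and a full IPC protocol, and on Windows you would use CreateProcess with pipes instead of fork.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical plugin entry point type: takes an int, returns an int. */
typedef int (*plugin_fn)(int);

/* Runs fn(arg) in a child process and reads the result back over a pipe.
   If the plugin crashes, only the child dies. POSIX only.
   Returns 0 and stores the result in *out, or -1 if the plugin failed. */
int run_isolated(plugin_fn fn, int arg, int *out)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: run the untrusted code */
        close(fds[0]);
        int result = fn(arg);
        ssize_t n = write(fds[1], &result, sizeof result);
        _exit(n == (ssize_t)sizeof result ? 0 : 1);
    }

    close(fds[1]);                       /* parent: collect the result */
    int result = 0;
    ssize_t n = read(fds[0], &result, sizeof result);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0 || n != (ssize_t)sizeof result)
        return -1;                       /* crashed (e.g. by a signal) or failed */
    *out = result;
    return 0;
}

/* Two hypothetical plugins for demonstration. */
int plugin_square(int x) { return x * x; }
int plugin_crash(int x) { (void)x; abort(); }
```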
1) If there is a malicious dll in your plugin folder, you are probably already compromised.
2) No, you could load machine code dynamically from a file yourself, but that would just be reinventing the wheel; just use a DLL.
3) Firefox doesn't, not even for its JavaScript extensions. Everything else I know of uses native code from dynamic libraries, so safety is impossible to guarantee. Then again, Chrome has NaCl, which does extensive analysis of the binary code and rejects it if it can't be 100% sure that it doesn't violate bounds and so on, although I'm sure more and more vulnerabilities will turn up in it as time passes.
4) Plugins are fine; just restrict them to trusted people. Alternatively, you could use a safe language like Lua, Python, Java, etc., load the plugin file into that language, and restrict it to a subset of your API that won't harm your program or environment.
(1) Can you use OS security facilities to prevent unauthorized access to the folder where the DLL's are searched or loaded from? That should be your first approach.
Otherwise: run a threat analysis - what's the risk, what are known attack vectors, etc.
(2) Not necessarily. It is the most straightforward approach if you want compiled plugins, which is mostly a question of performance, access to OS functions, etc. As mentioned already, consider scripting languages.
(3) Usually by writing "to prevent malicious code execution, restrict access to the plugin folder" in the documentation.
(4) There is considerable additional cost, even when you use an existing plugin framework (which you would first have to become familiar with). It increases the cost of:
the core application (plugin functionality)
the plugins (much higher isolation)
installation
debugging + diagnostics (bugs that occur only with a certain combination of plugins)
administration (users must know of, and manage plugins)
That pays only if
installing/updating the main software is much more complex than updating the plugins
individual components need to be updated individually (e.g. a user may combine different versions of plugins)
other people develop plugins for your main application
(There are other benefits of moving code into DLLs, but they don't pertain to plugins as such.)
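Point (1) can be partly automated on POSIX systems. This hypothetical helper refuses a plugin directory that is not a directory, is owned by someone other than the current user or root, or is group- or world-writable. It checks only the directory itself (not its parent directories or the individual plugin files) and ignores ACLs, so treat it as a sketch rather than a complete check; on Windows the equivalent would be an ACL check on the folder.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if `path` looks safe to load plugins from, else 0.
   Sketch only: checks the directory itself, not parents, files, or ACLs. */
int plugin_dir_is_safe(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
        return 0;                          /* missing, or not a directory */
    if (st.st_uid != 0 && st.st_uid != geteuid())
        return 0;                          /* owned by some other user */
    if (st.st_mode & (S_IWGRP | S_IWOTH))
        return 0;                          /* group- or world-writable */
    return 1;
}
```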
What if a malicious code is made as a DLL
Generally, if you do not trust a DLL, you can't load it, one way or another.
This holds for almost any other language too, even an interpreted one.
Java and some other languages work very hard to limit what user code can do, and this works only because they run in a virtual machine.
So no: DLL-loaded plug-ins should come from trusted sources only.
Is using DLLs the only way available to implement plugin design pattern?
You may also embed an interpreter in your code; for example, GIMP allows writing plugins in Python.
But be aware that this can be much slower, because of the nature of interpreted languages.
We have a product very similar in that it uses modules to extend functionality.
We do two things:
We use BPL files, which are DLLs under the covers. This is a specific technology from Borland/CodeGear/Embarcadero within C++Builder. We take advantage of some RTTI-type features to publish a simple API similar to main(argv[]), so any number of parameters can be pushed onto the stack and popped off by the DLL.
We also embed Perl into our application for things that are more business logic in nature.
Our software is an accounting/ERP suite.
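The argv-style entry point described above relies on C++Builder's BPL packages and RTTI; a rough plain-C analogue is a table of commands that all share a main()-like signature. Everything here (plugin_command_fn, invoke_command, the sum command) is hypothetical and only meant to show the calling convention.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical main()-like plugin entry point: the host passes any number of
   string arguments and the plugin parses what it needs.
   Returns 0 on success, nonzero on error. */
typedef int (*plugin_command_fn)(int argc, const char *const argv[], long *result);

/* Example command: sums its integer arguments. */
static int cmd_sum(int argc, const char *const argv[], long *result)
{
    long total = 0;
    for (int i = 0; i < argc; i++) {
        char *end;
        long v = strtol(argv[i], &end, 10);
        if (end == argv[i] || *end != '\0')
            return 1;                     /* argument is not an integer */
        total += v;
    }
    *result = total;
    return 0;
}

/* Table a plugin would publish to the host (name -> entry point). */
static const struct {
    const char *name;
    plugin_command_fn fn;
} commands[] = {
    { "sum", cmd_sum },
};

/* Host side: look a command up by name and invoke it with argc/argv.
   Returns the command's status, or -1 if the name is unknown. */
int invoke_command(const char *name, int argc, const char *const argv[], long *result)
{
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(commands[i].name, name) == 0)
            return commands[i].fn(argc, argv, result);
    return -1;
}
```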
Have a look at existing plugin architectures and see if there is anything you can reuse. http://git.dronelabs.com/ethos/about/ is one link I came across while googling glib + plugin. glib itself might make it easier to develop a plugin architecture. GStreamer uses glib and has a very nice plugin architecture that may give you some ideas.

Is Extendible program in C possible?

I am looking into making a C program that is divided into a core and extensions. The extensions should allow the program to be extended by adding new functions. So far I have found c-pluff, a plugin framework that claims to do just that. If anybody has other ideas or references I can check out, please let me know.
You're not mentioning a platform, and this is outside the support of the language itself.
For POSIX/Unix/Linux, look into dlopen() and friends.
In Windows, use LoadLibrary().
Basically, these will allow you to load code from a platform-specific file (.so and .dll, respectively), look up addresses to named symbols/functions in the loaded file, and access/run them.
I tried to limit myself to the low-level stuff, but if you want to have a wrapper for both of the above, look at glib's module API.
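A thin wrapper over the two low-level APIs could look like this. It is a sketch: the names plugin_open, plugin_symbol and plugin_close are made up, and only the dlopen/dlsym/dlclose and LoadLibraryA/GetProcAddress/FreeLibrary calls are real. On Linux with glibc older than 2.34 you also need to link with -ldl.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <dlfcn.h>
#endif

/* Load a shared library (.dll / .so). Returns NULL on failure. */
void *plugin_open(const char *path)
{
#ifdef _WIN32
    return (void *)LoadLibraryA(path);
#else
    /* RTLD_NOW resolves all symbols up front, so a plugin with missing
       dependencies fails here rather than at its first call. */
    return dlopen(path, RTLD_NOW | RTLD_LOCAL);
#endif
}

/* Look up a named symbol in a loaded library. Returns NULL if absent. */
void *plugin_symbol(void *handle, const char *name)
{
#ifdef _WIN32
    return (void *)GetProcAddress((HMODULE)handle, name);
#else
    return dlsym(handle, name);
#endif
}

/* Unload the library; pointers obtained from it become invalid. */
void plugin_close(void *handle)
{
#ifdef _WIN32
    FreeLibrary((HMODULE)handle);
#else
    dlclose(handle);
#endif
}
```

A host would typically call plugin_open on each file in its plugin directory, look up one agreed entry-point name with plugin_symbol, and cast the result to the expected function pointer type (a conversion POSIX permits for dlsym results).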
The traditional way on Windows is with DLLs, but this is kind of obsolete. If you want users to actually extend your program (as opposed to your development team releasing official plugins), you will want to embed a scripting language like Python or Lua, because they are easier to code in.
You can extend your core C/C++ program using some script language, for example - Lua
There are several C/C++ - Lua integration tools (toLua, toLua++, etc.)
Do you need to be able to add these extensions to the running program, or at least after the executable file is created? If you can re-link (or even re-compile) the program after having added an extension, perhaps simple callbacks would be enough?
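If re-linking is acceptable, a callback registry can be as simple as the following sketch; register_extension, run_extension, ext_double and the fixed-size table are hypothetical. Each extension is compiled into the executable and registers itself during start-up, so no dynamic loading is involved.

```c
#include <stddef.h>
#include <string.h>

/* Signature shared by all (hypothetical) extensions. */
typedef int (*extension_fn)(int input);

#define MAX_EXTENSIONS 16

static struct {
    const char *name;
    extension_fn fn;
} registry[MAX_EXTENSIONS];
static size_t registry_count;

/* Returns 0 on success, -1 if the table is full, an argument is NULL,
   or the name is already taken. */
int register_extension(const char *name, extension_fn fn)
{
    if (registry_count == MAX_EXTENSIONS || name == NULL || fn == NULL)
        return -1;
    for (size_t i = 0; i < registry_count; i++)
        if (strcmp(registry[i].name, name) == 0)
            return -1;
    registry[registry_count].name = name;
    registry[registry_count].fn = fn;
    registry_count++;
    return 0;
}

/* Calls the named extension. Returns 0 and stores its result in *output,
   or -1 if no extension with that name is registered. */
int run_extension(const char *name, int input, int *output)
{
    for (size_t i = 0; i < registry_count; i++) {
        if (strcmp(registry[i].name, name) == 0) {
            *output = registry[i].fn(input);
            return 0;
        }
    }
    return -1;
}

/* Example extension that the core would register in its init code. */
int ext_double(int x) { return 2 * x; }
```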
If you're using Windows you could try using COM. It requires a lot of attention to detail, and is kind of painful to use from C, but it would allow you to build extension points with well-defined interfaces and an object-oriented structure.
In this usage case, extensions label themselves with a 'Component Category' defined by your app, which allows the core to find and load them without having to know where their DLLs are. The extensions also implement interfaces that are specified in IDL and consumed by the core.
This is old tech now, but it does work.

Extending PythonCE to Access gsm/camera/gps Easily from PythonCE

It seems there is no scripting language for Windows Mobile devices that gives access to the phone (SMS, MMS, making a call, taking a photo). I wonder how complex it would be to make a Python library that enables that (write something in C, compile it, and import it in PythonCE).
Question: Where should I start to understand how to compile a PythonCE module that gives Python additional functionality on Windows Mobile? Also, what toolkit is required? Is it possible at all on a Mac (Leopard)?
As the first step, you should try to create executable programs that invoke the functions you want. For example, to send SMS, it appears you need to call MailSwitchToAccount, passing "SMS", and so on - familiarize yourself with the C API on the platform.
To create executables, you need Visual Studio, and the Windows Mobile SDK. Those run on Windows. For cross-compilation, there is CeGCC (http://cegcc.sourceforge.net/docs/using.html), but using it probably makes things more complicated than using the Microsoft tools.
When you have executables that perform the functions you desire, creating Python extension modules out of them should be easy. Just follow the extending-and-embedding tutorials.
MSDN has plenty of samples for C++ development on Windows Mobile, and the SDK comes with several sample applications. Unfortunately, the VS Express editions (the free ones) do not come with compilers for Smart Devices. The only free option is the older eMbedded Visual C++ (eVC), which is now about 8 years old and no longer supported (though it can still create apps for devices at least up through CE 5.0).
I just tried setting up an environment to compile PythonCE modules (http://pythonce.sourceforge.net/Wikka/SConsBuild), but it seems that I can only use the 2003 PPC SDK, and it has no recent functions available. Even when I followed all the steps in the tutorial, the sample spammodule.c does not compile :(
Is there a good tutorial I can use to get started with C (C++) programming for Windows Mobile?
Also, is it possible with the free version of Visual Studio (Express)?
