Running C code in Elixir/Erlang: Ports or NIFs? - c

I've found that Elixir programs can run C code either via NIFs (native implemented functions) or via OS-level ports. Having read those and similar links, I'm not a hundred percent clear on when to use one or the other method (or something else entirely?), and feel it would be good to have a direct comparison available, for myself and other novices. Can anyone provide?

What are ports?
Ports are basically separate programs which are run separately from the Erlang VM. The Erlang VM communicates with the running port over standard input/output, and the resulting port lives behind an Erlang process that owns it and can facilitate communication between the port and the rest of your Erlang or Elixir application. Ports are "safe" in the sense that if the port crashes, it doesn't bring down the whole Erlang VM.
Porcelain might be of interest as a possible improvement and expansion over what's already provided in the Port module. System.cmd/3 also uses ports in its underlying implementation.
What are NIFs?
Native inline functions or "NIFs" are functions defined in what are essentially shared libraries / DLLs loaded by the Erlang VM and written using some language which exposes a C-compatible ABI. NIFs are more efficient than ports (since they don't have to communicate over STDIN/STDOUT) and are simpler in many respects (since you don't have to deal with encoding and decoding data between your Elixir and non-Elixir codebases), but they're also much less safe; a NIF can crash the Erlang VM, and a long-running NIF can potentially lock up the Erlang VM (since the scheduler can't reason about native code).
What are port drivers?
Port drivers are kind of an in-between approach to integrating external code with an Erlang or Elixir codebase. Like NIFs, they're loaded into the Erlang VM, and a port driver can therefore crash or hang the whole VM. Like ports, they behave similarly to Erlang processes.
When should I use a port?
You want your external code to behave like an ordinary Erlang process (at least enough for such a process to wrap it and send/receive messages on behalf of your external code)
You want the Erlang VM to be able to survive your external code crashing
You want to implement a long-running task in your external code
You want to write your external code in a language that does not support C-compatible FFI (or otherwise don't want to deal with your language's FFI facilities)
When should I use a NIF?
You want your external code to behave like a collection of ordinary Erlang functions (particularly if you want to define an Erlang/Elixir module that exports functions implemented in native-compiled code)
You want to avoid any potential performance hits / overhead from communicating via standard input/output and/or you want to avoid having to translate between Erlang terms and something your external code understands
You are reasonably confident that the things your external code is doing are neither long-running nor likely to crash (including, in the latter case, if you're writing your NIFs in something like Rust; see also: Rustler), or...
You are reasonably confident that crashing or hanging the Erlang VM is acceptable for your use case (e.g. your code is both distributed and able to survive the sudden loss of an Erlang node, or you're writing a desktop application and an application-wide crash is not a big deal aside from being an inconvenience to users)
When should I use port drivers?
You want your external code to behave like an Erlang process
You want to avoid the overhead and/or complexity of communicating over standard input/output
You are reasonably confident that your port driver won't crash or hang the Erlang VM, or...
You are reasonably confident that a crash or hang of the Erlang VM is not a critical issue
What do you recommend?
There are two aspects to weigh here:
Process-like v. module-like
Safe v. efficient
If you want maximum safety behind a process-like interface, go with a port.
If you want maximum safety behind a module-like interface, go with a module with functions that either wrap System.cmd/3 or directly use a port to communicate with your external code
If you want better efficiency behind a process-like interface, go with a port driver.
If you want better efficiency behind a module-like interface, go with NIFs.

Related

Using sock_create, accept, bind etc in kernel

I'm trying to implement an echo TCP server as a loadable kernel module.
Should I use sock_create, or sock_create_kern?
Should I use accept, or kernel_accept?
I mean it does make sense that I should use kernel_accept for example; but I don't know why. Can't I use normal sockets in the kernel?
The problem is, you are trying to shoehorn an user space application into the kernel.
Sockets (and files and so on) are things the kernel provides to userspace applications via the kernel-userspace API/ABI. Some, but not all, also have an in-kernel callable, for cases when another kernel thingy wishes to use something provided to userspace.
Let's look at the Linux kernel implementation of the socket() or accept() syscalls, in net/socket.c in the kernel sources; look for SYSCALL_DEFINE3(socket, and SYSCALL_DEFINE3(accept,, SYSCALL_DEFINE4(recv,, and so on.
(I recommend you use e.g. Elixir Cross Referencer to find specific identifiers in the Linux kernel sources, then look up the actual code in one of the official kernel Git trees online; that's what I do, anyway.)
Note how pointer arguments have a __user qualifier: this means the data pointed to must reside in user space, and that the functions will eventually use copy_from_user()/copy_to_user() to retrieve or set the data. Furthermore, the operations access the file descriptor table, which is part of the process context: something that normally only exist for userspace processes.
Essentially, this means your kernel module must create an userspace "process" (enough of one to satisfy the requirements of crossing the userspace-kernel boundary when using kernel interfaces) to "hold" the memory and file descriptors, at minimum. It is a lot of work, and in the end, it won't be any more performant than an userspace application would be. (Linux kernel developers have worked on this for literally decades. There are some proprietary operating systems where doing stuff in "kernel space" may be faster, but that is not so in Linux. The cost to do things in userspace is some context switches, and possibly some memory copies (for the transferred data).)
In particular, the TCP/IP and UDP/IP interfaces (see e.g. net/ipv4/udp.c for UDP/IPv4) do not seem to have any interface for kernel-side buffers (other than directly accessing the rx/tx socket buffers, which are in kernel memory).
You have probably heard of TUX web server, a subsystem patch to the Linux kernel by Ingo Molnár. Even that is not a "kernel module server", but more like a subsystem that an userspace process can use to implement a server that runs mostly in kernel space.
The idea of a kernel module that provides a TCP/IP and/or UDP/IP server, is simply like trying to use a hammer to drive in screws. It will work, after a fashion, but the results won't be pretty.
However, for the particular case of an echo server, it just might be possible to bolt it on top of IPv4 (see net/ipv4/) and/or IPv6 (see net/ipv6/) similar to ICMP packets (net/ipv4/icmp.c, net/ipv6/icmp.c). I would consider this route if and only if you intend to specialize in kernel-side networking stuff, as otherwise everything you'd learn doing this is very specialized and not that useful in practice.
If you need to implement something kernel-side for an exercise or something, I'd recommend steering away from "application"-type ideas (services or similar).
Instead, I would warmly recommend developing a character device driver, possibly implementing some kind of inter-process communications layer, preferably bus-style (i.e., one sender, any number of recipients). Something like that has a number of actual real-world use cases (both hardware drivers, as well as stranger things like kdbus-type stuff), so anything you'd learn doing that would be real-world applicable.
(In fact, an echo character device -- which simply outputs whatever is written to it -- is an excellent first target. Although LDD3 is for Linux kernel 2.6.10, it should be an excellent read for anyone diving into Linux kernel development. If you use a more recent kernel, just remember that the example code might not compile as-is, and you might have to do some research wrt. Linux kernel Git repos and/or a kernel source cross referencer like Elixir above.)
In short sockets are just a mechanism that enable two processes to talk, localy or remotely.
If you want to send some data from kernel to userspace you have to use kernel sockets sock_create_kern() with it's family of functions.
What would be the benefit of TCP echo server as kernel module?
It makes sense only if your TCP server provides data which is otherwise not accessible from userspace, e.g. read some post-mortem NVRAM which you can't read normally and to send it to rsyslog via socket.

Writing program to run on server, requesting experienced advice

I'm developing a program that will need to run on Internet servers (a back-end component to be used by several cross-platform programs). I'm familiar with the security precautions to take (to prevent buffer overflows and SQL Injection attacks, for instance), but have never written a server program before, or any program that will be used on this scale.
The program needs to be able to serve hundreds or thousands of clients simultaneously. The protocols are designed for processing speed and to minimize the amount of data that must be exchanged, and the server side will be written in C. There will be both a Windows and a Linux version from the same code.
Questions:
How should the program handle communications -- multiple threads, a single thread handling all the sockets in turn, or spawn a new process for every so many incoming connections (or for each one)?
Do I need to worry about things like memory fragmentation, since this program will need to run for months at a time?
What other design issues, specific to this kind of programming, might an experienced developer of cross-platform programs for desktop and mobile systems not be aware of?
Please, no suggestions to use a different language. That decision has already been made, for reasons I'm not at liberty to go into.
For I'd use libevent or libev and non-blocking I/O. This way the operating system will take case of most of your scheduling problems. I'd also use a thread pool for processing tasks, that by nature are blocking, so they don't block the main loop. And if you ever need to read or write large amounts of data to or from the disc, use mmap, again to let the OS handle as much as possible.
The basic advice is use the OS, as much as possible. If you want a good example of a program which does this look at Varnish, it is very well written, and performs fantastic.
With my experience running multiple servers for over 3 years of uptime, and programs with little over a year of uptime I can still recommend making the setup so that the system gracefully recovers from a program error and from a server reboot.
Even though performance gets a hit when a program is restarted, you need to be able to handle that as external circumstances can force the program to such a restart.
Don't try to reinvent the wheel when not needed, and have a look at zeromq or something like that to handle distribution of incoming communications. (If you are allowed to, prototype the backends in a more forgiving language than C like Python, then reimplement in C but keeping the communications protocol)

Best Labview IPC

I need to make labview communicate with a C/C++ application. Both the applications run on the same machine. What is the IPC mechanism with lower overhead and highest speed available in LabView?
TCP, UDP, ActiveX, DDE, file transactions, or perhaps just directly calling a dll are the solutions that come to mind.
First I'd just call a dll if you can manage with that. Assuming you're tied in to using two separate applications then:
I'd use TCP or UDP. File transactions are clunky but easy to implement, DDE is older but might be viable (I'd recommend against it).
Basic TCP/IP in Labview
TCP/IP and UDP in Labview
Calling a dll from Labview
Have you investigated straight up TCP or UDP?
It'll make it easy if you ever need to separate the applications onto different machines later on down the road. Implementation is pretty straight forward too, although it may not be the fastest throughput.
What speeds are we talking about here?
NI has provided a thorough document explaining that: Using External Code in LabVIEW [pdf]. In brief, you can use:
Shared Libraries (on windows they are called DLLs). According to the above document, any
language can be used to write DLLs as long as the DLLs can be called
using one of the calling conventions LabVIEW supports, either
stdcall or C."
Code Interface Node (CIN), which is a block diagram node that links C/C++ source
code to LabVIEW.
.NET technology.
Note that "Shared Libraries" and "Code Interface Node" are supported on Windows, Max OS X, Linux and Solaris.

Sending calls to libraries remotely across linux

I am developing some experimental setup in C.
I am exploring a scenario as follows and I need help to understand it.
I have a system A which has a lot of Applications using cryptographic algorithms.
But these crypto calls(openssl calls) should be sent to another System B which takes care of cryptography.
Therefore, I have to send any calls to cryptographic (openssl) engines via socket to a remote system(B) which has openssl support.
My plan is to have a small socket prog on System A which forwards these calls to system B.
What I'm still unclear at this moment is how I handle the received commands at System B.
Do I actually get these commands and translate them into corresponding calls to openssl locally in my system? This means I have to program whatever is done on System A right?
Or is there a way to tunnel/send these raw lines of code to the openssl libs directly and just received the result and then resend to System A
How do you think I should go about the problem?
PS: Oh by the way, the calls to cryptography(like EngineUpdate, VerifyFinal etc or Digest on System A can be either on Java or C.. I already wrote a Java/C program to send these commands to System B via sockets...
The problem is only on System B and how I have to handle..
You could use sockets on B, but that means you need to define a protocol for that. Or you use RPC (remote procedure calls).
Examples for socket programming can be found here.
RPC is explained here.
The easiest (not to say "the easy", but still) way I can imagine would be to:
Write wrapper (proxy) versions of the libraries you want to make remote.
Write a server program that listens to calls, performs them using the real local libraries, and sends the result back.
Preload the proxy library before running any application where you want to do this.
Of course, there are many many problems with this approach:
It's not exactly trivial to define a serializing protocol for generic C function calls.
It's not exactly trivial to write the server, either.
Applications will slow a lot, since the proxy call needs to be synchronous.
What about security of the data on the network?
UPDATE:
As requested in a comment, I'll try to expand a bit. By "wrapper" I mean a new library, that has the same API as another one, but does not in fact contain the same code. Instead, the wrapper library will contain code to serialize the arguments, call the server, wait for a response, de-serialize the result(s), and present them to the calling program as if nothing happened.
Since this involves a lot of tedious, repetitive and error-prone code, it's probably best to abstract it by making it code-driven. The best would be to use the original library's header file to define the serialization needed, but that (of course) requires quite heavy C parsing. Failing that, you might start bottom-up and make a custom language to describe the calls, and then use that to generate the serialization, de-serialization, and proxy code.
On Linux systems, you can control the dynamic linker so that it loads your proxy library instead of the "real" library. You could of course also replace (on disk) the real library with the proxy, but that will break all applications that use it if the server is not working, which seems very risky.
So you basically have two choices, each outlined by unwind and ammoQ respectively:
(1) Write a server and do the socket/protocol work etc., yourself. You can minimize some of the pain by using solutions like Google's protocol buffers.
(2) use an existing middleware solution like (a) message queues or (b) an RPC mechanism like CORBA and its many alternatives
Either is probably more work than you anticipated. So really you have to answer this yourself. How serious is your project? How varied is your hardware? How likely is the hardware and software configuration to change in the future?
If this is more than a learning or pet project you are going to be bored with in a month or two then an existing middleware solution is probably the way to go. The downside is there is a somewhat intimidating learning curve.
You can go the RPC route with CORBA, ICE, or whatever the Java solutions are these days (RMI? EJB?), and a bunch of others. This is an elegant solution since your calls to the remote encryption machine appear to your SystemA as simple function calls and the middleware handles the data issues and sockets. But you aren't going to learn them in a weekend.
Personally I would look to see if a message queue solution like AMQP would work for you first. There is less of a learning curve than RPC.

How to scale a TCP listener on modern multicore/multisocket machines

I have a daemon to write in C, that will need to handle 20-150K TCP connections simultaneously. They are long running connections, and rarely ever tear down. They have a very small amount of data (rarely exceeding MTU even.. it's a stimulus/response protocol) in transmit at any given time, but response times to them are critical. I'm wondering what the current UNIX community is using to get large amounts of sockets, and minimizing the latency on response of them. I've seen designs revolving around multiplexing connects to fork worker pools, threads (per connection), static sized thread pools. Any suggestions?
the easiest suggestion is to use libevent, it makes it easy to write a simple non-blocking single-threaded server that would comply with your requirements.
if the processing for each response takes some time, or if it uses some blocking API (like almost anything from a DB), then you'll need some threading.
One answer is the worker threads, where you spawn a set of threads, each listening on some queue to work. it can be separate processes, instead of threads, if you like. The main difference would be the communications mechanism to tell the workers what to do.
A different way to do is to use several threads, and give to each of them a portion of those 150K connections. each will have it's own process loop and work mostly like the single-threaded server, except for the listening port, which will be handled by a single thread. This helps spreading the load between cores, but if you use a blocking resource, it would block all the connections handled by this specific thread.
libevent lets you use the second way if you're careful; but there's also an alternative: libev. it's not as well known as libevent, but it specifically supports the multi-loop scheme.
If performance is critical then you'll really want to go for a multithreaded event loop solution - i.e. a pool of worker threads to handle your connections. Unfortunately, there is no abstraction library to do this that works on most Unix platforms (note that libevent is only single-threaded as are most of these event-loop libraries), so you'll have to do the dirty work yourself.
On Linux that means using edge-triggered epoll with a pool of worker threads (Windows would have I/O completion ports which also works fine in a multithreaded environment - I am not sure about other Unixes).
BTW, I have done some work trying to abstract edge-triggered epoll on Linux and Windows I/O completion ports on http://nginetd.cmeerw.org (it is work in progress, but might provide some ideas).
If you have system configuration access don't over-do it and set up some iptables/pf/etc to load-balance connections across n daemon instances (processes) as this will work out of the box. Depending on how blocking the nature of the daemon n should be from the number of cores on the system or several times higher. This approach looks crude but it can handle broken daemons and even restart them if necessary. Also migration would be smooth as you could start diverting new connections to another set of processes (for example, a new release or migrating to a new box) instead of service interruptions. On top of that you get several features like source affinity wich can help significantly caching and contention of problematic sessions.
If you don't have system access (or ops can't be bothered), you can use load balancer daemon (there are plenty of open source ones) instead of iptables/pf/etc and use also n service daemons, like above.
Also this approach helps with separating privileges of ports. If the external service needs to service on a low port (<1024) you only need the load balancer running privileged/or admin/root, or kernel.)
I've written several IP load balancers in the past and it can be very error-prone in production. You don't want to support and debug that. Also operations and management will tend second-guess your code more than external code.
i think javier's answer makes the most sense. if you want to test the theory out, then check out the node javascript project.
Node is based on Google's v8 engine which compiles javascript to machine code and is as fast as c for certain tasks. It is also based on libev and is designed to be completely non-blocking, meaning you don't have to worry about context switching between threads (everything runs on a single event loop). It is very similar to erlang in that respect.
Writing high performance servers in javascript is now really, really easy with node. You could also, with a little bit of effort, write your custom code in c and create bindings for node to call into it to do your actual processing (look at the node source to see how to do this - documentation is a little sketchy at the moment). as an uglier alternative, you could build your custom c code as an application and use stdin/stdout to communicate with it.
I've tested node myself with upwards of 150k connections with absolutely no issues (of course you will need some serious hardware if all these connections are going to be communicating at once). A TCP connection in node.js on average uses only 2-3k of memory so you could theoretically handle 350-500k connections per 1GB of RAM.
Note - Node.js is not currently supported on windows, but it is only at an early stage of development and i'd imagine it will be ported at some stage.
Note 2 - you will have to ensure the code you are calling into from Node does not block
Several systems have been developed to improve on select(2) performance: kqueue, epoll, and /dev/poll. In all these systems, you can have a pool of worker threads waiting for tasks; you will not be forced to setup all file handles over and over again when done with one of them.
do you have to start from scratch? You could use something like gearman.

Resources