Interacting with the filesystem in a custom programming language

I'm writing a standard library for my programming language, which is compiled to LLVM IR. I've also stumbled upon the interesting post "Custom Programming Language ~ How to interact with the Operating System", but I'm stuck on implementing interaction with the filesystem.
My language is quite similar to C, so I will use C for examples.
In C we have a struct called FILE and we can pass it to fopen, fclose etc...
How can I implement this datatype myself? Do I just create a struct with similar fields and it will work?
How do popular LLVM-based languages implement this? For example Jai, Zig, Swift ...

Generally, you pick the smallest possible base platform and then define matching function and struct types for that interface only. For example, "my language only works on Linux for now and my platform is the syscalls I need, and no more" or "my language only uses the exported symbols in this C library that I write myself". I've seen both. The goal is to have a small cross-language interface.
The kernel will use lots of code, the C library may use lots of other libraries or the kernel, but the cross-language interface is kept small, since cross-language calls/types are so ticklish.
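As a rough illustration of the "syscalls only" variant, the sketch below builds a file type for a language runtime directly on the POSIX open/read/write/close calls. The MyFile struct and the myfile_* names are hypothetical, not taken from any existing language. The point is that the runtime owns its own struct instead of copying the fields of C's FILE, whose layout is private to each C library. If you would rather call fopen and friends, the usual approach is to treat FILE as an opaque pointer and never look inside it.

```c
#include <assert.h>
#include <fcntl.h>      /* open, O_* flags */
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read, write, close */

/* Hypothetical runtime type: the language's own "file" is just a wrapper
   around the descriptor the kernel hands back. It is NOT C's FILE. */
typedef struct {
    int fd;  /* POSIX file descriptor, -1 once closed */
} MyFile;

/* Create (or truncate) a file for writing. Returns 0 on success, -1 on failure. */
int myfile_create(MyFile *f, const char *path) {
    assert(f != NULL && path != NULL);
    f->fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    return f->fd < 0 ? -1 : 0;
}

/* Open an existing file for reading. Returns 0 on success, -1 on failure. */
int myfile_open_read(MyFile *f, const char *path) {
    assert(f != NULL && path != NULL);
    f->fd = open(path, O_RDONLY);
    return f->fd < 0 ? -1 : 0;
}

/* Thin wrappers: each returns the byte count, or -1 on error. */
ssize_t myfile_write(MyFile *f, const void *buf, size_t len) {
    return write(f->fd, buf, len);
}

ssize_t myfile_read(MyFile *f, void *buf, size_t len) {
    return read(f->fd, buf, len);
}

/* Close the descriptor and mark the handle as unusable. */
int myfile_close(MyFile *f) {
    int rc = close(f->fd);
    f->fd = -1;
    return rc;
}
```

From the compiler's side, these five functions are the entire cross-language surface: the generated LLVM IR only needs declarations for them and a struct type holding one integer.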

Related

Include libraries in other languages in a C application

My general question is this: what's the most common way to include libraries in other languages in a C application?
For example, if I have a Ruby library intended for doing function X, and a Python library for doing function Y, how can I write a program in C (the language, that is) that uses the functions in each?
I've seen wrappers that give access to C libraries in these higher languages, but are there wrappers that go the other way? Is there a common way of handling this in general?
Are these native-code libraries (i.e. have they been compiled?) Or are these source libraries (i.e. a bunch of text files containing Ruby source code)?
If the former, libraries in a language like Ruby or Lua usually have a published binary interface ("ABI"). This is low-level documentation that describes how the libraries and their functions work under the hood. Often, those are defined in C or C++, or whatever language was used to implement the interpreter/compiler for Ruby itself.
So you'd have to find that documentation, and find out how to call the parts you are interested in. Some languages even use the same ABI as C does, so you just need to create a header file that matches the contents of the library and you can call it directly (this is how you integrate e.g. assembler and C, or even C++, which you can get to generate straight C functions).
If the latter, you usually need to find an embeddable version of the language, and find out how to run a script from inside your application (This is how Lua is usually used, for example).
But are you sure you need the given Ruby libraries? Often, common libraries are implemented using a C or C++ library under the hood, and then just wrapped for scripting languages, so you can just skip the scripting translation layer and use the (maybe slightly more low-level) library yourself.
PS - there are also automatic wrapper generators, like SWIG, that will read a file in one language and write the translation code for you.

Using C programming language instead of Processing with Arduino

How can I use "Plain C" to program Arduino without using the "Processing programming language"?
I want to improve my C programming skills by using it with Arduino for embedded systems.
I have very limited ability to code in C++, and I would love to write my own Arduino Library using "C", to avoid using classes and OOP.
Processing looks like Java to me, and I want to use "C pointers" to gain more practical knowledge.
The thing is that Arduino doesn't use Processing.
Processing is (was?) a separate programming language which was developed independently for a different purpose. The language resembles C and C++ very closely, so closely that it's almost identical.
Programming the Arduino, however, is accomplished using an (unfortunate) mixture of C and C++, with a set of custom libraries (which are similar in style to that of Processing). These libraries are written in C and C++ themselves, and they are only good for making one's code more portable across different MCU types. Using them is not strictly necessary for programming the AVR MCU. In fact, the libraries have quite a few drawbacks (the code is big, inherently slow and ugly, amongst others).
If you want to use plain ol' C for programming the Arduino, then just go ahead and do so. Grab the reference PDF from Atmel's site for your particular MCU, learn the special registers (I/O, timers, etc.), install the avr-gcc toolchain on your computer, and use avr-gcc, avr-objcopy and avrdude to compile and install your programs.
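To give a feel for what "learning the special registers" means in plain C, here is a small sketch of the usual set/clear/toggle bit idioms done through a pointer. It is written so that it runs on a desktop: fake_port is a hypothetical stand-in variable, and the pin_* helper names are made up for this example. On a real AVR you would point the same code at a register macro such as PORTB from avr-libc's <avr/io.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped 8-bit I/O register.
   On real hardware this would be a register from <avr/io.h>. */
volatile uint8_t fake_port = 0;

/* Drive one pin high by setting its bit. */
void pin_set(volatile uint8_t *reg, uint8_t bit) {
    assert(bit < 8);
    *reg |= (uint8_t)(1u << bit);
}

/* Drive one pin low by clearing its bit. */
void pin_clear(volatile uint8_t *reg, uint8_t bit) {
    assert(bit < 8);
    *reg &= (uint8_t)~(1u << bit);
}

/* Flip one pin. */
void pin_toggle(volatile uint8_t *reg, uint8_t bit) {
    assert(bit < 8);
    *reg ^= (uint8_t)(1u << bit);
}

/* Read back the state of one pin as 0 or 1. */
int pin_read(const volatile uint8_t *reg, uint8_t bit) {
    assert(bit < 8);
    return (int)((*reg >> bit) & 1u);
}
```

The volatile qualifier matters on the real chip: it tells the compiler that every read and write of the register has to actually happen.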
There also happens to be a C-only library which follows this kind of convention. It doesn't provide as much abstraction as the stock Arduino library has, because it's lower-level, but you can have a look at it and see how one can accomplish basic algorithms without the default libraries.

When is it appropriate to use C as object oriented language?

There are a lot of excellent answers on how one can simulate object-oriented concepts with C. To name a few:
C double linked list with abstract data type
C as an object oriented language
Can you write object-oriented code in C?
When is it appropriate to use such simulation and not to use languages that support object-oriented techniques natively?
Highly related:
Why artificially limit your code to C?
https://stackoverflow.com/questions/482574/whats-the-advantage-of-using-c-over-c-or-is-there-one
I'll give you the one reason I know of because it has been the case for me:
When you are developing software for a unique platform, and the only available compiler is a C compiler. This happens quite often in the world of embedded microcontrollers.
To just give you another example: a fair amount of the x86 Linux kernel uses C as if it were C++ where object orientation seems natural (e.g., in the VFS). The kernel is written in assembly and C (unless that changed in the 3.0 kernel). The kernel coders create macros and structures, sometimes even named similarly to C++ terms (e.g., for_each_xxx), that allow them to code as if they were using C++. As others have pointed out, you'd never choose C if you were starting a heavily object-oriented program; but when you're adjusting C-based code to add object-oriented features, you might.
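The general pattern looks roughly like the sketch below: a struct of function pointers plays the role of a vtable, and each "subclass" embeds the base struct as its first member. This is only loosely in the spirit of the kernel's operation tables; every name here (shape, shape_ops, rect, tri) is hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct shape;

/* The "vtable": one function pointer per virtual method. */
struct shape_ops {
    double (*area)(const struct shape *self);
    const char *(*name)(const struct shape *self);
};

/* The "base class" holds only a pointer to its operations. */
struct shape { const struct shape_ops *ops; };

/* "Subclasses" embed the base as their FIRST member, so a pointer to the
   base can be cast back to the derived struct. */
struct rect { struct shape base; double w, h; };
struct tri  { struct shape base; double b, h; };

static double rect_area(const struct shape *self) {
    const struct rect *r = (const struct rect *)self;
    return r->w * r->h;
}
static const char *rect_name(const struct shape *self) { (void)self; return "rect"; }
static const struct shape_ops rect_ops = { rect_area, rect_name };

static double tri_area(const struct shape *self) {
    const struct tri *t = (const struct tri *)self;
    return 0.5 * t->b * t->h;
}
static const char *tri_name(const struct shape *self) { (void)self; return "tri"; }
static const struct shape_ops tri_ops = { tri_area, tri_name };

/* "Constructors" wire each object to its operation table. */
struct rect rect_make(double w, double h) {
    struct rect r = { { &rect_ops }, w, h };
    return r;
}
struct tri tri_make(double b, double h) {
    struct tri t = { { &tri_ops }, b, h };
    return t;
}

/* Dynamic dispatch: callers only ever see struct shape. */
double shape_area(const struct shape *s) {
    assert(s != NULL && s->ops != NULL);
    return s->ops->area(s);
}
const char *shape_name(const struct shape *s) {
    assert(s != NULL && s->ops != NULL);
    return s->ops->name(s);
}
```

Code that walks an array of struct shape pointers and calls shape_area on each one never needs to know which concrete type it is holding.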
When you want a cross-platform foundation for object-oriented APIs. A case in point is Apple's Core Foundation. Being entirely C, it could be easily ported, yet provides an extremely rich set of opaque objects to use.
A nice example of its flexibility is the way many of its types are 'toll-free' bridged with those from Foundation (a set of true OO Objective-C libraries). Many types from Core Foundation can be used, fairly naturally, in Foundation APIs, and vice-versa. It's hard to see this working so well without some OO concepts being present in the Core Foundation libraries.

typedef solving for dll wrapper

I want to write a wrapper for a DLL file, in this case for Python. The problem is that the argument types are not the standard C ones. They have been typedef'ed to something else.
I have the header files for the DLL files, so I can manually track down the original standard C type each argument type was typedef'ed to. But I wanted a more systematic way to do this. I was wondering whether there is a utility that would evaluate the header files, or whether the type definitions can be obtained from somewhere in the DLL.
I think the tool you are looking for is SWIG:
SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages. SWIG is used with different types of languages including common scripting languages such as Perl, PHP, Python, Tcl and Ruby. The list of supported languages also includes non-scripting languages such as C#, Common Lisp (CLISP, Allegro CL, CFFI, UFFI), Java, Lua, Modula-3, OCAML, Octave and R. Also several interpreted and compiled Scheme implementations (Guile, MzScheme, Chicken) are supported. SWIG is most commonly used to create high-level interpreted or compiled programming environments, user interfaces, and as a tool for testing and prototyping C/C++ software. SWIG can also export its parse tree in the form of XML and Lisp s-expressions. SWIG may be freely used, distributed, and modified for commercial and non-commercial use.
This does assume that you are willing to use the headers for the DLL. If you want to work solely with the DLL, then you have more work to do. It might provide a reflection interface that you can use to analyze the types. Failing that, you are into a world of pain - or reverse engineering any debugging information in the DLL.

Best Practice for Multi-programming-language Projects

Does anyone have any experience with doing this? I'm working on a Java decompiler right now in C++, but would like a higher-level language to do the actual transformations of the internal trees. I'm curious if the overhead of marshaling data between languages is worth the benefit of a more expressive language (like Haskell) for better articulating what I'm trying to accomplish. Is this actually done in the "real world", or is it usual to pick a language at the beginning of a project and stick with it? Any tips from those who have attempted it?
I'm a big advocate of always choosing the right programming language for each challenge. If there is another language which handles some otherwise tricky task easily, I'd say go for it.
Does it happen in the real world? Yes. I am currently working on a project which is made up of both PHP and Objective-C code.
The trick is, as you pointed out, the communication between the two languages. If at all possible, let each language stick to its own domain, and have the two sections communicate in the simplest way possible. In my case, it was XML documents sent via http. In your case, some kind of formatted text file might be the answer.
Marshalling costs depend on the languages and architecture you're working with. For example, if you're on the CLR or JVM, there are low-cost interop solutions available - though I know you are working with probably unmanaged C++.
Another avenue is an embedded domain-specific language. Tree transformations are often expressible via pattern matching and application of a relatively small number of functions. You could consider writing a simple tree pattern-matcher - e.g. something that looks like Lisp s-exprs but uses placeholders to capture variables - with associated actions that are functions that transform the matched subtree.
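A very small version of such a matcher might look like the sketch below. It is an illustration under several assumptions: the Node and Bindings types, the convention that a name starting with '?' is a placeholder, and the example rule (rewrite "x + 0" to "x") are all invented here and are not taken from any particular tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KIDS 2
#define MAX_VARS 4

/* A tree node; in a pattern, an op starting with '?' is a placeholder. */
typedef struct Node {
    const char *op;
    int nkids;
    const struct Node *kids[MAX_KIDS];
} Node;

/* Placeholder name -> subtree captured by the match. */
typedef struct {
    int count;
    const char *names[MAX_VARS];
    const Node *values[MAX_VARS];
} Bindings;

static int is_placeholder(const Node *p) { return p->op[0] == '?'; }

/* Structural equality, used when a placeholder occurs more than once. */
static int tree_equal(const Node *a, const Node *b) {
    if (strcmp(a->op, b->op) != 0 || a->nkids != b->nkids) return 0;
    for (int i = 0; i < a->nkids; i++)
        if (!tree_equal(a->kids[i], b->kids[i])) return 0;
    return 1;
}

/* Returns 1 if pat matches tree, recording captures in b.
   On failure b may hold partial captures and should be discarded. */
int match(const Node *pat, const Node *tree, Bindings *b) {
    assert(pat != NULL && tree != NULL && b != NULL);
    if (is_placeholder(pat)) {
        for (int i = 0; i < b->count; i++)
            if (strcmp(b->names[i], pat->op) == 0)
                return tree_equal(b->values[i], tree);
        if (b->count == MAX_VARS) return 0;
        b->names[b->count] = pat->op;
        b->values[b->count] = tree;
        b->count++;
        return 1;
    }
    if (strcmp(pat->op, tree->op) != 0 || pat->nkids != tree->nkids) return 0;
    for (int i = 0; i < pat->nkids; i++)
        if (!match(pat->kids[i], tree->kids[i], b)) return 0;
    return 1;
}

/* Fetch the subtree bound to a placeholder, or NULL if unbound. */
const Node *lookup(const Bindings *b, const char *name) {
    for (int i = 0; i < b->count; i++)
        if (strcmp(b->names[i], name) == 0) return b->values[i];
    return NULL;
}

/* The pattern (+ ?x 0), built from static nodes. */
static const Node PAT_X        = { "?x", 0, { NULL, NULL } };
static const Node PAT_ZERO     = { "0",  0, { NULL, NULL } };
static const Node PAT_ADD_ZERO = { "+",  2, { &PAT_X, &PAT_ZERO } };

/* The action attached to the pattern: (+ ?x 0) becomes ?x.
   Trees that do not match are returned unchanged. */
const Node *simplify_add_zero(const Node *tree) {
    Bindings b = { 0 };
    if (match(&PAT_ADD_ZERO, tree, &b))
        return lookup(&b, "?x");
    return tree;
}
```

A real transformation engine would keep a table of (pattern, action) pairs and apply them repeatedly over the whole tree, but the matching core stays about this size.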
John Ousterhout, the inventor of Tcl/Tk, was a strong advocate of multi-language programming and wrote quite extensively about it. To do it, you need a clean interface mechanism between the languages you are using. There are quite a few mechanisms for this. Examples include:
SWIG (Simplified Wrapper and Interface Generator) can take a C or C++ (or several other languages) header file and generate an interface for a high-level language such as Perl or Python that allows you to access the API. There are other systems that use this approach.
Java supports JNI, and various other systems such as Python's ctypes and VisualWorks DLL/C connect are native mechanisms that allow you to explicitly construct the call to the lower-level subsystem.
Tcl/Tk was designed explicitly to be embedded, and has a native API for a C library to add hooks into the language. The constructs for this resemble argv[] structures in C, and were designed to make it relatively easy to interface a command-line based C program into Tcl. This is similar to the above example, but coming from the opposite direction. Many scripting languages such as Python, Lua and Tcl support this type of mechanism.
Explicit glue mechanisms such as Pyrex are similar to a wrapper generator, but have their own language for defining the interface. Pyrex is actually a complete programming language.
Middleware such as COM or CORBA allows a generic interface definition to be built externally to the application in an interface definition language, with language bindings that let the languages concerned use the common interface mechanism.
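To make the argv[]-style hook idea a bit more concrete, here is a toy sketch of a host program registering a C function as a script command. This is not Tcl's actual API: Interp, CommandProc, interp_register, interp_call and cmd_add are hypothetical names, and a real embedding would use the interpreter's own registration functions and result handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COMMANDS 8

/* Every command looks like a tiny main(): it receives its own name in
   argv[0] and its arguments as strings. Returns 0 on success, -1 on error. */
typedef int (*CommandProc)(int argc, const char *argv[], long *result);

typedef struct {
    const char *name;
    CommandProc proc;
} Command;

/* Hypothetical interpreter state: just a table of registered commands. */
typedef struct {
    int count;
    Command commands[MAX_COMMANDS];
} Interp;

/* The host application calls this to expose a C function to scripts. */
int interp_register(Interp *in, const char *name, CommandProc proc) {
    assert(in != NULL && name != NULL && proc != NULL);
    if (in->count == MAX_COMMANDS) return -1;
    in->commands[in->count].name = name;
    in->commands[in->count].proc = proc;
    in->count++;
    return 0;
}

/* The interpreter calls this after splitting a script line into words. */
int interp_call(const Interp *in, int argc, const char *argv[], long *result) {
    assert(in != NULL && argv != NULL && result != NULL);
    if (argc < 1) return -1;
    for (int i = 0; i < in->count; i++)
        if (strcmp(in->commands[i].name, argv[0]) == 0)
            return in->commands[i].proc(argc, argv, result);
    return -1;  /* unknown command */
}

/* Example host function exposed to scripts as "add a b". */
int cmd_add(int argc, const char *argv[], long *result) {
    if (argc != 3) return -1;
    *result = strtol(argv[1], NULL, 10) + strtol(argv[2], NULL, 10);
    return 0;
}
```

Because the command signature mirrors main(), wrapping an existing command-line tool mostly means renaming its entry point and registering it.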
