Bootstrapping a language on LLVM

Bootstrapping a language on LLVM - c

I'm bootstrapping a programming language compiler on top of LLVM. Currently I'm mostly done writing a compiler for a subset of C which is self-compiling. When I'm finished with that, I'll bootstrap my language away from C, maintaining self-compilation as I go.
Since the compiler is self-compiling, any features of C that I use I will have to implement. So it's a constant balance: if I use too many features I will have to implement more than I want to, but if I don't implement enough features it will be difficult to write code.
One such feature is the LLVM bindings. Generating LLVM intermediate representation without the LLVM C bindings is difficult. However, if I us the LLVM bindings, I have to implement them again when I branch away from C.
I'm having some difficulty here, so I a looking for alternative solutions. Any ideas?

You could use the LLVM C bindings, but that requires your language understand enough C to do that.
Another alternative is to write out LLVM assembly language (a text file) and use llvm-as to turn that into bitcode.
Edit:
I re-read you question, I think you already understand the llvm-as vs. binding stuff.
Your language will probably want to be able to bind to C anyway for support libraries, etc. Use the C bindings for now and write your own bindings when you get further along.

A Strategy for using ANTLR + StringTemplate + LLVM
HTH

At some point, you're probably going to want to provide an API for wrapping C libraries as extension modules. LLVM may already support this (I know the parrot vm does). Why not use whatever extension system you use to wrap LLVM's own API? They may already support that, too. :)

Related

Is it practical to use #ifdef's to compile a library without certain features?

So I'm currently working with a proprietary programming language that is C-like. So while this question wasn't directly inspired by a C program, I think those of you who are familiar with C may be able to offer some good insight.
I'm currently working on a library. This library encompasses some basic features as well as features that require other libraries. I'm running into the problem of 'where do I draw the line for how many dependencies are included in this library?'.
So this seems like it could be a fairly common problem. What methods exist for addressing this issue?
Something I've had in mind. Implement #defines and #ifdefs that allow users to compile the library with only the features they want. So essentially, all of the library functions that require additional libraries would be wrapped in #ifdef guards. The user would be responsible for #define'ing the features they want. Essentially, this method would allow a user to still use parts of the library without needing to have other dependent libraries.
Your thoughts? Again, this is for a C-like language. Thus tools like CMake, etc. aren't available.

Yes. If you take a look at the Linux kernel for instance, it's done the exact same way.

Yes you can and You Should.
I use them whenever i feel the necessity. Also, i have been working on freeRTOS and that's how things are being done over there.

Template based C / C++ code generation

Any suggestion for template base code generator for C / C++ specifically to generate repetitive code generation? (Not UML / MATLAB model based or other advanced stuff). For a newbie in this field any good generic tutorial (not tool based)?
I came across GNU Autogen looks good but looks like it needs a steep learning curve. I would prefer some plug-in for eclipse like IDE, easy to use and most importantly good tutorials.

The basic concept of code generation is simple enough - and people's needs are varied enough - that there are quite a few options out there.
Boost.Preprocessor is a library of functions built on top of the standard C / C++ preprocessor that makes it much easier to use the preprocessor to do code generation. It's not as flexible as other options, and figuring out preprocessor errors can be tricky, but the fact that it uses only standard language features greatly simplifies using it and integrating it into your builds.
If you know Python, there's Cog.
Some of Google's projects use Pump.
There are many general-purpose templating solutions (Python's Genshi, eRuby, etc.). These are often designed for generating HTML and XML but also work for code.
It's also easy enough to hack something together in the scripting language of your choice.
Without knowing more about what your needs are and what tools you're comfortable with, I can't give a more specific recommendation.
I'm not familiar with anything that provides an Eclipse plugin.

If you know Python, then Cog could be considered as light-weight solution: http://www.python.org/about/success/cog/

Look at my answer for a similar question for Java classes using M2T-JET, an eclipse based, lightweight templating generator. JET is language agnostic and you can see from the example that it's fairly easy to use.

I appreciate using Lua for this task, with something like Templet or one of another myriad of Lua-based preprocessors. The benefit of using Lua over something like Python is that you can, if necessary, include the source code to your template processor and a basic Lua installation along with whatever it is you are shipping. You may then add the compilation of Lua and subsequent template files to the build process as usual.
I would advise not using Python-based solutions for one reason: juggling various pythons to satisfy every developer's use of a completely different yet incompatible version is annoying. If you choose to use a language which you can't embed in your trees, you'll want to make sure pre-computed versions are available.

Late to the party but I would recommend Codeworker Its the only tool I found that does everything the above tools do and more. It has the Python Cog like functionality for embedded generation, it has the template based generation like Templet or Pump. And it has the rather useful feature of protected areas so you can customise your code as needed and re-generate.
I have used it for generating all the boiler plate c++ code as well as configuration for projects like SQL, config, javascript etc.

Brief Explanation of C Supersets?

I'm getting more and more confused in regards to C's supersets the further I venture into the programming world. There's just so many versions.. C, C++, C#, Objective-C, Objective-C++ and God knows what else.
I only know tidbits about these languages (some are object-oriented, some are procedural, C was originally developed for UNIX, C++ started as an extension and is used primarily on the Windows OS, Objective-C is primarily used on Linux and Mac OS/iOS, etc), but I'm not even sure that what I know is correct.
I would just like someone to shed some light on what I "know" - a little bit more information about which are successive versions, which platforms each are generally used on, which are the best versions to learn, etc if anyone is feeling generous. :)
Update
Also, I hope to eventually start native (without needs of plugins, such as the .NET framework) application development for Windows and Mac, so can anyone confirm that I would need to learn C++ for Windows and Objective-C for Mac?

C++ started life as "C with classes", in which Bjarne Stroustrup working at AT&T sought to add features like methods and inheritance to C structures in a way that requires as little support from a runtime library as possible. The classes added to "C with classes" are very heavily influenced by the Simula language. It's since grown to include a number of other features, some of the important ones being generics, lambda functions and an expressive standard type library.
Objective-C started life not too dissimilar from its current state, except that it was originally implemented as a C preprocessor. Brad Cox, at his company Stepstone, wanted to combine the power of the Smalltalk object-oriented model with the performance of C native execution. Objective-C uses a dynamic message-passing system to dispatch calls to objects, a design that's directly opposed to C++'s goal of doing everything in the compiler. Thus while objc and c++ start from the same base, the results are very different.
Both of the above language authors also published books explaining the design intention behind their respective languages. Cox's book is long out of print, but both are worth reading if you get the chance. Stroustrup's publications list
Objective-C++ relies on the fact that GCC (or LLVM) can generate code in either language, so the compiler writers allow you to use features from both in the same source file. There are some limits on what's possible at the boundary layer, and Objective-C++ is mainly used either to adapt a C++ library to an Objective-C app or to use STL or Boost data types from C++ inside an Objective-C class.
Finally, languages like C# and Java are not C supersets or C descendants at all. They use C-like syntax to provide some familiarity (and perhaps to avoid having to think about designing a de novo language syntax) but are different beasts that work in a very different way.

You're mixing up the name of the language and what that was meant to imply.
C was first, procedural and low level non-generic (at least not type-safe) as you may know.
Then C++ originated as C with classes, and templates. Later this evolved into something that sets itself apart from C more and more, expanding its standard library in each step. C++ is a multi-aradigm language, meaning it has stuff for procedural, generic, functional (a little) and object-oriented programming. Don't forget about Boost as a powerful extension to the Standard library.
Objective-C is a C expanded with numerous runtime facilities (like messages) to ease object-oriented development. Objective-C++ is the same extensions applied to C++, staying compatible as much as possible with the original language.
C# is something entirely different. Think of it as a hybrid between Java and C++, although that will set bad blood with users of all three languages. It has a garbage collector.
(Objective-)C used primarily on Linux is not entirely correct. All of KDE is C++, for one. C++ primarily on Windows isn't entirely true, the OS API is all C. What Microsoft made of a C++ API (MFC/ATL) is a living hell. Microsoft is pushing C#, along with associated .Net stuff a lot. So think of C# as mostly Windows, although there is ongoing effort to create a cross-platform .Net alternative (mono). Apple's Foundation APIs are Objective-C, but the underlying OS APIs are all still C.
"Supersets" is a wrong terminology. C is not a subset of C++. Both share a common subset. Only Objective-C(++) is a pure superset of C(++).
As to the best versions to learn: the latest, obviously, perhaps limited to the compiler support on the various platforms you will write for.

What programming languages, without a runtime, besides C, are good for writing programming languages? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I'm looking into a hobby of writing a toy programming language, partly from minor annoyances with other languages, partly so that I can understand what it's like, but mostly just to fool around.
On the off chance that it gets really useful, I don't want it to depend on the run-time of another programming language for programs written in it to run. That is, I want the interpreter / compiler to itself be a program compiled natively into the target OS (language itself may be interpreted / provide a run-time).
Is there any alternative to doing this besides C? What are some advantages / disadvantages or using each?
Clarification 1: I am not intending to go low-level enough to write kernels, filesystems, device drivers, boot loaders. However I would like to be able to manage my own memory.
Clarification 2: Due to a terminology error / misunderstanding, and since I was so used to the C runtime running on various OS's, I said that C does not have a runtime / and or I am not interested in a runtime. A better way to say what I really want is that my programs compile natively into the target (desktop) OS without needing to install additional software from the bootstrapping language.
2.1: if I write the compiler/interpreter in python, I don't want the emitted executables to depend on the python program.
2.2: if I use a compiling step, for instance, to compile the programs using perl, I don't want the emitted executables to depend on a libperl.dll/so.
2.3: the exception is with runtimes is C since the C runtime is usually installed on almost all desktop OS's as many core OS tools depend on it.

You could use any language that has an existing compiler that emits native code without dependencies. C and C++ are pretty good bets because their runtimes are available pretty much everywhere (even more so in C).
One approach in your language build-up that could be well worth trying is this: make your compiler output C (or C++). Then you can use all the existing ecosystem around those languages and their runtimes (linkers, object dumpers, debuggers, etc...), even plan integration with existing code.
Those tools would be useful both for the users of your language, and obviously for yourself while you're experimenting on that toy language.
Once you get to the point where your language is "self-hosted" (i.e. your compiler is written in your own language), you'll be able to start thinking about doing away with the whole C part and write a native code compiler, with its runtime.
Good luck :-)
Also make sure you go look at LLVM. It's a "compiler infrastructure". That is probably the best place to start with these days to implement a new language. The documentation is pretty good, and the tutorials include building a toy language.

C has a runtime... C++ has a slightly bigger minimal run-time than C. Some implementations of Ada have pragmas allowing to check that some features mandating the use of a run-time aren't used (I wonder if they weren't standardized later, I've stopped to follow Ada standardization in the late 90's), making it perhaps having a minimal run-time of the same compexity as C.

PyPy usses RPython to implement python language. Will this work for you?

Haxe is written in OCaml, and I think this is a really good language for writing other languages.
http://haxe.org/
http://caml.inria.fr/

If your going to write bootloaders and kernels then C is your language, otherwise it doesn't matter what language you use to develop your language. Just because your host language has a runtime doesn't automatically mean your target language must have one.
But ofcourse toy languages need a runtime, for example a JVM/LLVM/.NET CLR. Or interpreters. If you don't go for these choices you need to generate machine code that conforms to a ABI, and that is very painfull.
I suggest you look target llvm and generate machine code from there, dalvik might also fit your needs (since it is extremely lightweight).

I think Python is best for you. Python is a programming language that you can work more quickly and integrate your systems more effectively. You can learn to use Python and see almost immediate gains in productivity and lower maintenance costs.
Python, recently, has been ported to the Java and .NET virtual machines.
The most important, Python is free.

Tips on wrapping a C library in Objective-C

I have a library written in C that I would like to use in an Objective-C app, either on the Mac or the iPhone.
Unfortunately, since this library is being written by individuals in the open source space, the documentation is quite sparse and incomplete. While I can figure out how to use the stuff in the library, I don't really have an overview of the entire code base.
What I would like to do is wrap the library up into some easily usable and transferrable classes in Objective-C.
Does anyone have any tips on how to approach this?
Any advice on the best way to get a visual hierarchy of how the library is structured?
How would I go about deciding how to best structure the wrapper for reusability and ease of use?
Any and all help will be greatly appreciated, thanks!

I've done this a few times myself. This can be fun -- it's your chance to fix (or at least hide) bad code!
You can use Doxygen to get a visual hierarchy of the code (although I've only used it for C++ libraries, it also works with C), or any of the other free tools out there.
Don't structure your wrapper class like the underlying library if the library isn't designed or documented well. This is your chance to consider the point of view of the user and how they are going to be using the code. Write your test cases first to figure that out, and/or talk to some people who use the library already.
Two nice design patterns that match up with what you're doing are Adapter and Facade.

First, remember: a C library is an Objective-C library. You don't actually need to do any wrapping at all, although you may want to if the library interface is especially cumbersome.
Second, if you decide that you want to write a library wrapper, keep it simple. Identify the core functions of the library that you actually plan to use, and think about how best to provide an interface to those functions and those functions only, with your intended usage in mind. Design an interface that you want to work with, then implement it over the library.

Since ARC (Automatic Reference Counting) was added to the Apple compilers and libraries, Objective-C and C are no longer so freely interchangeable. (Here's a list of ARC documentation and tutorials.) You need to consider the memory allocation issues much more thoroughly, and you might just want to "bridge" the libraries. See this SO question and some of the links from there, about how Apple bridges between Obj-C and C libraries.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight