I have a requirement for porting some existing C code to a IEC 61131-3 compliant PLC.
I have some options of splitting the code into discrete function blocks and weaving those blocks into a standard solution (Ladder, FB, Structured Text etc). But this would require carving up the C code in order to build each function block.
When looking at the IEC spec I realsied that the IEC Instruction List form could be a target language for a compiler. The wikepedia article lists two development tools:
CoDeSys
Beremiz
But these seem to be targeted compiling IEC languages to C, not C to IEC.
Another possible solution is to push the C code through a C to Pascal translator and use that as a starting point for a Structured Text solution.
If not any of these I will go down the route of splitting the code up into function blocks.
Edit
As prompted by mlieson's reply I should have mentioned that the C code is an existing real-time control system. So the programs algorithms should already suit a PLC environment.
Maybe this answer comes too late but it is possible to call C code from CoDeSys thanks to an external library.
You can find documentation on the CoDeSys forum at http://forum-en.3s-software.com/viewtopic.php?t=620
That would give you to use your C code into the PLC with minor modifcations. You'll just have to define the functions or function blocks interfaces.
My guess is that a C to Pascal translator will not get you near enough for being worth the trouble. Structured text looks a lot like Pascal, but there are differences that you will need to fix everywhere.
Not a bug issue, but don't forget that PLCs runtime enviroment is a bit different. A C applications starts at main() and ends when main() returns. A PLC calls it main() over and over again, 100:s of times per second and it never ends.
Usally lengthy calculations and I/O needs to be coded in diffent fashion than a C appliation would use.
Unless your C source is many many thousands lines of code - Rewrite it.
It is impossible. To be short: the IL language is a 4GL (i.e. limited to
the domain, as well as other IEC 61131-3 languages -- ST, FBD, LD, SFC).
The C language is a 3GL.
To understand the problem, try to answer the question, which way to
express in IL manipulations with a pointer? for example, to express call a
function by a pointer. What about interrupts? Low level access to the
peripherial devices?
(really, there are more problems)
BTW, there is the Reflex language, aka "C with processes". Reflex is a 4GL for the
control domain with C-like syntax. But the known translators produce
C-code and Python-code.
If the amount of code to convert is a few thousand lines, recoding by hand is probably your best bet.
If you have lots of code to convert, then an automated tool might be very effective.
Using the DMS Software Reengineering Toolkit we've built translators to map mechanical motion diagrams into RLL (PLC) code. DMS also has full C parser/analyzers/front ends. The pieces are there to build a C to RLL code.
This isn't an easy task. It likely takes 6-12 man-months to configure DMS to something resembling what you want. If that's less than what it takes to do by hand, then its the right way to do it.
There are a few IEC development environments and target hardware that can use C blocks... I would also take a look at the reasons why it HAS to be an IEC-61131 complaint target. I have written extensively on compliance and why it doesn't mean squat.
SOFTplc corp can help I'm sure with user defined loadable modules... and they can be in C..
Schneider also supports C function blocks...
Labview too!! not sure why IEC is important that's all!! the compiler if existed would create bad code for sure:)
Your best bet is to split your C code into smaller parts which can be recoded as PLC functional blocks and use C to PASCAL convertor for each block which you will rewrite in structured text. Prepare to do a lot of manual work since automated conversion will probably disappoint you.
Also take a look at this page: http://www.control.com/thread/1026228786
Every time I've done this, I just parsed and converted it by hand from C directly to ST. I only ran into a few functions that required complete rewrites, although there was very little that dealt with pointers, which is something that ST generally chokes on, unfortunately.
Using the existing C code as blocks that are called by the PLC program would have the added advantage that the C blocks could run at the same periodicity that they did before, and their function is likely already well documented and tested. This would minimize any effect on changes from the existing control system. This is an architecture for controls with software PLCs that I have seen used before.
Related
I am implementing a lisp interpreter in C, i have implemented along with few primitives like cons , car, cdr , eq, basic arithmetic stuff.
Just before i was starting to implement define and lambda it occurred to me that i need to implement an environment. I am unsure if i could implement it in lisp itself.
My intent is to implement minimal amount of lisp so that i could write extension to the language in itself. I am not sure how much is minimal, Would implementing FFI Qualify as minimal ?
The answer to your question depends on the meaning that you give to the word “minimal”.
Given your question, and assuming that you don't want to make an implementation competing with the nowdays fine implementations of Common Lisp and Schema, my hypothesis is that with “minimal” you intend: Turing complete, that is capable of expressing any computation expressible in a general purpose programming language.
With this assumption, you need to implement three other things:
conditional forms (cond)
lambda expressions (lambda)
a way of defining recursive lambda expression (labels or defun)
Your interpreter then should be able to evaluate forms. This should be sufficient to have a language equivalent to the initial LISP, that allow to express in the language any computable function.
First off, you are talking about first writing a LISP interpreter. You have a lot of choices to take when it comes to scoping, LISP1 vs LISP2 since these questions alter the implementation core. An interpreter is a general purpose program that reads and evaluates code. It can support abstractions but it won't extend itself by making more native stuff.
If you are interested in such stuff you can perhaps make a compiler instead. Eg. there are many Sceme like subsets that compiles to C or Java code, but you can make your own VM. Thus it can indeed compile itself to be run on it's own target machine (self hosting) if all the forms and procedures you use has been implemented using the primitives supported by the compiler.
Making a dumb compiler is not much difference from making an interpreter. That is very clear if yo've watched the SICP videos (10A is about compilation, 7A-B is about interpreters)
The environment can be a chain of pairs just as in a LISP interpreter. It would be difficult to implement the environment of itself in LISP without making it a very difficult Lisp language to use (unless it's compiled that is)
You may use the data structures of lisp and the primitives from the C code though.
Making a FFI is a fast way to give your language lots of features. It solves the chicken and egg problem by using other peoples work from within your language. In fuses the top (primitives and syntax) and the bottom layer (a runtime) of your system. It's the ultimate primitive and you can think of it as system call or message bus to the runtime.
I strongly suggest to read Queinnec's book: Lisp In Small Pieces. It is a book dedicated entirely to answer your question, and it explains in detail the many trade-offs and the internals of Lisp implementations and definitions, by giving many explained examples of Lisp interpreters and compilers.
You might also consider using libffi. You could be interested in the internals of M.Serrano's Bigloo & Hop implementations. You might even look inside my MELT lisp-like language to customize the GCC
compiler.
You also need to learn more about garbage collection (you might read the GC handbook). You could use Boehm's conservative Garbage Collector (or something else, e.g. my Qish or MPS) or write your own GC.
You may want to learn more about Chicken, Scheme 48, Guile and read their papers and look inside their code.
See also J.Pitrat's blog: it is not about Lisp (but about bootstrapping strong AI) and has several fascinating entries related to bootstrapping.
So let's say I have a string containing some code in C, predictably read from a file that has other things in it besides normal C code. How would I turn this string into code usable by the program? Do I have to write an entire interpreter, or is there a library that already does this for me? The code in question may call subroutines that I declared in my actual C file, so one that only accounts for stock C commands may not work.
Whoo. With C this is actually pretty hard.
You've basically got a couple of options:
interpret the code
To do this, you'll hae to write an interpreter, and interpreting C is a fairly hard problem. There have been C interpreters available in the past, but I haven't read about one recently. In any case, unless you reallY really need this, writing your own interpreter is a big project.
Googling does show a couple of open-source (partial) C interpreters, like picoc
compile and dynamically load
If you can capture the code and wrap it so it makes a syntactically complete C source file, then you can compile it into a C dynamically loadable library: a DLL in Windows, or a .so in more variants of UNIX. Then you could load the result at runtime.
Now, what normally would lead someone to do this is a need to be able to express some complicated scripting functions. Have you considered the possibility of using a different language? Python, Scheme (guile) and Lua are easily available to add as a scripting language to a C application.
C has nothing of this nature. That's because C is compiled, and the compiler needs to do a lot of building of the code before the code starts running (hence receives a string as input) that it can't really change on the fly that easily. Compiled languages have a rigidity to them while interpreted languages have a flexibility.
You're thinking of Perl, Python PHP etc. and so called "fourth generation languages." I'm sure there's a technical term in c.s. for this flexibility, but C doesn't have it. You'll need to switch to one of these languages (and give up performance) if you have a task that requires this sort of string use much. Check out Perl's /e flag with regexes, for instance.
In C, you'll need to design your application so you don't need to do this. This is generally quite doable, as for its non-OO-ness and other deficiencies many huge, complex applications run on well-written C just fine.
I've heard of the idea of bootstrapping a language, that is, writing a compiler/interpreter for the language in itself. I was wondering how this could be accomplished and looked around a bit, and saw someone say that it could only be done by either
writing an initial compiler in a different language.
hand-coding an initial compiler in Assembly, which seems like a special case of the first
To me, neither of these seem to actually be bootstrapping a language in the sense that they both require outside support. Is there a way to actually write a compiler in its own language?
Is there a way to actually write a compiler in its own language?
You have to have some existing language to write your new compiler in. If you were writing a new, say, C++ compiler, you would just write it in C++ and compile it with an existing compiler first. On the other hand, if you were creating a compiler for a new language, let's call it Yazzleof, you would need to write the new compiler in another language first. Generally, this would be another programming language, but it doesn't have to be. It can be assembly, or if necessary, machine code.
If you were going to bootstrap a compiler for Yazzleof, you generally wouldn't write a compiler for the full language initially. Instead you would write a compiler for Yazzle-lite, the smallest possible subset of the Yazzleof (well, a pretty small subset at least). Then in Yazzle-lite, you would write a compiler for the full language. (Obviously this can occur iteratively instead of in one jump.) Because Yazzle-lite is a proper subset of Yazzleof, you now have a compiler which can compile itself.
There is a really good writeup about bootstrapping a compiler from the lowest possible level (which on a modern machine is basically a hex editor), titled Bootstrapping a simple compiler from nothing. It can be found at https://web.archive.org/web/20061108010907/http://www.rano.org/bcompiler.html.
The explanation you've read is correct. There's a discussion of this in Compilers: Principles, Techniques, and Tools (the Dragon Book):
Write a compiler C1 for language X in language Y
Use the compiler C1 to write compiler C2 for language X in language X
Now C2 is a fully self hosting environment.
The way I've heard of is to write an extremely limited compiler in another language, then use that to compile a more complicated version, written in the new language. This second version can then be used to compile itself, and the next version. Each time it is compiled the last version is used.
This is the definition of bootstrapping:
the process of a simple system activating a more complicated system that serves the same purpose.
EDIT: The Wikipedia article on compiler bootstrapping covers the concept better than me.
A super interesting discussion of this is in Unix co-creator Ken Thompson's Turing Award lecture.
He starts off with:
What I am about to describe is one of many "chicken and egg" problems that arise when compilers are written in their own language. In this ease, I will use a specific example from the C compiler.
and proceeds to show how he wrote a version of the Unix C compiler that would always allow him to log in without a password, because the C compiler would recognize the login program and add in special code.
The second pattern is aimed at the C compiler. The replacement code is a Stage I self-reproducing program that inserts both Trojan horses into the compiler. This requires a learning phase as in the Stage II example. First we compile the modified source with the normal C compiler to produce a bugged binary. We install this binary as the official C. We can now remove the bugs from the source of the compiler and the new binary will reinsert the bugs whenever it is compiled. Of course, the login command will remain bugged with no trace in source anywhere.
Check out podcast Software Engineering Radio episode 61 (2007-07-06) which discusses GCC compiler internals, as well as the GCC bootstrapping process.
Donald E. Knuth actually built WEB by writing the compiler in it, and then hand-compiled it to assembly or machine code.
As I understand it, the first Lisp interpreter was bootstrapped by hand-compiling the constructor functions and the token reader. The rest of the interpreter was then read in from source.
You can check for yourself by reading the original McCarthy paper, Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I.
Every example of bootstrapping a language I can think of (C, PyPy) was done after there was a working compiler. You have to start somewhere, and reimplementing a language in itself requires writing a compiler in another language first.
How else would it work? I don't think it's even conceptually possible to do otherwise.
Another alternative is to create a bytecode machine for your language (or use an existing one if it's features aren't very unusual) and write a compiler to bytecode, either in the bytecode, or in your desired language using another intermediate - such as a parser toolkit which outputs the AST as XML, then compile the XML to bytecode using XSLT (or another pattern matching language and tree-based representation). It doesn't remove the dependency on another language, but could mean that more of the bootstrapping work ends up in the final system.
It's the computer science version of the chicken-and-egg paradox. I can't think of a way not to write the initial compiler in assembler or some other language. If it could have been done, I should Lisp could have done it.
Actually, I think Lisp almost qualifies. Check out its Wikipedia entry. According to the article, the Lisp eval function could be implemented on an IBM 704 in machine code, with a complete compiler (written in Lisp itself) coming into being in 1962 at MIT.
Some bootstrapped compilers or systems keep both the source form and the object form in their repository:
ocaml is a language which has both a bytecode interpreter (i.e. a compiler to Ocaml bytecode) and a native compiler (to x86-64 or ARM, etc... assembler). Its svn repository contains both the source code (files */*.{ml,mli}) and the bytecode (file boot/ocamlc) form of the compiler. So when you build it is first using its bytecode (of a previous version of the compiler) to compile itself. Later the freshly compiled bytecode is able to compile the native compiler. So Ocaml svn repository contains both *.ml[i] source files and the boot/ocamlc bytecode file.
The rust compiler downloads (using wget, so you need a working Internet connection) a previous version of its binary to compile itself.
MELT is a Lisp-like language to customize and extend GCC. It is translated to C++ code by a bootstrapped translator. The generated C++ code of the translator is distributed, so the svn repository contains both *.melt source files and melt/generated/*.cc "object" files of the translator.
J.Pitrat's CAIA artificial intelligence system is entirely self-generating. It is available as a collection of thousands of [A-Z]*.c generated files (also with a generated dx.h header file) with a collection of thousands of _[0-9]* data files.
Several Scheme compilers are also bootstrapped. Scheme48, Chicken Scheme, ...
All over the web, I am getting the feeling that writing a C backend for a compiler is not such a good idea anymore. GHC's C backend is not being actively developed anymore (this is my unsupported feeling). Compilers are targeting C-- or LLVM.
Normally, I would think that GCC is a good old mature compiler that does performs well at optimizing code, therefore compiling to C will use the maturity of GCC to yield better and faster code. Is this not true?
I realize that the question greatly depends on the nature of the language being compiled and on other factors such that getting more maintainable code. I am looking for a rather more general answer (w.r.t. the compiled language) that focuses solely on performance (disregarding code quality, ..etc.). I would be also really glad if the answer includes an explanation on why GHC is drifting away from C and why LLVM performs better as a backend (see this) or any other examples of compilers doing the same that I am not aware of.
Let me list my two biggest problems with compiling to C. If this is a problem for your language depends on what kind of features you have.
Garbage collection When you have garbage collection you may have to interrupt regular execution at just about any point in the program, and at this point you need to access all pointers that point into the heap. If you compile to C you have no idea where those pointers might be. C is responsible for local variables, arguments, etc. The pointers are probably on the stack (or maybe in other register windows on a SPARC), but there is no real access to the stack. And even if you scan the stack, which values are pointers? LLVM actually addresses this problem (thought I don't know how well since I've never used LLVM with GC).
Tail calls Many languages assume that tail calls work (i.e., that they don't grow the stack); Scheme mandates it, Haskell assumes it. This is not the case with C. Under certain circumstances you can convince some C compilers to do tail calls. But you want tail calls to be reliable, e.g., when tail calling an unknown function. There are clumsy workarounds, like trampolining, but nothing quite satisfactory.
While I'm not a compiler expert, I believe that it boils down to the fact that you lose something in translation to C as opposed to translating to e.g. LLVM's intermediate language.
If you think about the process of compiling to C, you create a compiler that translates to C code, then the C compiler translates to an intermediate representation (the in-memory AST), then translates that to machine code. The creators of the C compiler have probably spent a lot of time optimizing certain human-made patterns in the language, but you're not likely to be able to create a fancy enough compiler from a source language to C to emulate the way humans write code. There is a loss of fidelity going to C - the C compiler doesn't have any knowledge about your original code's structure. To get those optimizations, you're essentially back-fitting your compiler to try to generate C code that the C compiler knows how to optimize when it's building its AST. Messy.
If, however, you translate directly to LLVM's intermediate language, that's like compiling your code to a machine-independent high-level bytecode, which is akin to the C compiler giving you access to specify exactly what its AST should contain. Essentially, you cut out the middleman that parses the C code and go directly to the high-level representation, which preserves more of the characteristics of your code by requiring less translation.
Also related to performance, LLVM can do some really tricky stuff for dynamic languages like generating binary code at runtime. This is the "cool" part of just-in-time compilation: it is writing binary code to be executed at runtime, instead of being stuck with what was created at compile time.
Part of the reason for GHC's moving away from the old C backend was that the code produced by GHC was not the code gcc could particularly well optimise. So with GHC's native code generator getting better, there was less return for a lot of work. As of 6.12, the NCG's code was only slower than the C compiled code in very few cases, so with the NCG getting even better in ghc-7, there was no sufficient incentive to keep the gcc backend alive. LLVM is a better target because it's more modular, and one can do many optimisations on its intermediate representation before passing the result to it.
On the other hand, last I looked, JHC still produced C and the final binary from that, typically (exclusively?) by gcc. And JHC's binaries tend to be quite fast.
So if you can produce code the C compiler handles well, that is still a good option, but it's probably not worth jumping through too many hoops to produce good C if you can easier produce good executables via another route.
One point that hasn't been brought up yet is, how close is your language to C? If you're compiling a fairly low-level imperative language, C's semantics may map very closely to the language you're implementing. If that's the case, it's probably a win, because the code written in your language is likely to resemble the kind of code someone would write in C by hand. That was definitely not the case with Haskell's C backend, which is one reason why the C backend optimized so poorly.
Another point against using a C backend is that C's semantics are actually not as simple as they look. If your language diverges significantly from C, using a C backend means you're going to have to keep track of all those infuriating complexities, and possibly differences between C compilers as well. It may be easier to use LLVM, with its simpler semantics, or devise your own backend, than keep track of all that.
Aside form all the codegenerator quality reasons, there are also other problems:
The free C compilers (gcc, clang) are a bit Unix centric
Support more than one compiler (e.g. gcc on Unix and MSVC on Windows) requires duplication of effort.
compilers might drag in runtime libraries (or even *nix emulations) on Windows that are painful. Two different C runtimes (e.g. linux libc and msvcrt) to base on complicate your own runtime and its maintenance
You get a big externally versioned blob in your project, which means a major version transition (e.g. a change of mangling could hurts your runtime lib, ABI changes like change of alignment) might require quite some work. Note that this goes for compiler AND externally versioned (parts of the) runtime library. And multiple compilers multiply this. This is not so bad for C as backend though as in the case where you directly connect to (read: bet on) a backend, like being a gcc/llvm frontend.
In many languages that follow this path, you see Cisms trickle through into the main language. Of course this won't happy to you, but you will be tempted :-)
Language functionality that doesn't directly map to standard C (like nested procedures,
and other things that need stack fiddling) are difficult.
If something is wrong, users will be confronted with C level compiler or linker errors that are outside their field of experience. Parsing them and making them your own is painful, specially with multiple compilers and -versions
Note that point 4 also means that you will have to invest time to just keep things working when the external projects evolve. That is time that generally doesn't really go into your project, and since the project is more dynamic, multiplatform releases will need a lot of extra release engineering to cater for change.
So in short, from what I've seen, while such a move allows a swift start (getting a reasonable codegenerator for free for many architectures), there are downsides. Most of them are related to loss of control and poor Windows support of *nix centric projects like gcc. (LLVM is too new to say much on long term, but their rhetoric sounds a lot like gcc did ten years ago). If a project you are hugely dependent on keeps a certain course (like GCC going to win64 awfully slow), then you are stuck with it.
First, decide if you want to have serious non *nix ( OS X being more unixy) support, or only a Linux compiler with a mingw stopgap for Windows? A lot of compilers need first rate Windows support.
Second, how finished must the product become? What's the primary audience ? Is it a tool for the open source developer that can handle a DIY toolchain, or do you want to target a beginner market (like many 3rd party products, e.g. RealBasic)?
Or do you really want to provide a well rounded product for professionals with deep integration and complete toolchains?
All three are valid directions for a compiler project. Ask yourself what your primary direction is, and don't assume that more options will be available in time. E.g. evaluate where projects are now that chose to be a GCC frontend in the early nineties.
Essentially the unix way is to go wide (maximize platforms)
The complete suites (like VS and Delphi, the latter which recently also started to support OS X and has supported linux in the past) go deep and try maximize productivity. (support specially the windows platform nearly completely with deep levels of integration)
The 3rd party projects are less clear cut. They go more after self-employed programmers, and niche shops. They have less developer resources, but manage and focus them better.
As you mentioned, whether C is a good target language depends very much on your source language. So here's a few reasons where C has disadvantages compared to LLVM or a custom target language:
Garbage Collection: A language that wants to support efficient garbage collection needs to know extra information that interferes with C. If an allocation fails, the GC needs to find which values on the stack and in registers are pointers and which aren't. Since the register allocator is not under our control we need to use more expensive techniques such as writing all pointers to a separate stack. This is just one of many issues when trying to support modern GC on top of C. (Note that LLVM also still has some issues in that area, but I hear it's being worked on.)
Feature mapping & Language-specific optimisations: Some languages rely on certain optimisations, e.g., Scheme relies on tail-call optimisation. Modern C compilers can do this but are not guaranteed to do this which could cause problems if a program relies on this for correctness. Another feature that could be difficult to support on top of C is co-routines.
Most dynamically typed languages also cannot be optimised well by C-compilers. For example, Cython compiles Python to C, but the generated C uses calls to many generic functions which are unlikely to be optimised well even by latest GCC versions. Just-in-time compilation ala PyPy/LuaJIT/TraceMonkey/V8 are much more suited to give good performance for dynamic languages (at the cost of much higher implementation effort).
Development Experience: Having an interpreter or JIT can also give you a much more convenient experience for developers -- generating C code, then compiling it and linking it, will certainly be slower and less convenient.
That said, I still think it's a reasonable choice to use C as a compilation target for prototyping new languages. Given that LLVM was explicitly designed as a compiler backend, I would only consider C if there are good reasons not to use LLVM. If the source-level language is very high-level, though, you most likely need an earlier higher-level optimisation pass as LLVM is indeed very low-level (e.g., GHC performs most of its interesting optimisations before generating calling into LLVM). Oh, and if you're prototyping a language, using an interpreter is probably easiest -- just try to avoid features that rely too much on being implemented by an interpreter.
Personally I would compile to C. That way you have a universal intermediary language and don't need to be concerned about whether your compiler supports every platform out there. Using LLVM might get some performance gains (although I would argue the same could probably be achieved by tweaking your C code generation to be more optimizer-friendly), but it will lock you in to only supporting targets LLVM supports, and having to wait for LLVM to add a target when you want to support something new, old, different, or obscure.
As far as I know, C can't query or manipulate processor flags.
This answer is a rebuttal to some of the points made against C as a target language.
Tail call optimizations
Any function that can be tail call optimized is actually equivalent to an iteration (it's an iterative process, in SICP terminology). Additionally, many recursive functions can and should be made tail recursive, for performance reasons, by using accumulators etc.
Thus, in order for your language to guarantee tail call optimization, you would have to detect it and simply not map those functions to regular C functions - but instead create iterations from them.
Garbage collection
It can be actually implemented in C. You can create a run-time system for your language which consists of some basic abstractions over the C memory model - using for example your own memory allocators, constructors, special pointers for objects in the source language, etc.
For example instead of employing regular C pointers for the objects in the source language, a special structure could be created, over which a garbage collection algorithm could be implemented. The objects in your language (more accurately, references) - could behave just like in Java, but in C they could be represented along with meta-information (which you wouldn't have in case you were working just with pointers).
Of course, such a system could have problems integrating with existing C tooling - depends on your implementation and trade-offs that you're willing to make.
Lacking operations
hippietrail noted that C lacks rotate operators (by which I assume he meant circular shift) that are supported by processors. If such operations are available in the instruction set, then they can be added using inline assembly.
The frontend would in this case have to detect the architecture which it's running for and provide the proper snippets. Some kind of a fallback in the form of a regular function should also be provided.
This answer seems to be addressing some core issues seriously. I'd like to see some more substantiation on which problems exactly are caused by C's semantics.
There's a particular case where if you're writing a programming language with strong security* or reliability requirements.
For one, it would take you years to know a big enough subset of C well enough that you know all the C operations you will choose to employ in your compilation are safe and don't evoke undefined behaviour. Secondly, you'd then have to find an implementation of C that you can trust (which would mean a tiny trusted code base, and probably wont be very efficient). Not to mention you'll need to find a trusted linker, OS capable of executing compiled C code, and some basic libraries, all of which would need to be well-defined and trusted.
So in this case you might as well either use assembly language, if you care about about machine independence, some intermediate representation.
*please note that "strong security" here is not related at all to what banks and IT businesses claim to have
Is it a good idea to compile a language to C?
No.
...which begs one obvious question: why do some still think compiling via C is a good idea?
Two big arguments in favour of misusing C in this fashion is that it's stable and standardised:
For GCC, it's C or bust (but there is work underway which may allow other options).
For LLVM, there's the routine breaking of backwards-compatibility in its IR and APIs - what would you prefer: spending time on improving your project or chasing after LLVM?
Providing little more than a promise of stability is somewhat ironic, considering the intended purpose of LLVM.
For these and other reasons, there are various half-done, toy-like, lab-experiment, single-site/use, and otherwise-ignominious via-C backends scattered throughout cyberspace - being abandoned, most have succumbed to bit-rot. But there are some projects which do manage to progress to the mainstream, and their success is then used by via-C supporters to further perpetuate this fantasy.
But if you're one of those supporters, feel free to make fantasy into reality - there's that work happening in GCC, or the resurrected LLVM backend for C. Just imagine it: two well-built, well-maintained via-C backends into which the sum of all prior knowledge can be directed.
They just need you.
I have been doing OOP (C++/Java/PHP/Ruby) for a long time and really have a hard time imagining how large programs and libraries such as Linux or Apache can be written entirely in an imperative style. What would be small open source C projects I could look at to get a feel of how things are done in C?
Bonus points if the project is hosted on GitHub.
Things are done exactly the same way in C, but with less overt support from the language. Instead of creating a class to encapsulate some state, you create a struct. Instead of creating class members, with implicit this parameters, you create functions that you explicitly pass a struct* as the first parameter, that then operate on the struct.
To ensure that encapsulation is not broken you can declare the struct in a header, but only define it in the .c file where it is used. Virtual functions require more work - but again, its just a case of putting function pointers in the struct. Which is actually more convenient in C than C++ because in C you get to fill in your vtables manually, getting quite a fine level of control over which part of code implements part of what COM interface (if you are into COM in C of course).
You might find the ccan (Comprehensive C Archive Network, modeled after Perl's CPAN) interesting.
It's small at the moment, but the contributions are of high quality. Many of the contributions are by linux kernel developers.
Almost everything in there falls into the "few thousand LOC" or less category, too.
If you want a small example to start with, try looking at the source for the basic Linux CLI utilities. GNU binutils, make, or any of the other GNU utilities have full source code available and are relatively small code bases (some are larger than others). The easiest thing is usually to start with a utility that you have used before and are already familiar with.
Look at GLib for an almost canonical example of how to do object oriented programming in C.