Emulating lambdas in C?

I should mention that I'm generating code in C, as opposed to writing it manually. I say this because it doesn't matter too much if there's a lot of code behind it, since the compiler should manage it all. Anyway, how would I go about emulating a lambda in C? I was thinking I could just generate a function with some random name somewhere in the source code and then call that? I'm not too sure. I haven't really tried anything just yet, since I wanted to get the idea down before I implement it.
Is there some kind of preprocessor directive I can use, or some macro, that will make this cleaner to do? I've been inspired by Jon Blow to try out compiler development, and he seemed to implement lambdas in his language Jai. However, I think he does something where he generates bytecode and then translates that into C? I'm not sure.
Edit:
I'm working on a compiler. The compiler is just a project of mine to keep me busy, plus I wanted to learn more about compilers. I primarily use Clang, and I'm on Ubuntu 14.10. I don't have any garbage collection, but I wanted to try my hand at some kind of smart-pointer/Rust/ARC-inspired memory model for garbage collection, i.e. little to no overhead. I chose C because I wanted to dabble in it more. My project is free software, just a hobby project.

There are several ways of doing it ("having" lambdas in C). The important thing to understand is that lambdas give you closures, and that closures mix "code" with "data" (the closed values); notice that objects also mix "code" with "data", and there is a similarity between objects and closures. See also this answer on Programmers.
Traditionally, in C, you not only use function pointers, but you adopt a convention regarding callbacks. This, for instance, is the case with GTK: every time you pass a function pointer, you also pass some data with it. You can view callbacks (the convention of passing a C function pointer together with some void *data) as a way to implement closures.
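A minimal sketch of that convention (the names are mine, not GTK's): the function pointer always travels with a void * carrying the closed-over data, and the pair emulates a closure.

    #include <stdio.h>

    typedef void (*callback_fn)(int item, void *user_data);

    static void for_each(const int *items, int n, callback_fn fn, void *user_data) {
        for (int i = 0; i < n; i++)
            fn(items[i], user_data);        /* the data always travels with the code */
    }

    /* the "lambda body": its captured variable arrives through user_data */
    static void add_to_total(int item, void *user_data) {
        *(int *)user_data += item;
    }

    int main(void) {
        int items[] = { 1, 2, 3, 4 };
        int total = 0;                      /* the "captured" variable */
        for_each(items, 4, add_to_total, &total);
        printf("%d\n", total);              /* prints 10 */
        return 0;
    }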
Since you generate C code (which is a wise idea; I'm doing similar things in MELT, which, on Linux, generates C++ code at runtime, compiles it into a shared object, and dlopens that) you could adopt a callback convention and pass some closed values to every function that you generate.
You might also consider closed values as static variables, but this approach is generally unwise.
In the past there have been lambda.h header libraries which generate machine-specific trampoline code for closures (essentially generating code which pushes some closed values as arguments, then calls some routine). You might use JIT compilation techniques (using libjit, GNU lightning, LLVM, asmjit, ...) to do the same. See also libffi to call an arbitrary function (of signature known at runtime only).
Notice that there is a strong, but indirect, relation between closures and garbage collection (read the GC handbook for more), and it is not by accident that every functional language has a GC. C++11 lambda functions are an exception to this (and it is difficult to understand all the intricacies of memory management of C++11 closures). So if you are generating C code, you could and probably should use Boehm's conservative garbage collector (which wraps dlopen) and you would have GC-ed closure values. (You could use other GC libraries, e.g. Ravenbrook's MPS or my unmaintained Qish...) Then you could have the convention that every generated C function takes its closure as its first argument.
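If you go that route, the generated code could look roughly like this sketch, assuming Boehm's GC (link with -lgc); the struct layout and names are illustrative, not a fixed convention:

    #include <gc.h>
    #include <stdio.h>

    typedef struct closure {
        int (*code)(struct closure *self, int arg);    /* the generated C function */
        int closed_x;                                   /* one closed-over value */
    } closure;

    static int add_x_code(closure *self, int arg) { return arg + self->closed_x; }

    static closure *make_adder(int x) {                 /* like (lambda (a) (+ a x)) */
        closure *c = GC_MALLOC(sizeof *c);
        c->code = add_x_code;
        c->closed_x = x;
        return c;
    }

    int main(void) {
        GC_INIT();
        closure *add5 = make_adder(5);
        printf("%d\n", add5->code(add5, 37));           /* 42; the closure is GC-ed */
        return 0;
    }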
I would suggest reading Scott's book Programming Language Pragmatics and (assuming you know a tiny bit of Scheme or Lisp; if you don't, you should learn a bit of Scheme and read SICP) Queinnec's book Lisp In Small Pieces (if you happen to read French, read the latest French variant).

Related

How much lisp to implement in C before writing extension in itself?

I am implementing a Lisp interpreter in C; I have implemented it along with a few primitives like cons, car, cdr, eq, and basic arithmetic.
Just before I started to implement define and lambda, it occurred to me that I need to implement an environment. I am unsure if I could implement it in Lisp itself.
My intent is to implement a minimal amount of Lisp so that I can write extensions to the language in itself. I am not sure how much is minimal. Would implementing an FFI qualify as minimal?
The answer to your question depends on the meaning that you give to the word “minimal”.
Given your question, and assuming that you don't want to make an implementation competing with the fine implementations of Common Lisp and Scheme available nowadays, my hypothesis is that by “minimal” you mean: Turing complete, that is, capable of expressing any computation expressible in a general purpose programming language.
With this assumption, you need to implement three other things:
conditional forms (cond)
lambda expressions (lambda)
a way of defining recursive lambda expression (labels or defun)
Your interpreter should then be able to evaluate forms. This should be sufficient to have a language equivalent to the original LISP, which allows you to express any computable function in the language.
First off, you are talking about writing a LISP interpreter first. You have a lot of choices to make when it comes to scoping and LISP-1 vs. LISP-2, since these questions alter the implementation core. An interpreter is a general purpose program that reads and evaluates code. It can support abstractions, but it won't extend itself by making more native stuff.
If you are interested in such stuff you can perhaps make a compiler instead. E.g. there are many Scheme-like subsets that compile to C or Java code, but you can make your own VM. Thus it can indeed compile itself to run on its own target machine (self-hosting) if all the forms and procedures you use have been implemented using the primitives supported by the compiler.
Making a dumb compiler is not much different from making an interpreter. That is very clear if you've watched the SICP videos (10A is about compilation, 7A-B are about interpreters).
The environment can be a chain of pairs, just as in a LISP interpreter. It would be difficult to implement the environment itself in LISP without making the resulting Lisp very difficult to use (unless it's compiled, that is).
You may use the data structures of Lisp and the primitives from the C code, though.
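For instance, a chain-of-pairs environment in the C host could look like this sketch (Obj stands for whatever value representation your interpreter already has; the names are illustrative):

    #include <stdlib.h>
    #include <string.h>

    typedef struct Obj Obj;          /* your interpreter's Lisp value type */

    typedef struct Env {
        const char *symbol;
        Obj        *value;
        struct Env *parent;          /* enclosing bindings */
    } Env;

    static Env *env_bind(const char *symbol, Obj *value, Env *parent) {
        Env *e = malloc(sizeof *e);
        e->symbol = symbol;
        e->value  = value;
        e->parent = parent;
        return e;
    }

    static Obj *env_lookup(const Env *env, const char *symbol) {
        for (; env; env = env->parent)
            if (strcmp(env->symbol, symbol) == 0)
                return env->value;
        return NULL;                 /* unbound variable */
    }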
Making an FFI is a fast way to give your language lots of features. It solves the chicken-and-egg problem by using other people's work from within your language. It fuses the top layer (primitives and syntax) and the bottom layer (a runtime) of your system. It's the ultimate primitive, and you can think of it as a system call or message bus to the runtime.
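The simplest possible FFI primitive is little more than this sketch (POSIX dlopen/dlsym assumed; the interpreter-side wiring is omitted): ask the C runtime for a shared library and get back a function pointer your language can call.

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void) {
        void *lib = dlopen("libm.so.6", RTLD_NOW);      /* Linux name of the math library */
        if (!lib) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        double (*cosine)(double) = (double (*)(double))dlsym(lib, "cos");
        if (cosine)
            printf("%f\n", cosine(0.0));                /* 1.000000 */

        dlclose(lib);
        return 0;
    }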
I strongly suggest reading Queinnec's book Lisp In Small Pieces. It is a book dedicated entirely to answering your question, and it explains in detail the many trade-offs and the internals of Lisp implementations and definitions, giving many explained examples of Lisp interpreters and compilers.
You might also consider using libffi. You could be interested in the internals of M. Serrano's Bigloo & Hop implementations. You might even look inside my MELT lisp-like language to customize the GCC compiler.
You also need to learn more about garbage collection (you might read the GC handbook). You could use Boehm's conservative Garbage Collector (or something else, e.g. my Qish or MPS) or write your own GC.
You may want to learn more about Chicken, Scheme 48, Guile and read their papers and look inside their code.
See also J.Pitrat's blog: it is not about Lisp (but about bootstrapping strong AI) and has several fascinating entries related to bootstrapping.

What 'mark' does a language leave on a library that we need language bindings?

What mark does a language leave on a compiled library such that we need language bindings if we want to call its functions from a different language?
Object code looks 'language-free' to me.
While learning OpenGL in C in a Linux environment, I came across language bindings.
Binding provides a simple and consistent way for applications to present and interact with data.
Source: The tag under your question
I'm guessing that you're either young or haven't been programming for more than a decade or so.
Object code should look language free, but it ain't, due to history. Back in the 1970s and 1980s, on Intel 80x86 and Motorola 680x0 CPUs, function call arguments were passed on the stack. In the 'Pascal' convention, the number of arguments was fixed and the called function's code removed them from the stack before returning. In the 'C' convention, the number of arguments was variable (e.g. printf), so the calling code had to remove them when the function returned. This cost 2 extra bytes per function call, which is nothing today but was significant back then when PCs only came with 128K or so of RAM. So Microsoft chose to use the Pascal calling convention for the Windows API, even though it was written in C. If your object code called a Windows function with the C convention by mistake, kaboom. This is why the header files are still cluttered up with WINAPI and _stdcall and _fastcall and whatnot.
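To make that concrete, here is a Windows-flavoured sketch (hypothetical function names; it only means anything to a Windows compiler). Getting the convention wrong on 32-bit x86 corrupted the stack, hence the explicit annotations:

    #ifdef _WIN32
    #include <windows.h>

    /* Win32 callbacks use the stdcall convention; CALLBACK expands to __stdcall. */
    LRESULT CALLBACK MyWndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp) {
        return DefWindowProc(hwnd, msg, wp, lp);
    }

    /* A variadic function must use the cdecl convention (the caller pops the stack). */
    int __cdecl my_log(const char *fmt, ...);
    #endif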
Starting in the 1990s, operating system authors realized this was silly and started imposing standard calling conventions on everyone. The C convention could handle both cases, so it got used everywhere. With the moves to Mac OS X, 64-bit Windows, and ARM, we are finally getting language-free object code.
Now, OpenGL was designed to be used from C and Fortran. (Which was in the 1990s still an important language for scientific calculations and visualization.) Both languages have integers, floating point numbers, and arrays of various sized ints/floats. C has structs but Fortran doesn't, and I suspect this is a major reason why the OpenGL API never uses any structs. There are also differences in the memory layout of 2D or higher dimension arrays between C and Fortran, and again note that the OpenGL API never specifies 2D arrays, only 1D.
A C API works for most languages. This is partly because C is 'portable assembler' that works on almost any CPU and operating system. It's also because most other programming languages in common use are either supersets of C (C++, Objective-C) or implemented in C themselves (Python, Perl, Ruby), so they can be made to call the OpenGL C API reasonably easily.
Java and C# have more problems, because they define their own object code, so to speak, and memory access is more tightly controlled. The C/OpenGL notion of 'here is a pointer to a block of memory, do what you like with it' breaks the security model of the JVM/CLR. So you end up having to use Java NIO ByteBuffer things instead of just passing arrays.
A lot of it also comes down to the skill of the language binding designer. For one example, Python-OpenGL by Mike Fletcher is a really good binding. All the functions and constants have exactly the same names, so a lot of code can be just copied from C and pasted into Python. Python doesn't have C style arrays directly, but the language binding will silently translate any Python sequences/tuples you pass as "arrays" into the underlying C format for you. It feels natural for a Python programmer and still exposes the full capabilities of OpenGL.
For a bad example, JOGL is a pain in the arse. There's no automatic conversion from Java arrays to C, so you have to futz around with NIOByteBuffers yourself. It's so annoying that it's actually easier to use glBegin..glEnd blocks. And extra array offset arguments got added to a lot of OpenGL functions, so your code no longer looks the same as C/C++ and you waste a lot of time sticking ,0 on the end of function calls. Some of this is due to the JVM as mentioned before, but a lot of it is just bad design by (I suspect) somebody who never actually wrote much OpenGL themselves.
A long and rambling answer to a vague question.
Well, all you have to do is think about the myriad of calling conventions in C and C++. In order to prevent serious mishaps, the compiler mangles the function names based on calling convention so that you do not accidentally call a stdcall function using fastcall conventions. Each language has its own set of superfluous details like this that a language independent API should never have to burden itself with. Language bindings serve as an adapter/bridge that separates the language-specific stuff from the standardized API, filling in the gaps wherever necessary.
The OpenGL API is generally implemented in a single language (C) and programs written in other languages interface with the system's implementation through language bindings. OpenGL uses null-terminated ASCII strings for GLSL and has numerous functions that use pointers, things that make perfect sense for an API that is designed to be implemented in C. In Java, strings are not null-terminated and they are UTF-16 encoded; you can see why a bridge is needed. The Java GL bindings take care of string conversion and alter glVertexPointer (...)-like functions to fit Java's conditions for "pointing to" contiguous blocks of memory.
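On the C side of such a bridge, the string conversion looks roughly like this JNI sketch (the class and method names are hypothetical): a Java string is neither null-terminated nor directly addressable, so the binding copies it into a C string first.

    #include <jni.h>
    #include <stdio.h>

    JNIEXPORT void JNICALL
    Java_Demo_printMessage(JNIEnv *env, jobject self, jstring message) {
        const char *cstr = (*env)->GetStringUTFChars(env, message, NULL);
        if (cstr == NULL)
            return;                 /* OutOfMemoryError is already pending in the JVM */
        puts(cstr);                 /* plain C sees an ordinary null-terminated string */
        (*env)->ReleaseStringUTFChars(env, message, cstr);
    }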

What's the purpose of using assembly language inside a C program?

What's the purpose of using assembly language inside a C program? Compilers are able to generate assembly language already. In what cases would it be better to write assembly than C? Is performance a consideration?
In addition to what everyone said: not all CPU features are exposed to C. Sometimes, especially in driver and operating system programming, one needs to explicitly work with special registers and/or commands that are not otherwise available.
Also vector extensions.
That was especially true before the advent of compiler intrinsics. Those alleviate the need for inline assembly somewhat.
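For instance, reading the timestamp counter is a one-liner of inline assembly that plain C cannot express (a sketch assuming x86/x86-64 and GCC or Clang extended asm; newer compilers also offer an __rdtsc intrinsic for the same thing):

    #include <stdint.h>

    static inline uint64_t read_tsc(void) {
        uint32_t lo, hi;
        __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));   /* EDX:EAX <- timestamp counter */
        return ((uint64_t)hi << 32) | lo;
    }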
One more use case for inline assembly has to do with interfacing C with reflected languages. Specifically, assembly is all but necessary if you need to call a function when its prototype is not known at compile time. In other words, when the quantity and datatypes of that function's arguments are but runtime variables. C variadic functions and the stdarg machinery won't help you in this case - they would help you parse a stack frame, but not build one. In assembly, on the other hand, it's quite doable.
This is not an OS/driver scenario. There are at least two technologies out there - Java's JNI and COM Automation - where this is a must. In case of Automation, I'm talking about the way the COM runtime is marshaling dual interfaces using their type libraries.
I can think of a very crude C alternative to assembly for that, but it'd be ugly as sin. Slightly less ugly in C++ with templates.
Yet another use case: crash/run-time error reporting. For postmortem debugging, you'd want to capture as much of program state at the point of crash as possible (i. e. all the CPU registers), and assembly is a much better vehicle for that than C. Postmortem debugging of crashing native code usually involves staring at the assembly anyway.
Yet another use case - code that is intended for execution in another process without that process' co-operation or knowledge. This is often referred to as "shellcode", but it doesn't have to be shell related. Code like that needs to be very carefully written, and it can't rely on the conveniences of a high level language (like the run time library, or having a data section) that are normally taken for granted. When one is after injecting a significant piece of functionality into a target process, they usually end up loading a dynamic library, but the initial trampoline code that loads the library and passes control to it tends to be in assembly.
I've been only covering cases where assembly is necessary. Hand-optimizing for performance is covered in other answers.
There are a few, although not many, cases where hand-optimized assembly language can be made to run more efficiently than assembly language generated by C compilers from C source code. Also, for developers used to assembly language, some things can just seem easier to write in assembler.
For these cases, many C compilers allow inline assembly.
However, this is becoming increasingly rare as C compilers get better and better at producing efficient code, and most platforms put restrictions on some of the low-level kinds of software that benefit most from being written in assembler.
In general, it is performance, but performance of a very specific kind. For example, the SIMD parallel instructions of a processor might not be generated by the compiler. By utilizing processor-specific data formats and then issuing processor-specific parallel instructions (e.g. ARM NEON or Intel SSE), very fast performance on graphics or signal processing problems can be achieved. Even then, some compilers allow these to be expressed in C using intrinsic functions.
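For example, a sketch of the intrinsics route with SSE (x86 with SSE assumed; xmmintrin.h is the standard header for these):

    #include <xmmintrin.h>

    /* Add four pairs of floats with a single SIMD instruction. */
    void add4(const float *a, const float *b, float *out) {
        __m128 va = _mm_loadu_ps(a);             /* load 4 unaligned floats */
        __m128 vb = _mm_loadu_ps(b);
        _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* 4 additions at once */
    }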
While it used to be common to use assembly language inserts to hand-optimize critical functions, those days are largely done. Modern compilers are very good and modern processors have very complicated timing requirements so hand optimized code is often less optimal than expected.
There are various reasons to write inline assembly in C. We can simply categorize the reasons into necessary and unnecessary.
The unnecessary reasons might be:
platform compatibility
performance concerns
code optimization
etc.
I consider the above unnecessary because they can sometimes be discarded or implemented in pure C. For platform compatibility, for example, you could implement a particular version for each platform; however, using inline assembly might reduce the effort. Here we are not going to talk too much about the unnecessary reasons.
The necessary reasons might be:
something the standard libraries are insufficient to do
an instruction set not supported by the compiler
object code generated incorrectly
writing stack-sensitive code
etc.
These reasons are considered necessary because they are almost impossible to accomplish in pure C. For example, in old DOS systems, the software interrupt INT 21h was not reentrant. If you wanted to write a virtual driver that made full use of INT 21h through what the compiler supported, it was impossible to do. In this situation, you would need to hook the original INT 21h and make it reentrant. However, the compiler wraps every call with a prolog/epilog, so you can never break out of those restrictions without crashing the code. You can try tricks in pure C with libraries, but even if you successfully find one, it only means you found a particular order in which the compiler generates the machine code; in other words, you tried to make the compiler compile your code into exactly the machine code you wanted. So, why not just write inline assembly directly?
This example illustrates all of the necessary reasons above except an unsupported instruction set, but I think that one is easy to imagine.
In fact, there are more reasons to write inline assembly, but now you have some idea of them.
Just as a curiosity, I'm adding here a concrete example of something not-so-low-level you can only do in assembly. I read this in an assembly book from my university time where it was used to show an inherent limitation of C/C++, and how to overcome it with assembly.
The problem is: how do I invoke a function when the exact number of parameters is only known at runtime? In fact, in C/C++ you can easily define a function that takes a variable number of arguments, like printf. But when it comes to calling that function, the compiler wants to know exactly how many parameters must be passed. You may pass more parameters than required; that won't do any harm. But what if the number grows unexpectedly to 100 or 1000 parameters, and must be picked out of an array?
The solution of course is using assembly, where you can dynamically create a stack frame of the proper size, copy the parameters on the stack, invoke the function, and finally reset the stack.
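(If you would rather not write the assembly yourself, libffi, mentioned elsewhere on this page, packages exactly this stack-building trick behind a portable API. A minimal sketch, with the argument list hard-coded here but buildable from data known only at runtime:)

    #include <ffi.h>
    #include <stdio.h>

    static int square(int x) { return x * x; }

    int main(void) {
        ffi_cif   cif;
        ffi_type *arg_types[1]  = { &ffi_type_sint };
        int       x             = 7;
        void     *arg_values[1] = { &x };
        ffi_arg   result;

        /* Describe the call frame at runtime, then perform the call. */
        if (ffi_prep_cif(&cif, FFI_DEFAULT_ABI, 1, &ffi_type_sint, arg_types) == FFI_OK) {
            ffi_call(&cif, FFI_FN(square), &result, arg_values);
            printf("%d\n", (int)result);         /* 49 */
        }
        return 0;
    }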
In practice, this would hardly ever be a limitation (except if the library you're using is really, really badly designed). People who use assembly in C have much better reasons to do so, as others have pointed out in their answers. Still, I think it may be an interesting fact to know.
I would rather think of it as a way to write very specific code for a specific platform; optimization, though still common, is used less nowadays. Knowledge and usage of assembly in C is also practiced by all-color hats.

Is it a good idea to compile a language to C?

All over the web, I am getting the feeling that writing a C backend for a compiler is not such a good idea anymore. GHC's C backend is not being actively developed anymore (this is my unsupported feeling). Compilers are targeting C-- or LLVM.
Normally, I would think that GCC is a good old mature compiler that performs well at optimizing code, and therefore compiling to C would use the maturity of GCC to yield better and faster code. Is this not true?
I realize that the question greatly depends on the nature of the language being compiled and on other factors such as getting more maintainable code. I am looking for a rather more general answer (w.r.t. the compiled language) that focuses solely on performance (disregarding code quality, etc.). I would also be really glad if the answer includes an explanation of why GHC is drifting away from C and why LLVM performs better as a backend (see this), or any other examples of compilers doing the same that I am not aware of.
Let me list my two biggest problems with compiling to C. Whether this is a problem for your language depends on what kind of features you have.
Garbage collection When you have garbage collection you may have to interrupt regular execution at just about any point in the program, and at that point you need to access all pointers that point into the heap. If you compile to C you have no idea where those pointers might be. C is responsible for local variables, arguments, etc. The pointers are probably on the stack (or maybe in other register windows on a SPARC), but there is no real access to the stack. And even if you scan the stack, which values are pointers? LLVM actually addresses this problem (though I don't know how well, since I've never used LLVM with GC).
Tail calls Many languages assume that tail calls work (i.e., that they don't grow the stack); Scheme mandates it, Haskell assumes it. This is not the case with C. Under certain circumstances you can convince some C compilers to do tail calls. But you want tail calls to be reliable, e.g., when tail calling an unknown function. There are clumsy workarounds, like trampolining, but nothing quite satisfactory.
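For what a trampoline looks like in practice, here is a minimal sketch (illustrative names, not from any particular compiler): each "tail call" stores what to do next and returns to a driver loop, so the C stack never grows.

    #include <stdio.h>

    struct state {
        void (*next)(struct state *);   /* NULL means: finished */
        long n;
        int  result;
    };

    static void step_even(struct state *s);
    static void step_odd(struct state *s);

    static void step_even(struct state *s) {
        if (s->n == 0) { s->result = 1; s->next = NULL; }
        else           { s->n -= 1;     s->next = step_odd; }   /* the "tail call" */
    }
    static void step_odd(struct state *s) {
        if (s->n == 0) { s->result = 0; s->next = NULL; }
        else           { s->n -= 1;     s->next = step_even; }
    }

    static int trampoline(struct state s) {
        while (s.next)
            s.next(&s);
        return s.result;
    }

    int main(void) {
        struct state s = { step_even, 1000000, 0 };
        printf("%d\n", trampoline(s));   /* 1, with no stack growth */
        return 0;
    }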
While I'm not a compiler expert, I believe that it boils down to the fact that you lose something in translation to C as opposed to translating to e.g. LLVM's intermediate language.
If you think about the process of compiling to C, you create a compiler that translates to C code, then the C compiler translates to an intermediate representation (the in-memory AST), then translates that to machine code. The creators of the C compiler have probably spent a lot of time optimizing certain human-made patterns in the language, but you're not likely to be able to create a fancy enough compiler from a source language to C to emulate the way humans write code. There is a loss of fidelity going to C - the C compiler doesn't have any knowledge about your original code's structure. To get those optimizations, you're essentially back-fitting your compiler to try to generate C code that the C compiler knows how to optimize when it's building its AST. Messy.
If, however, you translate directly to LLVM's intermediate language, that's like compiling your code to a machine-independent high-level bytecode, which is akin to the C compiler giving you access to specify exactly what its AST should contain. Essentially, you cut out the middleman that parses the C code and go directly to the high-level representation, which preserves more of the characteristics of your code by requiring less translation.
Also related to performance, LLVM can do some really tricky stuff for dynamic languages like generating binary code at runtime. This is the "cool" part of just-in-time compilation: it is writing binary code to be executed at runtime, instead of being stuck with what was created at compile time.
Part of the reason for GHC's moving away from the old C backend was that the code produced by GHC was not the code gcc could particularly well optimise. So with GHC's native code generator getting better, there was less return for a lot of work. As of 6.12, the NCG's code was only slower than the C compiled code in very few cases, so with the NCG getting even better in ghc-7, there was no sufficient incentive to keep the gcc backend alive. LLVM is a better target because it's more modular, and one can do many optimisations on its intermediate representation before passing the result to it.
On the other hand, last I looked, JHC still produced C and the final binary from that, typically (exclusively?) by gcc. And JHC's binaries tend to be quite fast.
So if you can produce code the C compiler handles well, that is still a good option, but it's probably not worth jumping through too many hoops to produce good C if you can more easily produce good executables via another route.
One point that hasn't been brought up yet is, how close is your language to C? If you're compiling a fairly low-level imperative language, C's semantics may map very closely to the language you're implementing. If that's the case, it's probably a win, because the code written in your language is likely to resemble the kind of code someone would write in C by hand. That was definitely not the case with Haskell's C backend, which is one reason why the C backend optimized so poorly.
Another point against using a C backend is that C's semantics are actually not as simple as they look. If your language diverges significantly from C, using a C backend means you're going to have to keep track of all those infuriating complexities, and possibly differences between C compilers as well. It may be easier to use LLVM, with its simpler semantics, or devise your own backend, than keep track of all that.
Aside from all the code-generator quality reasons, there are also other problems:
The free C compilers (gcc, clang) are a bit Unix centric
Supporting more than one compiler (e.g. gcc on Unix and MSVC on Windows) requires duplication of effort.
Compilers might drag in runtime libraries (or even *nix emulations) on Windows that are painful. Having two different C runtimes (e.g. Linux libc and msvcrt) to build on complicates your own runtime and its maintenance.
You get a big externally versioned blob in your project, which means a major version transition (e.g. a change of mangling that could hurt your runtime lib, or ABI changes like a change of alignment) might require quite some work. Note that this goes for the compiler AND externally versioned (parts of the) runtime library. And multiple compilers multiply this. This is not so bad for C as a backend, though, compared to the case where you directly connect to (read: bet on) a backend, like being a gcc/llvm frontend.
In many languages that follow this path, you see C-isms trickle through into the main language. Of course this won't happen to you, but you will be tempted :-)
Language functionality that doesn't directly map to standard C (like nested procedures, and other things that need stack fiddling) is difficult.
If something is wrong, users will be confronted with C-level compiler or linker errors that are outside their field of experience. Parsing them and making them your own is painful, especially with multiple compilers and versions.
Note that point 4 also means that you will have to invest time to just keep things working when the external projects evolve. That is time that generally doesn't really go into your project, and since the project is more dynamic, multiplatform releases will need a lot of extra release engineering to cater for change.
So in short, from what I've seen, while such a move allows a swift start (getting a reasonable codegenerator for free for many architectures), there are downsides. Most of them are related to loss of control and poor Windows support of *nix centric projects like gcc. (LLVM is too new to say much on long term, but their rhetoric sounds a lot like gcc did ten years ago). If a project you are hugely dependent on keeps a certain course (like GCC going to win64 awfully slow), then you are stuck with it.
First, decide if you want to have serious non-*nix (OS X being more unixy) support, or only a Linux compiler with a mingw stopgap for Windows. A lot of compilers need first-rate Windows support.
Second, how finished must the product become? What's the primary audience? Is it a tool for the open source developer who can handle a DIY toolchain, or do you want to target a beginner market (like many 3rd party products, e.g. RealBasic)?
Or do you really want to provide a well rounded product for professionals with deep integration and complete toolchains?
All three are valid directions for a compiler project. Ask yourself what your primary direction is, and don't assume that more options will be available in time. E.g. evaluate where projects are now that chose to be a GCC frontend in the early nineties.
Essentially the unix way is to go wide (maximize platforms)
The complete suites (like VS and Delphi, the latter of which recently also started to support OS X and has supported Linux in the past) go deep and try to maximize productivity (supporting especially the Windows platform nearly completely, with deep levels of integration).
The 3rd party projects are less clear cut. They go more after self-employed programmers, and niche shops. They have less developer resources, but manage and focus them better.
As you mentioned, whether C is a good target language depends very much on your source language. So here's a few reasons where C has disadvantages compared to LLVM or a custom target language:
Garbage Collection: A language that wants to support efficient garbage collection needs to know extra information that interferes with C. If an allocation fails, the GC needs to find which values on the stack and in registers are pointers and which aren't. Since the register allocator is not under our control we need to use more expensive techniques such as writing all pointers to a separate stack. This is just one of many issues when trying to support modern GC on top of C. (Note that LLVM also still has some issues in that area, but I hear it's being worked on.)
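A sketch of the "separate stack" technique mentioned above (all names are illustrative): since the C stack and registers are opaque to the collector, the generated code keeps every live heap pointer in an explicit root stack the GC can scan.

    #include <stdlib.h>

    typedef struct Obj { struct Obj *car, *cdr; } Obj;

    #define MAX_ROOTS 1024
    static Obj **gc_roots[MAX_ROOTS];       /* addresses of live pointer variables */
    static int   gc_nroots;

    static void gc_push_root(Obj **slot) { gc_roots[gc_nroots++] = slot; }
    static void gc_pop_roots(int n)      { gc_nroots -= n; }

    /* In a real system this may trigger a collection that scans gc_roots. */
    static Obj *make_pair(Obj *car, Obj *cdr) {
        Obj *p = malloc(sizeof *p);
        p->car = car; p->cdr = cdr;
        return p;
    }

    Obj *build_list(Obj *a, Obj *b) {
        Obj *tail = NULL;
        gc_push_root(&a); gc_push_root(&b); gc_push_root(&tail);
        tail = make_pair(b, NULL);          /* a and b stay reachable via the roots */
        Obj *result = make_pair(a, tail);
        gc_pop_roots(3);
        return result;
    }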
Feature mapping & Language-specific optimisations: Some languages rely on certain optimisations, e.g., Scheme relies on tail-call optimisation. Modern C compilers can do this but are not guaranteed to do this which could cause problems if a program relies on this for correctness. Another feature that could be difficult to support on top of C is co-routines.
Most dynamically typed languages also cannot be optimised well by C compilers. For example, Cython compiles Python to C, but the generated C uses calls to many generic functions which are unlikely to be optimised well even by the latest GCC versions. Just-in-time compilation à la PyPy/LuaJIT/TraceMonkey/V8 is much better suited to giving good performance for dynamic languages (at the cost of much higher implementation effort).
Development Experience: Having an interpreter or JIT can also give you a much more convenient experience for developers -- generating C code, then compiling it and linking it, will certainly be slower and less convenient.
That said, I still think it's a reasonable choice to use C as a compilation target for prototyping new languages. Given that LLVM was explicitly designed as a compiler backend, I would only consider C if there are good reasons not to use LLVM. If the source-level language is very high-level, though, you most likely need an earlier higher-level optimisation pass, as LLVM is indeed very low-level (e.g., GHC performs most of its interesting optimisations before calling into LLVM). Oh, and if you're prototyping a language, using an interpreter is probably easiest -- just try to avoid features that rely too much on being implemented by an interpreter.
Personally I would compile to C. That way you have a universal intermediary language and don't need to be concerned about whether your compiler supports every platform out there. Using LLVM might get some performance gains (although I would argue the same could probably be achieved by tweaking your C code generation to be more optimizer-friendly), but it will lock you in to only supporting targets LLVM supports, and having to wait for LLVM to add a target when you want to support something new, old, different, or obscure.
As far as I know, C can't query or manipulate processor flags.
This answer is a rebuttal to some of the points made against C as a target language.
Tail call optimizations
Any function that can be tail call optimized is actually equivalent to an iteration (it's an iterative process, in SICP terminology). Additionally, many recursive functions can and should be made tail recursive, for performance reasons, by using accumulators etc.
Thus, in order for your language to guarantee tail call optimization, you would have to detect it and simply not map those functions to regular C functions - but instead create iterations from them.
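What "creating an iteration" might look like in the generated C, as a sketch: a tail-recursive factorial emitted as a loop instead of a self-call, so the C stack does not grow.

    /* Source-language tail recursion: (fact-iter n acc) -> (fact-iter (- n 1) (* acc n)) */
    long fact_iter(long n, long acc) {
        for (;;) {
            if (n <= 1)
                return acc;
            long new_acc = acc * n;   /* the tail call becomes parameter updates... */
            n   = n - 1;
            acc = new_acc;            /* ...followed by another trip around the loop */
        }
    }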
Garbage collection
It can actually be implemented in C. You can create a run-time system for your language which consists of some basic abstractions over the C memory model -- using, for example, your own memory allocators, constructors, special pointers for objects in the source language, etc.
For example, instead of employing regular C pointers for the objects in the source language, a special structure could be created, over which a garbage collection algorithm could be implemented. The objects in your language (more accurately, references) could behave just like in Java, but in C they could be represented along with meta-information (which you wouldn't have if you were working just with pointers).
Of course, such a system could have problems integrating with existing C tooling - depends on your implementation and trade-offs that you're willing to make.
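A sketch of what such a special structure might look like (illustrative, with many details omitted): every source-language object carries a header with meta-information in front of its payload, so a collector written in C can trace and sweep it.

    #include <stdbool.h>

    enum obj_kind { KIND_INT, KIND_PAIR, KIND_STRING };

    typedef struct header {
        enum obj_kind  kind;     /* what the payload is */
        bool           marked;   /* used by a mark-and-sweep collector */
        struct header *next;     /* all objects chained together for the sweep phase */
    } header;

    typedef struct pair {
        header h;
        struct header *car, *cdr;   /* references are header *, not raw C pointers */
    } pair;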
Lacking operations
hippietrail noted that C lacks rotate operators (by which I assume he meant circular shift) that are supported by processors. If such operations are available in the instruction set, then they can be added using inline assembly.
The frontend would in this case have to detect the architecture it's compiling for and provide the proper snippets. Some kind of fallback in the form of a regular function should also be provided.
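The portable fallback can be as small as this sketch; as it happens, modern compilers typically recognise the pattern and emit a single rotate instruction anyway, so inline assembly is rarely needed for this particular case.

    #include <stdint.h>

    static inline uint32_t rotl32(uint32_t x, unsigned n) {
        n &= 31;                                   /* keep the shift counts in range */
        return (x << n) | (x >> ((32 - n) & 31));
    }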
This answer seems to be addressing some core issues seriously. I'd like to see some more substantiation on which problems exactly are caused by C's semantics.
There's a particular case to consider: writing a programming language with strong security* or reliability requirements.
For one, it would take you years to know a big enough subset of C well enough to be sure that all the C operations you choose to employ in your compilation are safe and don't evoke undefined behaviour. Secondly, you'd then have to find an implementation of C that you can trust (which would mean a tiny trusted code base, and probably won't be very efficient). Not to mention you'll need to find a trusted linker, an OS capable of executing compiled C code, and some basic libraries, all of which would need to be well-defined and trusted.
So in this case you might as well use either assembly language or, if you care about machine independence, some intermediate representation.
*please note that "strong security" here is not related at all to what banks and IT businesses claim to have
Is it a good idea to compile a language to C?
No.
...which begs one obvious question: why do some still think compiling via C is a good idea?
Two big arguments in favour of misusing C in this fashion are that it's stable and standardised:
For GCC, it's C or bust (but there is work underway which may allow other options).
For LLVM, there's the routine breaking of backwards-compatibility in its IR and APIs - what would you prefer: spending time on improving your project or chasing after LLVM?
Providing little more than a promise of stability is somewhat ironic, considering the intended purpose of LLVM.
For these and other reasons, there are various half-done, toy-like, lab-experiment, single-site/use, and otherwise-ignominious via-C backends scattered throughout cyberspace - being abandoned, most have succumbed to bit-rot. But there are some projects which do manage to progress to the mainstream, and their success is then used by via-C supporters to further perpetuate this fantasy.
But if you're one of those supporters, feel free to make fantasy into reality - there's that work happening in GCC, or the resurrected LLVM backend for C. Just imagine it: two well-built, well-maintained via-C backends into which the sum of all prior knowledge can be directed.
They just need you.

What are the pitfalls and gotchas of mixing Objective-C and C?

At the risk of oversimplifying something I'm worried might be ridiculously complex, what should I be aware of when mixing C and Objective-C?
Edit: Just to clarify, I've never worked with C before, and I'm learning Objective-C through Cocoa. Also I'm using the Chipmunk Dynamics engine, which is C.
I'd put it the other way around: you might be risking overcomplicating something that is ridiculously simple :-)
Ok, I'm being a bit glib. As others are pointing out, Objective-C is really just a minimal set of language extensions to C. When you are writing Objective-C code, you are actually writing C. You can even access the internal machinations of the Objective-C runtime support using some handy C functions that are part of the language (no... I don't recommend you actually DO this unless you really know what you're doing).
About the only time I've ever had mildly tricky moments is when I wanted to pass an Objective-C instance method as a callback to a C function. Say, for example, I'm using a pure-C cross-platform library that has functions which accept a callback. I might call the function from within an object instance to process some data, and then want that C function to call my instance BACK when it's done, or as part of getting additional input, etc. (a common paradigm in C). This can be done with funky function wrapping, and some other creative methods I've seen, and if you ever need to do it, googling "objective-c method for c callback" or something like that will give you the goods.
The only other word of advice is to make sure your objects appropriately manage any manually malloced memory that they create for use by C functions. You'll want your objective-c classes to tidy up that memory on dealloc if, indeed, it is finished.
Other than that, dust off any reference on C and have fun!
You can't 'mix' C and Objective-C: Objective-C is a superset of C.
Now, C++ and Objective-C on the other hand...
Objective C is a superset of C, so it shouldn't conflict.
Except that, as pointed out here, pure C has different conventions (obviously, since there is no built-in mechanism) for handling OO programming. In C, an object is simply a (struct *) with function pointers.
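A tiny sketch of that idea (the names are mine): the struct carries the data plus function pointers standing in for methods, with the "self" argument passed explicitly.

    #include <stdio.h>

    typedef struct Shape {
        double (*area)(const struct Shape *self);   /* the "method" */
        double width, height;
    } Shape;

    static double rect_area(const Shape *self) { return self->width * self->height; }

    int main(void) {
        Shape r = { rect_area, 3.0, 4.0 };
        printf("%f\n", r.area(&r));                 /* a "method call" with explicit self */
        return 0;
    }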
