Win32 COM Programming in Pure C - c

I have a programming project which requires access to some of the lower-level Windows APIs (WASAPI, in particular). I am fairly experienced with higher-level programming languages such as C#, Java, and PHP, and I know only a little bit about C/C++.
I would like to use C (not C++) for this project, as C++ is kind of scary. Upon changing the C/C++ settings of my project in Visual Studio to compile as C code, I noticed that any calls to __uuidof won't work, as they are C++ specific.
My question is twofold:
Is it possible to write Win32 programs that utilize COM in pure C, and
If so, should using pure C be avoided?

Yes it is possible to use write pure C programs that utilize COM, in fact that was common practice 10-15 years ago.
You are not doing yourself a favor by using C (as you already have noticed). E.g. ATL provides lots of help when you are doing COM and can help you to avoid common mistakes.
If I were you I would go the C++ even if the threshold may be a bit higher at first. Also get a book in the subject if you do not have one. There are lots of examples to be found in the net but it is good to have something that leads you by the hand since COM programming is not for the faint of heart, regardless whether you use C or C++.

COM uses a pretty small subset of C++ beyond the C part, and there's no requirement to use many of the 'scary' items such as:
Exceptions - these can really bite you in C++ if you don't get them right (you need to really buy into the RAII idiom, which can take getting used to if you're used to more traditional C/Win32 coding); but exceptions are not part of COM at all: COM uses return codes for all error handling. There are some wrappers and compiler extensions that will convert COM's return codes into C++ exceptions if you want, but it's opt-in.
Class hierarchies and inheritance - Inheritance is also not really part of COM itself, other than the fact that all COM interfaces derive from (or, start off with the methods from) IUnknown. But you don't need to know anything about multiple virtual inheritance to use COM. Some of this is very useful to know if you are implementing COM objects yourself, but not to just use it. (For example, using C++ multiple inheritance is a very common way to implement a COM object that exposes multiple interfaces; but it's not the only way.)
Templates - again, not part of COM, but there are several libraries - eg. MFC and ATL - that use templates to make COM a lot simpler to use. This is especially true for smart pointer classes like CComPtr that will take care of some reference counting for you, allowing your code to concentrate on doing the real interesting stuff instead of having it packed with housekeeping stuff.
The main link between COM and C++ is all C++ compilers on Windows will lay out a C++ object in memory in a way that matches exactly what COM requires. This allows you to use a COM object as though it were a C++ object, which makes the code considerably less verbose, as the language/compiler is taking care of some really simple but verbose stuff for you.
So instead of doing the following in C (step through vtable and pass this param explicitly):
pUnk->lpVtbl->SomeMethod(pUnk, 42);
You can do the following in C++:
pUnk->SomeMethod(42);
You really don't want to have to type ->lpVtbl and make sure you pass the correct 'this' parameter (be careful when cutting and pasting!) with every single COM call you make, do you?
My suggestion would be to find a good COM book - Inside COM is a good one - and just start off using the C++ subset that you are comfortable with. Once you know how to use a COM pointer "raw", and use QI, AddRef and so on yourself, then perhaps you can play with the helper libraries that use templates to do some of the bookkeeping for you. Deciding to use wrappers that map COM errors to C++ exceptions is a bit of a bigger jump, as you need to write C++ code that is exception-safe first so need to understand the various issues with those first. But I can't think of any good reason - other than sheer curiosity - for going back and using COM from plain C.

Related

How much lisp to implement in C before writing extension in itself?

I am implementing a lisp interpreter in C, i have implemented along with few primitives like cons , car, cdr , eq, basic arithmetic stuff.
Just before i was starting to implement define and lambda it occurred to me that i need to implement an environment. I am unsure if i could implement it in lisp itself.
My intent is to implement minimal amount of lisp so that i could write extension to the language in itself. I am not sure how much is minimal, Would implementing FFI Qualify as minimal ?
The answer to your question depends on the meaning that you give to the word “minimal”.
Given your question, and assuming that you don't want to make an implementation competing with the nowdays fine implementations of Common Lisp and Schema, my hypothesis is that with “minimal” you intend: Turing complete, that is capable of expressing any computation expressible in a general purpose programming language.
With this assumption, you need to implement three other things:
conditional forms (cond)
lambda expressions (lambda)
a way of defining recursive lambda expression (labels or defun)
Your interpreter then should be able to evaluate forms. This should be sufficient to have a language equivalent to the initial LISP, that allow to express in the language any computable function.
First off, you are talking about first writing a LISP interpreter. You have a lot of choices to take when it comes to scoping, LISP1 vs LISP2 since these questions alter the implementation core. An interpreter is a general purpose program that reads and evaluates code. It can support abstractions but it won't extend itself by making more native stuff.
If you are interested in such stuff you can perhaps make a compiler instead. Eg. there are many Sceme like subsets that compiles to C or Java code, but you can make your own VM. Thus it can indeed compile itself to be run on it's own target machine (self hosting) if all the forms and procedures you use has been implemented using the primitives supported by the compiler.
Making a dumb compiler is not much difference from making an interpreter. That is very clear if yo've watched the SICP videos (10A is about compilation, 7A-B is about interpreters)
The environment can be a chain of pairs just as in a LISP interpreter. It would be difficult to implement the environment of itself in LISP without making it a very difficult Lisp language to use (unless it's compiled that is)
You may use the data structures of lisp and the primitives from the C code though.
Making a FFI is a fast way to give your language lots of features. It solves the chicken and egg problem by using other peoples work from within your language. In fuses the top (primitives and syntax) and the bottom layer (a runtime) of your system. It's the ultimate primitive and you can think of it as system call or message bus to the runtime.
I strongly suggest to read Queinnec's book: Lisp In Small Pieces. It is a book dedicated entirely to answer your question, and it explains in detail the many trade-offs and the internals of Lisp implementations and definitions, by giving many explained examples of Lisp interpreters and compilers.
You might also consider using libffi. You could be interested in the internals of M.Serrano's Bigloo & Hop implementations. You might even look inside my MELT lisp-like language to customize the GCC
compiler.
You also need to learn more about garbage collection (you might read the GC handbook). You could use Boehm's conservative Garbage Collector (or something else, e.g. my Qish or MPS) or write your own GC.
You may want to learn more about Chicken, Scheme 48, Guile and read their papers and look inside their code.
See also J.Pitrat's blog: it is not about Lisp (but about bootstrapping strong AI) and has several fascinating entries related to bootstrapping.

What 'mark' does a language leaves on a library that we need language bindings?

What mark does a language leaves on a compiled library that we need language bindings if we have call its functions from a different language?
object code looks 'language free' to me.
While learning OpenGL in c in Linux environment I have across language bindings.
Binding provides a simple and consistent way for applications to present and interact with data.
Source: The tag under your question
I'm guessing that you're either young or haven't been programming for more than a decade or so.
Object code should look language free, but it ain't due to history. Back in the 1970s and 1980s, on Intel 80x86 and Motorola 680x0 CPUs, function call arguments were passed on the stack. In the 'Pascal' convention, the number of arguments was fixed and the called function code removed them from the stack before returning. In the 'C' convention, the number of arguments was variable (eg printf) so the calling code had to remove them when the function returned. This cost 2 extra bytes per function call, which is nothing today but was significant back then when PCs only came with 128K or so of RAM. So Microsoft chose to use the Pascal calling convention for the Windows API, even though it was written in C. If your object code called a Windows function with the C convention by mistake, kaboom. This is why the header files are still cluttered up with WINAPI and _stdcall and _fastcall and whatnot.
Starting in the 1990s operating system authors realized this was silly and started imposing standard calling conventions on everyone. The C convention could handle both cases, so it got used everywhere. With the moves to MacOS X, 64 bit Windows, and ARM; we are finally getting language free object code.
Now, OpenGL was designed to be used from C and Fortran. (Which was in the 1990s still an important language for scientific calculations and visualization.) Both languages have integers, floating point numbers, and arrays of various sized ints/floats. C has structs but Fortran doesn't, and I suspect this is a major reason why the OpenGL API never uses any structs. There are also differences in the memory layout of 2D or higher dimension arrays between C and Fortran, and again note that the OpenGL API never specifies 2D arrays, only 1D.
A C API works for most languages. This is partly because C is 'portable assembler' that works onto almost any CPU and operating system. It's also because most other programming languages in common use are either supersets of C (C++, Objective-C) or implemented in C themselves (Python, Perl, Ruby) so can be made to call the OpenGL C API reasonably easily.
Java and C# have more problems, because they define their own object code, so to speak, and memory access is more tightly controlled. The C/OpenGL notion of 'here is a pointer to a block of memory, do what you like with it' breaks the security model of the JVM/CLR. So you end up having to use Java NIOByteBuffer things instead of just passing arrays.
A lot of it also comes down to the skill of the language binding designer. For one example, Python-OpenGL by Mike Fletcher is a really good binding. All the functions and constants have exactly the same names, so a lot of code can be just copied from C and pasted into Python. Python doesn't have C style arrays directly, but the language binding will silently translate any Python sequences/tuples you pass as "arrays" into the underlying C format for you. It feels natural for a Python programmer and still exposes the full capabilities of OpenGL.
For a bad example, JOGL is a pain in the arse. There's no automatic conversion from Java arrays to C, so you have to futz around with NIOByteBuffers yourself. It's so annoying that it's actually easier to use glBegin..glEnd blocks. And extra array offset arguments got added to a lot of OpenGL functions, so your code no longer looks the same as C/C++ and you waste a lot of time sticking ,0 on the end of function calls. Some of this is due to the JVM as mentioned before, but a lot of it is just bad design by (I suspect) somebody who never actually wrote much OpenGL themselves.
A long and rambling answer to a vague question.
Well, all you have to do is think about the myriad of calling conventions in C and C++. In order to prevent serious mishaps, the compiler mangles the function names based on calling convention so that you do not accidentally call a stdcall function using fastcall conventions. Each language has its own set of superfluous details like this that a language independent API should never have to burden itself with. Language bindings serve as an adapter/bridge that separates the language-specific stuff from the standardized API, filling in the gaps wherever necessary.
The OpenGL API is generally implemented in a single language (C) and programs written in other languages interface with the system's implementation through language bindings. OpenGL uses null-terminated ASCII strings for GLSL and has numerous functions that use pointers, things that make perfect sense for an API that is designed to be implemented in C. In Java, strings are not null-terminated and they are UTF-16 encoded; you can see why a bridge is needed. The Java GL bindings take care of string conversion and alter glVertexPointer (...)-like functions to fit Java's conditions for "pointing to" contiguous blocks of memory.

Coming from an OOP background, what would be some C programs/libraries to help me get the "C way"?

I have been doing OOP (C++/Java/PHP/Ruby) for a long time and really have a hard time imagining how large programs and libraries such as Linux or Apache can be written entirely in an imperative style. What would be small open source C projects I could look at to get a feel of how things are done in C?
Bonus points if the project is hosted on GitHub.
Things are done exactly the same way in C, but with less overt support from the language. Instead of creating a class to encapsulate some state, you create a struct. Instead of creating class members, with implicit this parameters, you create functions that you explicitly pass a struct* as the first parameter, that then operate on the struct.
To ensure that encapsulation is not broken you can declare the struct in a header, but only define it in the .c file where it is used. Virtual functions require more work - but again, its just a case of putting function pointers in the struct. Which is actually more convenient in C than C++ because in C you get to fill in your vtables manually, getting quite a fine level of control over which part of code implements part of what COM interface (if you are into COM in C of course).
You might find the ccan (Comprehensive C Archive Network, modeled after Perl's CPAN) interesting.
It's small at the moment, but the contributions are of high quality. Many of the contributions are by linux kernel developers.
Almost everything in there falls into the "few thousand LOC" or less category, too.
If you want a small example to start with, try looking at the source for the basic Linux CLI utilities. GNU binutils, make, or any of the other GNU utilities have full source code available and are relatively small code bases (some are larger than others). The easiest thing is usually to start with a utility that you have used before and are already familiar with.
Look at GLib for an almost canonical example of how to do object oriented programming in C.

Has the use of C to implement other languages constrained their designs in any way?

It seems that most new programming languages that have appeared in the last 20 years have been written in C. This makes complete sense as C can be seen as a sort of portable assembly language. But what I'm curious about is whether this has constrained the design of the languages in any way. What prompted my question was thinking about how the C stack is used directly in Python for calling functions. Obviously the programming language designer can do whatever they want in whatever language they want, but it seems to me that the language you choose to write your new language in puts you in a certain mindset and gives you certain shortcuts that are difficult to ignore. Are there other characteristics of these languages that come from being written in that language (good or bad)?
I tend to disagree.
I don't think it's so much that a language's compiler or interpreter is implemented in C — after all, you can implement a virtual machine with C that is completely unlike its host environment, meaning that you can get away from a C / near-assembly language mindset.
However, it's more difficult to claim that the C language itself didn't have any influence on the design of later languages. Take for example the usage of curly braces { } to group statements into blocks, the notion that whitespace and indentation is mostly unimportant, native type's names (int, char, etc.) and other keywords, or the way how variables are defined (ie. type declaration first, followed by the variable's name, optional initialization). Many of today's popular and wide-spread languages (C++, Java, C#, and I'm sure there are even more) share these concepts with C. (These probably weren't completely new with C, but AFAIK C came up with that particular mix of language syntax.)
Even with a C implementation, you're surprisingly free in terms of implementation. For example, chicken scheme uses C as an intermediate, but still manages to use the stack as a nursery generation in its garbage collector.
That said, there are some cases where there are constraints. Case in point: The GHC haskell compiler has a perl script called the Evil Mangler to alter the GCC-outputted assembly code to implement some important optimizations. They've been moving to internally-generated assembly and LLVM partially for that reason. That said, this hasn't constrained the language design - only the compiler's choice of available optimizations.
No, in short. The reality is, look around at the languages that are written in C. Lua, for example, is about as far from C as you can get without becoming Perl. It has first-class functions, fully automated memory management, etc.
It's unusual for new languages to be affected by their implementation language, unless said language contains serious limitations. While I definitely disapprove of C, it's not a limited language, just very error-prone and slow to program in compared to more modern languages. Oh, except in the CRT. For example, Lua doesn't contain directory functionality, because it's not part of the CRT so they can't portably implement it in standard C. That is one way in which C is limited. But in terms of language features, it's not limited.
If you wanted to construct an argument saying that languages implemented in C have XYZ limitations or characteristics, you would have to show that doing things another way is impossible in C.
The C stack is just the system stack, and this concept predates C by quite a bit. If you study theory of computing you will see that using a stack is very powerful.
Using C to implement languages has probably had very little effect on those languages, though the familiarity with C (and other C like languages) of people who design and implement languages has probably influenced their design a great deal. It is very difficult to not be influenced by things you've seen before even when you aren't actively copying the best bits of another language.
Many languages do use C as the glue between them and other things, though. Part of this is that many OSes provide a C API, so to access that it's easy to use C. Additionally, C is just so common and simple that many other languages have some sort of way to interface with it. If you want to glue two modules together which are written in different languages then using C as the middle man is probably the easiest solution.
Where implementing a language in C has probably influenced other languages the most is probably things like how escapes are done in strings, which probably isn't that limiting.
The only thing that has constrained language design is the imagination and technical skill of the language designers. As you said, C can be thought of as a "portable assembly language". If that is true, then asking if C has constrained a design is akin to asking if assembly has constrained language design. Since all code written in any language is eventually executed as assembly, every language would suffer the same constraints. Therefore, the C language itself imposes no constraints that would be overcome by using a different language.
That being said, there are some things that are easier to do in one language vs another. Many language designers take this into account. If the language is being designed to be, say, powerful at string processing but performance is not a concern, then using a language with better built-in string processing facilities (such as C++) might be more optimal.
Many developers choose C for several reasons. First, C is a very common language. Open source projects in particular like that it is relatively easier to find an experienced C-language developer than it is to find an equivalently-skilled developer in some other languages. Second, C typically lends itself to micro-optimization. When writing a parser for a scripted language, the efficiency of the parser has a big impact on the overall performance of scripts written in that language. For compiled languages, a more efficient compiler can reduce compile times. Many C compilers are very good at generating extremely optimized code (which is also part of the reason why many embedded systems are programmed in C), and performance-critical code can be written in inline assembly. Also, C is standardized and is generally a static target. Code can be written to the ANSI/C89 standard and not have to worry about it being incompatible with a future version of C. The revisions made in the C99 standard add functionality but don't break existing code. Finally, C is extremely portable. If at least one compiler exists for a given platform, it's most likely a C compiler. Using a highly-portable language like C makes it easier to maximize the number of platforms that can use the new language.
The one limitation that comes to mind is extensibility and compiler hosting. Consider the case of C#. The compiler is written in C/C++ and is entirely native code. This makes it very difficult to use in process with a C# application.
This has broad implications for the tooling chain of C#. Any code which wants to take advantage of the real C# parser or binding engine has to have at least one component which is written in native code. This eventually results in most of the tooling chain for the C# language being written in C++ which is a bit backwards for a language.
This doesn't limit the language per say but definitely has an effect on the experience around the language.
Garbage collection. Language implementations on top of Java or .NET use the VM's GC. Those on top of C tend to use reference counting.
One thing I can think of is that functions are not necessarily first class members in the language, and this is can't be blamed on C alone (I am not talking about passing a function pointer, though it can be argued that C provides you with that feature).
If one were to write a DSL in groovy (/scheme/lisp/haskell/lua/javascript/and some more that I am not sure of), functions can become first class members. Making functions first class members and allowing for anonymous functions allows to write concise and more human readable code (like demonstrated by LINQ).
Yes, eventually all of these are running under C (or assembly if you want to get to that level), but in terms of providing the user of the language the ability to express themselves better, these abstractions do a wonderful job.
Implementing a compiler/interpreter in C doesn't have any major limitations. On the other hand, implementing a language X to C compiler does. For example, according to the Wikipedia article on C--, when compiling a higher level language to C you can't do precise garbage collection, efficient exception handling, or tail recursion optimization. This is the kind of problem that C-- was intended to solve.

What are the pitfalls and gotchas of mixing Objective-C and C?

At the risk of oversimplifying something I'm worried might be ridiculously complex, what should I be aware of when mixing C and Objective-C?
Edit: Just to clarify, I've never worked with C before, and I'm learning Objective-C through Cocoa. Also I'm using the Chipmunk Dynamics engine, which is C.
I'd put it the other way around: you might be risking overcomplicating something that is ridiculously simple :-)
Ok, I'm being a bit glib. As others are pointing out, Objective-C is really just a minimal set of language extensions to C. When you are writing Objective-C code, you are actually writing C. You can even access the internal machinations of the Objective-C runtime support using some handy C functions that are part of the language (no... I don't recommend you actually DO this unless you really know what you're doing).
About the only time I've ever had mildly tricky moments is when I wanted to pass an Objective-C instance method as a callback to a C function. Say, for example, I'm using a pure-C cross platform library that has functions which accept a callback. I might call the function from within an object instance to process some data, and then want that C function to call my instance BACK when its done, or as part of getting additional input etc etc (a common paradigm in C). This can be done with funky function wrapping, and some other creative methods I've seen, and if you ever need to do it googling "objective-c method for c callback" or something like that will give you the goods.
The only other word of advice is to make sure your objects appropriately manage any manually malloced memory that they create for use by C functions. You'll want your objective-c classes to tidy up that memory on dealloc if, indeed, it is finished.
Other than that, dust off any reference on C and have fun!
You can't 'mix' C and Objective-C: Objective-C is a superset of C.
Now, C++ and Objective-C on the other hand...
Objective C is a superset of C, so it shouldn't conflict.
Except that, as pointed here pure C has different conventions (obviously, since there is no built-in mechanism) to handle OO programming. In C, an object is simply a (struct *) with function pointers.

Resources