Implementing a C preprocessor - c

Much has been written over the years on implementing parsers, but the C preprocessor is not quite the same as any of the stages of a typical parser, and implementation thereof doubtless has its share of particular pitfalls to watch out for. Does anyone know of anything written on the topic of implementing a C preprocessor?

Hartmut Kaiser, the author of Boost Wave, wrote a nice article on CodeProject http://www.codeproject.com/KB/recipes/wave_preprocessor.aspx about the Boost Wave project. You can use Boost Wave to make your own C preprocessor with custom extensions.

I found a useful discussion in the document mcpp-summary at http://mcpp.sourceforge.net/

I've based mine on the GNU internals.

Related

GNU parser Optimization Using L-System

Any suggestions on how to use the Lindenmayer system (L-system) approach to make the GNU parser faster through parallelism? I also need to compare the normal execution time with the execution time when the L-system is implemented for the C language. Any suggestions will be helpful.
Your question is too broad for this format. Note that you can write a very efficient parser for the C language without an L-system. The TinyCC compiler is a good example of a very efficient hand-written C parser, itself written in C. Check it out at http://tinycc.org

A more complete recursive descent C interpreter

I've seen several implementations of recursive descent C interpreters, which all seem
to do a pretty good job, yet they all implement only a small portion of the C language;
for example, they don't support structs or typedefs.
Does anyone know of any code that supports a large portion of the C language?
I know adding more functionality would be pretty trivial, but I'm a bit strapped
for time.
Picoc supports more than most of the Tiny/Small C interpreters. You might give it a look. And it does support structures.
If you just want to use it, this one looks awfully good for the job. There was a Dr. Dobb's article on it a while back ... there it is

How do I implement parsing?

I am designing a compiler in C. I want to know which technique I should use: top-down or bottom-up? So far I have only implemented operator precedence, using a bottom-up approach. I have applied the following
rules:
E:=E+E
E:=E-E
E:=E/E
E:=E*E
E:=E^E
I want to know whether I am going the right way.
If I want to include if-else, loops, arrays, functions, do I need to implement parsing?
If yes, how do I implement it? Any one can
I have only implemented token collection and operator precedence. What are the next steps?
Lex & Yacc is your answer, or Flex and Bison, which are free reimplementations of the original tools.
They are free, they are the de facto standard for writing lexers and parsers in C, and they are used everywhere.
In addition, O'Reilly has published a small gem of about 300 pages: Flex & Bison. I bought it, and it really explains how to write a good parser for a programming language and handle all the subtle things (error recovery, conflicts, scopes and so on). It will also answer your questions about how you are parsing expressions: your approach is right with a top-down parser, but you'll discover that it is not enough to handle operator precedence.
Of course, as a hobby, you could write your own lexer and parser, but it would be just an academic effort: a nice way to understand how FSMs and parsers work, but not much fun :)
If you are instead interested in programming language design or complex implementations, I suggest the book Programming Language Pragmatics. It is not as famous as the Dragon Book, but it really explains why and how various features can and should be implemented in a compiler. The Dragon Book is a bible too, and it covers at a really low level how to write a parser, but it can be rather boring, I warn you.
The best way to implement a good parser in C is to use flex and yacc.
Your question is quite vague and hard to answer without more specific detail. The "Dragon Book" is an excellent reference, though, for someone seeking to implement a compiler from scratch; alternatively, as others have pointed out, use Lex and Yacc.
If you intend to implement the parser by hand, you will want to do a recursive descent parser. The code directly reflects the grammar, so it's fairly easy to figure out and understand. It places some restrictions on your grammar (you can't have any left-recursive nonterminals), but you can work around those problems.
However, it depends on the complexity of the grammar; hand-hacking a parser for anything much more complicated than basic arithmetic expressions gets very tedious very quickly. If you're trying to implement anything that looks like a real programming language, use a parser generator like yacc or bison.
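As a rough illustration of the hand-written approach described above, here is a minimal sketch of a recursive descent evaluator for the five operators from the question. It is not taken from any of the tools or books mentioned; the function names (`eval`, `parse_expr`, `parse_term`, `parse_power`, `parse_atom`) and the choice to evaluate directly on `long` values instead of building a tree are assumptions made for brevity. The left-recursive rules `E := E + E` and so on are rewritten as loops, one function per precedence level, and `^` is treated as right-associative.

```c
#include <ctype.h>

/* Grammar with left recursion removed (repetition replaces E := E op E):
 *   expr  := term  { ('+' | '-') term }
 *   term  := power { ('*' | '/') power }
 *   power := atom  [ '^' power ]          right-associative
 *   atom  := NUMBER | '(' expr ')'
 * No overflow checking is done; this is a sketch, not production code.
 */

static const char *cur;   /* next unread character of the input */
static int parse_error;   /* set to 1 on a syntax error or division by zero */

static long parse_expr(void);

static void skip_spaces(void) {
    while (isspace((unsigned char)*cur)) cur++;
}

static long parse_atom(void) {
    skip_spaces();
    if (*cur == '(') {
        cur++;
        long v = parse_expr();
        skip_spaces();
        if (*cur == ')') cur++; else parse_error = 1;
        return v;
    }
    if (!isdigit((unsigned char)*cur)) { parse_error = 1; return 0; }
    long v = 0;
    while (isdigit((unsigned char)*cur)) v = v * 10 + (*cur++ - '0');
    return v;
}

static long parse_power(void) {
    long base = parse_atom();
    skip_spaces();
    if (*cur == '^') {
        cur++;
        long exp = parse_power();   /* recursing on the right operand makes '^' right-associative */
        if (exp < 0) { parse_error = 1; return 0; }
        long r = 1;
        while (exp-- > 0) r *= base;
        return r;
    }
    return base;
}

static long parse_term(void) {
    long v = parse_power();
    for (;;) {                      /* the loop makes '*' and '/' left-associative */
        skip_spaces();
        char op = *cur;
        if (op != '*' && op != '/') return v;
        cur++;
        long rhs = parse_power();
        if (op == '*') v *= rhs;
        else if (rhs != 0) v /= rhs;
        else parse_error = 1;
    }
}

static long parse_expr(void) {
    long v = parse_term();
    for (;;) {                      /* the loop makes '+' and '-' left-associative */
        skip_spaces();
        char op = *cur;
        if (op != '+' && op != '-') return v;
        cur++;
        long rhs = parse_term();
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Hypothetical entry point: returns 0 and stores the value in *out on success, -1 on error. */
int eval(const char *text, long *out) {
    cur = text;
    parse_error = 0;
    long v = parse_expr();
    skip_spaces();
    if (*cur != '\0') parse_error = 1;   /* trailing garbage */
    if (parse_error) return -1;
    *out = v;
    return 0;
}
```

Calling `eval("1 + 2 * 3", &v)` would store 7 in `v`. Adding statements such as if-else or loops follows the same pattern: one function per grammar rule, each deciding what to do by looking at the next token.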

HAT-trie in ANSI C implementation?

I am looking for an ANSI C HAT-trie implementation released under some free license. I have not found one. Can you point me to a standalone implementation, or a program that uses
HAT-tries, so I can get at least a slight idea of how to implement it the right way, please?
The original paper on HAT-trie can be found here:
http://crpit.com/confpapers/CRPITV62Askitis.pdf
PS: In case faster cache-conscious data structures well suited for strings have evolved since
the time the above paper was written, please point me to those papers or example source code instead.
Someone is implementing it in C++ over on github
https://github.com/chris-vaszauskas/hat-trie
If you need a plain C implementation, this would be a good base to start from.
Java is also fairly readable for a C programmer
http://www.stochasticgeometry.ie/2008/05/06/implementing-hat-tries-in-java/
Please see the HAT-trie implementation site at code.google.com/p/hat-trie for implementation notes and source code.

Recursive Descent Parser for C

I'm looking for a parser for C. Here is what I need:
Written in C (not C++).
Handwritten (not generated).
BSD or similarly permissive license.
Capable of nontrivially parsing itself (can be a subset of C).
It can be part of a project as long as it's decoupled so that I can pull out the parser.
Is there an existing parser that fulfills these requirements?
If you don't need C99, then lcc is a slam dunk:
It is documented in a very clear, well-written book.
Techniques used for recursive-descent parsing of operators with precedence are well documented in an article and technical report by Dave Hanson.
Clear, handwritten ANSI C code.
One potential downside is that the lcc parser does not build an abstract syntax tree; it goes straight from parsing to intermediate code.
If you must have C99 then I think tinycc (tcc) is your best bet.
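For readers unfamiliar with recursive-descent parsing of operators with precedence, the following is a generic sketch of the table-driven idea often called precedence climbing. It is not code from lcc or from Hanson's article; the names (`Parser`, `prec`, `climb`, `climb_eval`) and the small operator set are assumptions chosen for illustration. A single function handles every binary operator by consulting a precedence table, instead of one function per precedence level.

```c
#include <ctype.h>

/* Binding power of a binary operator, or 0 if c is not an operator. */
static int prec(char c) {
    switch (c) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    case '^':           return 3;
    default:            return 0;
    }
}

static int right_assoc(char c) { return c == '^'; }

typedef struct {
    const char *p;   /* next unread character */
    int err;         /* nonzero after a syntax or arithmetic error */
} Parser;

static void skip_ws(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

static long climb(Parser *ps, int min_prec);

/* primary := NUMBER | '(' expression ')' */
static long primary(Parser *ps) {
    skip_ws(ps);
    if (*ps->p == '(') {
        ps->p++;
        long v = climb(ps, 1);
        skip_ws(ps);
        if (*ps->p == ')') ps->p++; else ps->err = 1;
        return v;
    }
    if (!isdigit((unsigned char)*ps->p)) { ps->err = 1; return 0; }
    long v = 0;
    while (isdigit((unsigned char)*ps->p)) v = v * 10 + (*ps->p++ - '0');
    return v;
}

/* Evaluates one binary operation; no overflow checking in this sketch. */
static long apply(Parser *ps, char op, long a, long b) {
    switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    case '/': if (b == 0) { ps->err = 1; return 0; } return a / b;
    default: {   /* '^' */
        if (b < 0) { ps->err = 1; return 0; }
        long r = 1;
        while (b-- > 0) r *= a;
        return r;
    }
    }
}

/* Parses operators whose precedence is at least min_prec. */
static long climb(Parser *ps, int min_prec) {
    long lhs = primary(ps);
    for (;;) {
        skip_ws(ps);
        char op = *ps->p;
        int p = prec(op);
        if (p == 0 || p < min_prec) return lhs;
        ps->p++;
        /* Left-associative operators demand a strictly higher level on the right. */
        long rhs = climb(ps, right_assoc(op) ? p : p + 1);
        lhs = apply(ps, op, lhs, rhs);
    }
}

/* Hypothetical entry point: returns 0 and stores the value in *out on success, -1 on error. */
int climb_eval(const char *text, long *out) {
    Parser ps = { text, 0 };
    long v = climb(&ps, 1);
    skip_ws(&ps);
    if (*ps.p != '\0') ps.err = 1;   /* trailing garbage */
    if (ps.err) return -1;
    *out = v;
    return 0;
}
```

Adding an operator in this style means adding a row to `prec` (and, if needed, to `right_assoc`) plus a case in `apply`; the parsing function itself stays unchanged, which is the main appeal for a language like C with many precedence levels.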
How about Sparse?
You could try TCC. It's licensed under the Lesser GPL.
It seems that nwcc sufficiently agrees with your requirements.
A good C compiler is available at this location. It is simple and accessible.
https://github.com/rui314/8cc
GCC has one in gcc/c-parser.c.
Check elsa, it uses the Generalized LR algorithm.
Its main use is for C++, but it also parses C code.
Check on its page, on the section called "How much C can Elsa parse?" which says it can parse most C programs, including the Linux kernel.
It's released under a BSD license.
Here is a recursive descent parser I ported to C:
http://www.gabotronics.com/resources/recursive-descent-parser.htm

Resources