I was wondering if anyone had a working C grammar for ANTLRv4 besides the one on Github?
I can't get the existing one to work at all, it won't even parse the sample files. It may be i'm missing something but I haven't had a problem with any of the other grammars.
I was thinking about modifying the existing one/writing my own, but I don't really have the time - I have limited time to work on this project.
Any help much appreciated.
thanks,
Katy
So you cannot create a working C grammar in less than a few months and it is more complex than it seems like. My opinion is that parsing all C (without preprocessor) takes 6 months to do it well.
For example, the first impression is that C grammar is context-free, but in reality it is context-sensitive.
Take the official grammar from Appendix A of the ISO Standard and start implementing sublanguages from it, inserting nonterminals one by one.
Related
The C compiler was written in C, that much I know. What language/tools/instructions was/were used to create the initial functionality for the C compiler to become self-hosting?
UPDATE:
I know what bootstrapping a language is, and it is that research that prompted this question. I cannot find an answer to my question anywhere on SO.
There are a multitude of C compilers in the world. Many of them were (and still are) written in C. However the first one was not - it was written in B.
I've seen several implementations of recursive descent c interpreters which all seem
to do a pretty good job - yet they all only implement a small portion of the C language -
for example they don't support structs or typedefs etc -
Does anyone know of any code that supports a large portion of the C language.
I know adding more functionality would be pretty trivial - but I'm a bit strapped
for time.
Picoc supports more that most of the Tiny/Small C interpreters. You might give it a look. And it does support structures.
If you just want to use it, this one looks awfully good for the job. There was a Dr. Dobb's article on it a while back ... there it is
I am doing static analyze on c program.And I search the antlr website ,there seems to be no appropriate grammar file that produce ast for c program.Does it mean I have to do it myself from the very start.Or is there a quicker method.I also need a tree parser that can traverse the ast created by the parser.
You indicated you want to do static analysis to detect buffer overflow.
First, writing a grammar for C is harder than it looks. There's all that stuff in the standard, and then there's what the real compilers actually accept. And you have to decide what to do about the preprocessor (and it varies from compiler to compiler!). If you don't get the grammar and preprocessing exactly right, you won't be able to parse real programs. (If you want to do toy languages, that's fine, but then you don't need a C grammar).
To do the analysis, you'll need far more machinery than an AST. You'll need symbol tables, control and data flow analysis, likely local and global points-to analysis, call graph extraction, and some type of range analysis.
People just don't seem to understand this.
** GETTING A PARSER IS A LONG WAY FROM DOING ANYTHING USEFUL WITH REAL LANGUAGES **
I'm shouting because I see this over, and over, and over.
If you want to get on with a specific program analysis or transformation task, unless you want to die of old age before you start your task, you better find a foundation that has most of what you need already. A foundation on a parser generator with a creaky grammar is not a foundation. (Don't get me wrong: ANTLR, YACC, JavaCC are all fine parser generators, and they're great for building a parser for a new language. They're great for implementing production parsers for real langauges when the investment gets made. But they produce parsers, and mostly people don't do the production part. And they don't provide the additional machinery by a long shot.)
Our DMS Software Reengineering Toolkit contains all the above machinery because it is almost always needed, and it is a royal headache to implement. (My team has 15 years invested so far.)
We've also instantiated that machinery is forms specifically useful for COBOL and Java, C, C++ (to somewhat lesser extent, the language is really hard), in a variety of dialects, so that others don't have to repeat this long process.
GCC and Clang are pretty mature for C and C++ as alternatives.
The hardest part is writing the grammar. Mixing in rewrite rules to create an AST isn't that hard, and creating a tree grammar from a parser grammar that emits an AST isn't that hard too (compared to writing the parser grammar, that is).
Here's a previous Q&A that shows how to create a proper AST: How to output the AST built using ANTLR?
And I couldn't find a decent SO-Q&A that explains how to go about creating a tree grammar, so here's a link to my personal blog that explains this: http://bkiers.blogspot.com/2011/03/6-creating-tree-grammar.html
Good luck.
Is there any C grammar available which generates the AST, which includes all the parser rules using "^" and "!" notations?
I went through the book written by Terence Parr, to write such a grammar, but it seems that writing one such grammar for C lang is a time consuming process, so was wondering if its available already which can me save a lot of time!
(A grammar for a smaller subset of C language is also fine..)
Thanks :)
See this. It's straight from the ANTLR 4 source repo: a C11 grammar. It looks pretty compliant.
Of course, it doesn't come with a preprocessor, but handing cpp or mcpp the file first is easy enough.
It also doesn't come with AST rules, but it doesn't look too hard to do (albeit time consuming).
No answers after two weeks.
You are right, building a full parser that builds complete ASTs and handles all the details of C (including preprocessor)
covering a variety of dialects of C (e.g., ANSI, GNU C 2/3/4/, Miscrosoft Visual C, Green Hills C)... is actually a lot of work. And unless you invest this work, it won't process any real C programs.
I would expect there to be a full ANTLR grammar for C that did this considering how old ANTLR is. It is surprising that nobody here can seem to identify one; certainly you'd expect to find it at the ANTLR site.
We've put the energy required into building such C parsers (covering all the above dialects), and added computing symbol tables, extracting control and data flows, building call graphs, enabling analyzers, and tree transformations in the DMS Software Reengineering Toolkit with its C front end. This front end has been applied to C applications comprised of 18,000 compilation units to build custom analysis tools.
I am designing a compiler in C. I want to know which technique I should use, top-down or bottom up? I have only implemented operator precedence using bottom up. I have applied the following
rules:
E:=E+E
E:=E-E
E:=E/E
E:=E*E
E:=E^E
I want to know that am I going the right away?
If I want to include if-else, loops, arrays, functions, do I need to implement parsing?
If yes, how do I implement it? Any one can
I have only implemented token collection and operator precedence. What is the next steps?
Lex & Yacc is your answer. Or Flex and Bison which are branched version of original tools.
They are free, they are the real standard for writing lexers and parsers in C and used all around everywhere.
In addition O'Reilly has released a small pearl of 300 pages: Flex & Bison. I bought it and it really explains you how to write a good parser for a programming language and handle all the subtle things (error recovery, conflicts, scopes and so on). It will answer also your questions about how you are parsing expressions: your approach is right with a top-down parser but you'll discover that is not enough to handle operator precedences.
Of course, for hobby, you could write your own lexer and parser but it would be just an academic effort that is nice to understand how FSM and parser work but with no so much fun :)
If you are, instead, interested in programming language design or complex implementations I suggest this book: Programming Language Pragmatics that is not so famous because of the Dragon Book but it really explains why and how various characteristics can and should be implemented in a compiler. The Dragon Book is a bible too, and it will cover at a real low level how to write a parser.. but it would be sort of boring, I warn you..
The best way to implement a good parser in C is using flex & yacc
Your question is quite vague and hard to answer without a more specific, detailed question. The "Dragon book" is an excellent reference though for someone seeking to implement a compiler from scratch, or as others have pointed out Lex and Yacc.
If you intend to implement the parser by hand, you will want to do a recursive descent parser. The code directly reflects the grammar, so it's fairly easy to figure out and understand. It places some restrictions on your grammar (you can't have any left-recursive nonterminals), but you can work around those problems.
However, it depends on the complexity of the grammar; hand-hacking a parser for anything much more complicated than basic arithmetic expressions gets very tedious very quickly. If you're trying to implement anything that looks like a real programming language, use a parser generator like yacc or bison.