C/C++/C#/VB based Lexical Analyzers - lexical-analysis

In my Compiler Design lab, I'm using JLex as the lexical analyzer generator; it produces a Java program from a lexical specification.
I'd like to know whether there are other tools that do the same job but generate C, C++, C# or VB programs instead of Java, and that run on Windows.

C#
GPLEX is a generator for lexical scanners that accepts a “LEX-like” input specification and produces a C# output file (C# 2 with generics).
Grammatica is a C# and Java parser generator.
ANTLR is a parser generator that supports generating code in C, Java, Python, C#, and Objective-C.
C# Lex
C# Flex
Java
JLex
JFlex
ANTLR
Grammatica
Ragel is a finite state machine compiler with output support for C, C++, Objective-C, D, Java and Ruby source code
C/C++
Lex
Flex
Flex++
ANTLR
Quex
Ragel

Lex (and its variants like Flex) should be a starting point. You can download the Windows ports of these to get started. The output is in C. Additionally, look for yacc and bison if you want parser generators too. Here is a comprehensive page for all four.

Ragel can generate fast lexical analyzers from a regular language in C, C++, Objective-C, D, Java, and Ruby.
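
To give a feel for the kind of C code these generators emit, here is a minimal hand-written sketch of a scanner. It is not the output of Lex, Flex or Ragel (generated scanners are normally table-driven or goto-driven state machines); the token kinds and the `next_token` name are assumptions made up for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token kinds for this illustration only; a real specification defines its own. */
typedef enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* first character of the lexeme */
    size_t length;     /* number of characters in the lexeme */
} Token;

/* Scan one token starting at *cursor and advance *cursor past it. */
Token next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;

    /* Skip whitespace between tokens. */
    while (isspace((unsigned char)*p))
        p++;

    Token tok = { TOK_EOF, p, 0 };
    if (*p == '\0') {
        /* End of input: leave the zero-length EOF token as is. */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* Identifier: [A-Za-z_][A-Za-z0-9_]* */
        tok.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
    } else if (isdigit((unsigned char)*p)) {
        /* Number: [0-9]+ */
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        /* Anything else is treated as a one-character punctuation token. */
        tok.kind = TOK_PUNCT;
        p++;
    }

    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

Calling `next_token` in a loop until it returns `TOK_EOF` walks through the whole input. With a generator, the `if`/`while` chains above are replaced by regular expressions in the specification file, and the tool derives the state machine for you.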

Related

Is there an external parser generator tool used for building Alloy language parser

Did the Alloy developers use a parser generator (such as ANTLR) to parse Alloy specifications, or is the parser hand-written specifically for the Alloy language?
If they used an external tool to implement the Alloy parser, where can I find further information about it (for example, the grammar that is fed into the parser generator)?
Alloy uses a modified version of CUP (which is shipped with the Alloy distribution). You can find the grammar specification files (Alloy.lex and Alloy.cup) inside the edu.mit.csail.sdg.alloy4compiler.parser package. In the same package there are some bash scripts used to generate corresponding lexer/parser classes.
http://alloy.mit.edu/alloy/documentation/book-chapters/alloy-language-reference.pdf
Section B.3 has the grammar.
Can't say anything about the language implementation.

Compile ANTLR generated code with C runtime on IBM mainframe (z/OS) using Metal C

I'm looking to run a parser generated by ANTLR 3.5.2 on an IBM mainframe under z/OS. I don't need to run ANTLR itself on the mainframe, but I do need to run the C code that ANTLR generates there, along with ANTLR's C runtime. My question is really about how compatible GNU C is with Metal C for ANTLR's C runtime.
Either a positive or negative experience porting the C runtime to Metal C would help me assess the difficulty of this task.
Thanks very much.

Which Eiffel compilers use Earley parsing

I stumbled upon this post http://compilers.iecc.com/comparch/article/02-04-096
that says there are two Eiffel compilers using Earley parsing. The post is quite old.
I wonder if anyone here knows which Eiffel compilers use Earley parsers and if they are
still in use? Links are highly appreciated.
The modern Eiffel compilers used in production (EiffelStudio from Eiffel Software and gec from the Gobo Eiffel Project, both open source) parse Eiffel code with parsers generated by geyacc from parser description files (here are the links for EiffelStudio and Gobo). geyacc is a parser generator similar to GNU bison: it takes a description of an LALR(1) context-free grammar, but is adapted to produce Eiffel code that is type-safe and void-safe. Neither compiler uses an Earley parser.

Libraries that parse code written in C and provide an API

I am implementing a proof of concept application for source-to-source transformation and need a C-parser with an API for manipulating/traversing the C-syntax tree (AST).
I have tried to use clang but I ran into various problems, like not being able to compile the tutorials using libclang, wrong architecture etc. Since this is a proof of concept application, I will defer clang to a different date.
Question
What software or libraries (implemented in any language) can parse C code and provide an API so that I can build applications on top of them? I looked around, but I could not locate any free parsers.
The platforms I can use are anything on Windows or Mac or Linux, and any parsers written in C/C++/Java/Perl/Python/PHP will work.
You could try one of the available grammars for ANTLR. ANTLR has support for creating tree walkers and you can walk/manipulate the AST manually if necessary. ANTLR V3 has several grammars available including a C preprocessor, ANSI C and GNU C.
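
As a rough idea of what walking an AST manually looks like, here is a minimal sketch in C. It does not use ANTLR's runtime API; the `Node` type, the node kinds, and the `walk` and `count_idents` functions are all hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds for a tiny expression tree. */
typedef enum { NODE_BINARY, NODE_IDENT, NODE_INT } NodeKind;

typedef struct Node {
    NodeKind kind;
    struct Node *left;  /* NULL for leaf nodes */
    struct Node *right; /* NULL for leaf nodes */
} Node;

/* Callback invoked once for every node that is visited. */
typedef void (*Visitor)(const Node *node, void *state);

/* Pre-order traversal: visit the node, then its left and right subtrees. */
void walk(const Node *node, Visitor visit, void *state)
{
    assert(visit != NULL);
    if (node == NULL)
        return;
    visit(node, state);
    walk(node->left, visit, state);
    walk(node->right, visit, state);
}

/* Example visitor: counts identifier nodes; state points to an int counter. */
void count_idents(const Node *node, void *state)
{
    if (node->kind == NODE_IDENT)
        ++*(int *)state;
}
```

A source-to-source transformation would use the same traversal but let the visitor rewrite or replace nodes instead of only counting them.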

What are resources for building a static analyzer for C in C?

I have a school project to develop a static analyzer in C for C.
Where should I start? What are some resources which could assist me?
I am assuming I will need to parse C, so what are some good parsers for C or tools for building C parsers?
I would first head over to ANTLR and look at its getting-started guide; it has a wealth of information about parsing. I personally use ANTLR because it gives a choice of code generation targets.
To use ANTLR you need a C or C++ grammar file; pick one of these up and start playing.
Anyway, have fun with it.
Probably your best starting point would be Clang (with the proviso that it already has a static analyzer, so unless you want to write one for its own sake, you might be better off using/enhancing the existing one).
Are you sure that you want to write the analyzer in C?
If you were using a modern language (e.g. C#, Java, Python), then I would second spgennard's suggestion of ANTLR for the parser.
If writing the analyzer in C is a requirement then you are stuck with lex and yacc (flex and bison) or maybe a hand-crafted parser.
Looks like Uno comes close to what you want to do. It uses lex/yacc and includes the grammar files. The analysis part however is written in C++.
Maybe you can get some more ideas about the how and what from tools listed at SpinRoot. Wikipedia also has some good info.
Parsing is the easiest and least important part of a static analyser. ANTLR was already suggested; it should be sufficient for parsing plain C (but not C++). One small tip: do not implement your own preprocessor; reuse the output of gcc -E instead.
As for the rest, you can take a look at the sources of some existing analysers, namely Clang and CIL, and read about SSA representation and abstract interpretation. Choosing the right intermediate representation for your code is key.
I doubt it can be an easy task in plain C, so you'd probably end up implementing some sort of DSL on top of it to handle ASTs and transforms. Sounds like something much bigger than a typical school project.
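
If you do reuse the output of gcc -E, note that it contains linemarker lines of the form `# 12 "foo.c" 2`, which an analyser needs in order to report positions in the original files. A minimal sketch of recognising them in C follows; the `parse_linemarker` name and the fixed 256-byte buffer are assumptions for illustration, and the trailing flags are ignored.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Try to parse a linemarker such as:  # 12 "foo.c" 2
 * On success, store the line number and file name and return 1.
 * Return 0 for any other line (ordinary code, #pragma, and so on).
 * The file buffer must hold at least 256 bytes; longer names are truncated.
 */
int parse_linemarker(const char *line, int *line_no, char file[256])
{
    assert(line != NULL && line_no != NULL && file != NULL);
    /* "%255[^\"]" reads up to 255 characters that are not a double quote. */
    return sscanf(line, "# %d \"%255[^\"]\"", line_no, file) == 2;
}
```

Each time this returns 1, the analyser can reset its notion of the current file and line, so that a diagnostic points at the original header or source file rather than at the preprocessed stream.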

Resources