What type of parser does Bison generate? Is it LALR(1) or LR(1)?
Short answer: both.
By default, it produces LALR(1) parsers.
With the explicit option %glr-parser, it'll produce a GLR (generalized LR) parser, which is not the same thing as a canonical LR(1) parser.
Yep, since version 2.5, Bison does support several types of LR parsers: LALR(1), canonical LR(1), and IELR(1). See the documentation about "lr.type", for instance here.
How to remove all C and C++ comments in vi?
//
/*
*/
You can't. Parsing C and C++ comments is not something that regular expressions can do. It might work for simple cases, but you never know if the result leaves you with a corrupted source file. E.g. what happens to this:
printf ("//\n");
The proper way is to use an external tool able to parse C. For example some compilers may have an option to strip comments.
Writing a comment stripper is also a basic exercise in lex and yacc programming.
See also this question: Remove comments from C/C++ code
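To illustrate why the printf ("//\n"); case defeats a plain substitution, here is a minimal sketch of a hand-written comment stripper in C. The function name strip_comments and the choice to replace each block comment with a single space are assumptions made for this example. It tracks string and character literals so that comment markers inside them are left alone, but it does not handle line continuations, trigraphs, or C++ raw string literals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src to dst with C and C++ comments removed.
   dst must have room for strlen(src) + 1 bytes; the output is never
   longer than the input. */
void strip_comments(const char *src, char *dst)
{
    enum { CODE, STRING, CHARLIT, LINE_COMMENT, BLOCK_COMMENT } state = CODE;

    assert(src != NULL && dst != NULL);
    while (*src) {
        char c = *src;
        switch (state) {
        case CODE:
            if (c == '/' && src[1] == '/') { state = LINE_COMMENT; src += 2; continue; }
            if (c == '/' && src[1] == '*') { state = BLOCK_COMMENT; src += 2; continue; }
            if (c == '"') state = STRING;
            else if (c == '\'') state = CHARLIT;
            *dst++ = c;
            break;
        case STRING:
        case CHARLIT:
            *dst++ = c;
            if (c == '\\' && src[1] != '\0') {
                *dst++ = *++src;            /* copy the escaped character as-is */
            } else if ((state == STRING && c == '"') ||
                       (state == CHARLIT && c == '\'')) {
                state = CODE;               /* closing quote ends the literal */
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') { state = CODE; *dst++ = c; }  /* keep the newline */
            break;
        case BLOCK_COMMENT:
            if (c == '*' && src[1] == '/') {
                state = CODE;
                *dst++ = ' ';               /* a comment counts as one space */
                src += 2;
                continue;
            }
            break;
        }
        src++;
    }
    *dst = '\0';
}
```

Called on the line printf ("//\n"); // hi, this keeps the string literal intact and drops only the trailing comment.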
With regular expressions, because of the complexity of C/C++ syntax, you will at best achieve a solution that is 90% correct. Better let the right tool (a compiler) do the job.
Fortunately, Vim integrates nicely with external tools. Based on this answer, you can do:
:%! gcc -fpreprocessed -dD -E "%" 2>/dev/null
Caveats:
requires gcc
it also slightly modifies the formatting (shrunk indent etc.)
Esc:%s/\/\///
Esc:%s/\/\*//
Esc:%s/\*\///
You can use a lexical analyzer like Flex, applied directly to the source code. In its manual you can find "How can I match C-style comments?".
If you need an in-depth tutorial, you can find it here; under the "Lexical Analysis" section you can find a PDF that introduces you to the tool and an archive with some practical examples, including "c99-comment-eater".
I have been trying to research source code parsers, and I often find people talking about parsing grammar.
So I was wondering: what's the difference between a source code parser and a grammar parser? Are they the same thing?
The phrase "source code parser" by itself is clear enough: this is a mechanism that parses source text, using either a parser-generator engine based off of a formal grammar or some kind of hand-coded (typically recursive descent) parser derived from the grammar informally. It is unclear what the result of a "source code parser" is from just the phrase; it might just be "yes, that's valid syntax", more usually "produces a parse or abstract syntax tree", or it might be (sloppily) "full abstract syntax tree plus symbol table plus control and dataflow analyses".
The phrase "grammar parser" is not one I encounter much (and I work a lot in this field). It is likely something garbled from some other source. In the absence of a widely known definition, one would guess that this means a) a "source code parser" driven by a parser generator engine from a formal grammar, or b) a "source code parser" which parses a grammar (which is a kind of source code, too), as an analog to the phrase "Fortran parser". For the latter I would tend to write "parser for a grammar" to avoid confusion, although "Fortran parser" is pretty clear.
You used a third term, "parsing grammar" which I also don't encounter much. This is likely to mean b) in the above paragraph.
Where did your terms come from?
Bison is a general purpose parser generator that converts a grammar description for an LALR(1) context-free grammar into a C program to parse that grammar.
This kind of talk is not correct. There are 3 mistakes in this. It should read:
Bison is a general purpose parser generator that
reads a BNF grammar, which defines the syntax of a context-free language,
does an LALR(1) analysis and conflict resolution, and
outputs a C program that reads input written in the language whose syntax
is defined in the BNF grammar.
My intent is not to criticize, but to get people to use the correct terminology.
There is already enough misunderstanding in this subject.
Does C support an equivalent of the triple-slash XML Documentation Comments that Visual Studio uses to provide helpful tooltips for my code in C#, or am I just spoiled by being able to use them in C#?
If it's not supported in C, are there other options?
C does not have any equivalent of XML documentation comments or JavaDoc.
Try doxygen.
In the C language itself, a triple-slash comment is nothing special (they're just double-slash comments that happen to start with a slash).
However, you can use triple-slash comments with Doxygen.
Most modern C compilers will understand double-slash comments like in C++. They are part of the C99 spec.
Presumably you're talking about creating comment blocks that are formatted for automated extraction.
Doxygen supports special comment blocks that start with a double-slash C++ comment delimiter, followed by either another slash, or an exclamation mark.
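As a rough illustration of those comment blocks, here is a small sketch. The functions clamp_int and is_digit_char are hypothetical and exist only to carry the comments; \brief, \param, and \return are standard Doxygen commands. Note that // comments require a C99 (or later) compiler.

```c
#include <assert.h>

/// \brief Clamp a value to the inclusive range [lo, hi].
///
/// \param value  The number to clamp.
/// \param lo     Lower bound of the range.
/// \param hi     Upper bound of the range; assumed to be >= lo.
/// \return       value limited to the range [lo, hi].
int clamp_int(int value, int lo, int hi)
{
    assert(lo <= hi);
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

//! \brief Report whether c is an ASCII decimal digit.
//! \param c  The character to test.
//! \return   1 if c is in '0'..'9', otherwise 0.
int is_digit_char(char c)
{
    return c >= '0' && c <= '9';
}
```

To the C compiler these are ordinary comments; only Doxygen gives the extra slash or exclamation mark any meaning.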
C supports /* */ comments. C99 adds support for // comments. Your IDE or compiler may support more, but that is non-standard.
I'm looking for a good open source C/C++ regular expression library that has full Unicode support.
I'm using this in an environment where the library might get ASCII, UTF-8, or UTF-16. If it gets UTF-16 it might or might not have the leading byte-order mark (FF FE) or (FE FF).
I've looked around and there don't seem to be any options other than PCRE.
My second problem is that I'm currently using flex to build some HUGE regular expressions. Ideally I would have a flex-like lexical expression generator that also handles Unicode.
Any suggestions?
Have you considered ICU?
It has mature regular expression support.
I believe Boost Spirit and Boost Regex both have at least some degree of Unicode support.
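Whichever library is chosen, the input encoding still has to be determined first. The byte pairs FF FE and FE FF mentioned in the question are the UTF-16 byte-order marks (little-endian and big-endian respectively). A minimal sketch of checking for them in C follows; the enum and the function name detect_bom are assumptions made for this example, and when no mark is present the caller has to fall back on some other heuristic or a configured default.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
enum text_encoding { ENC_UNKNOWN, ENC_UTF8_BOM, ENC_UTF16_LE, ENC_UTF16_BE };

/* Inspect the first bytes of a buffer for a byte-order mark.
   Returns ENC_UNKNOWN when none is found (plain ASCII, UTF-8 without
   a mark, or UTF-16 without a mark all end up here). */
enum text_encoding detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8_BOM;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16_LE;    /* note: FF FE 00 00 would be UTF-32 LE */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16_BE;
    return ENC_UNKNOWN;
}
```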
I'm looking for a parser for C. Here is what I need:
Written in C (not C++).
Handwritten (not generated).
BSD or similarly permissive license.
Capable of nontrivially parsing itself (can be a subset of C).
It can be part of a project as long as it's decoupled so that I can pull out the parser.
Is there an existing parser that fulfills these requirements?
If you don't need C99, then lcc is a slam dunk:
It is documented in a very clear, well-written book.
Techniques used for recursive-descent parsing of operators with precedence are well documented in an article and technical report by Dave Hanson.
Clear, handwritten ANSI C code.
One potential downside is that the lcc parser does not build an abstract-syntax tree: it goes straight from parsing to intermediate code.
If you must have C99 then I think tinycc (tcc) is your best bet.
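For a feel of what recursive-descent parsing of operators with precedence looks like, here is a minimal sketch using precedence climbing on integer arithmetic. This is not lcc's code and is not taken from Hanson's article; all names (evaluate, parse_expr, parse_primary, precedence) are made up for this example, error handling is omitted, and it evaluates the expression directly instead of building a tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *cursor;              /* current position in the input */

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cursor)) cursor++;
}

/* Binding strength of a binary operator; 0 means "not an operator". */
static int precedence(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

static long parse_expr(int min_prec);

/* primary := NUMBER | '(' expr ')' */
static long parse_primary(void)
{
    long value = 0;
    skip_spaces();
    if (*cursor == '(') {
        cursor++;
        value = parse_expr(1);
        skip_spaces();
        if (*cursor == ')') cursor++;   /* a missing ')' is silently ignored */
        return value;
    }
    while (isdigit((unsigned char)*cursor))
        value = value * 10 + (*cursor++ - '0');
    return value;
}

/* Parse operators whose precedence is at least min_prec. */
static long parse_expr(int min_prec)
{
    long lhs = parse_primary();
    for (;;) {
        skip_spaces();
        char op = *cursor;
        int prec = precedence(op);
        if (prec < min_prec) break;     /* also stops on non-operators */
        cursor++;
        long rhs = parse_expr(prec + 1);    /* prec + 1: left-associative */
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs = rhs ? lhs / rhs : 0; break;  /* sketch: x/0 yields 0 */
        }
    }
    return lhs;
}

long evaluate(const char *text)
{
    assert(text != NULL);
    cursor = text;
    return parse_expr(1);
}
```

A single function handles every precedence level, so adding an operator only means adding a case to precedence and to the switch, rather than writing one function per level as in a textbook recursive-descent grammar.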
How about Sparse?
You could try TCC. It's licensed under the Lesser GPL.
It seems that nwcc sufficiently agrees with your requirements.
A good C compiler is available at this location. It is simple and accessible.
https://github.com/rui314/8cc
GCC has one in gcc/c-parser.c.
Check elsa, it uses the Generalized LR algorithm.
Its main use is for C++, but it also parses C code.
Check on its page, on the section called "How much C can Elsa parse?" which says it can parse most C programs, including the Linux kernel.
It's released under a BSD license.
Here is a recursive descent parser I ported to C:
http://www.gabotronics.com/resources/recursive-descent-parser.htm