SGML Parser in Plain C

I'm looking for an open-source SGML parser written in plain C. This is to parse bona-fide SGML, not malformed stuff.
Any ideas?

There's OpenSP, which is part of the OpenJade project, but is implemented in C++. Might be close enough for your needs?

This came up on a quick Google search (sgml c parser): http://www.w3.org/Library/src/SGML.html. Does that help?
Or perhaps this one: http://www.math.utah.edu/pub/sgml/sgmls/

Related

Ocaml - Files and parsing

How do I read the contents of a file in OCaml? Specifically, how do I parse them?
Example:
Suppose file contains (a,b,c);(b,c,d)| (a,b,c,d);(b,c,d,e)|
then after reading this, I want two lists containing l1 = [(a,b,c);(b,c,d)] and l2 = [(a,b,c,d);(b,c,d,e)]
Is there any good tutorial for parsing?
This is a good use case for the menhir parser generator (successor to ocamlyacc). You might want to use ocamllex for lexing. All have good documentation.
You could also use camlp4 or camlp5 stream parsing abilities.
Also read the Wikipedia pages on lexing and parsing.
I'd be inclined to use Aurochs, a PEG parser for something like this. There is example code in the repo there.
If you want to specify a grammar and have ocaml generate lexers and parsers for you, check out these ocamllex and ocamlyacc tutorials. I recommend doing it this way. If you really only have one type of token in your file format, then ocamlyacc might be overkill if you can just use the lexer to split the file up into tokens that are considered valid by the grammar.

Json string parser using C

I was referring to a site called "joys of programming" for a JSON parser in C. The site seems to be down and I am not able to get information regarding a JSON parser. It would be great if someone could guide me. I want to know how to create a JSON array. Thanks in advance.
If you want to make your own JSON parser, you have to look at the language grammar, which is probably LL. Writing such an LL parser is almost trivial and kind of fun; use a regex library for the tokens to save some precious time.
If you're looking for a library to deal with Json data, here is the second result Google gave me.
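To give an idea of how small a hand-written LL parser can be, here is a minimal sketch of a recursive-descent parser in plain C. It is an assumption-laden toy, not a full JSON parser: it only accepts integers and (possibly nested) arrays of them, and the names `Parser`, `parse_value`, `parse_array` and `parse_int_array` are made up for this example. It sums the integers it sees so there is something to check.

```c
#include <ctype.h>
#include <stdlib.h>

/* Grammar handled by this sketch (a small subset of JSON):
 *   value := integer | array
 *   array := '[' ']' | '[' value (',' value)* ']'
 */
typedef struct {
    const char *p;  /* current position in the input */
    long sum;       /* running total of every integer seen */
    int count;      /* how many integers were seen */
} Parser;

static void skip_ws(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

static int parse_value(Parser *ps);

/* array := '[' ']' | '[' value (',' value)* ']' */
static int parse_array(Parser *ps) {
    ps->p++;                                   /* consume '[' */
    skip_ws(ps);
    if (*ps->p == ']') { ps->p++; return 1; }  /* empty array */
    for (;;) {
        if (!parse_value(ps)) return 0;
        skip_ws(ps);
        if (*ps->p == ',') { ps->p++; continue; }
        if (*ps->p == ']') { ps->p++; return 1; }
        return 0;                              /* expected ',' or ']' */
    }
}

/* value := integer | array; one character of lookahead picks the rule */
static int parse_value(Parser *ps) {
    skip_ws(ps);
    if (*ps->p == '[') return parse_array(ps);
    if (*ps->p == '-' || isdigit((unsigned char)*ps->p)) {
        char *end;
        long v = strtol(ps->p, &end, 10);
        if (end == ps->p) return 0;            /* a lone '-' is not a number */
        ps->sum += v;
        ps->count++;
        ps->p = end;
        return 1;
    }
    return 0;                                  /* anything else is rejected */
}

/* Returns 1 and fills *sum and *count if text is a well-formed value in
 * the subset above, followed only by whitespace; returns 0 otherwise. */
int parse_int_array(const char *text, long *sum, int *count) {
    Parser ps = { text, 0, 0 };
    if (!parse_value(&ps)) return 0;
    skip_ws(&ps);
    if (*ps.p != '\0') return 0;
    *sum = ps.sum;
    *count = ps.count;
    return 1;
}
```

Calling `parse_int_array("[1, 2, [3, 4], -5]", &sum, &count)` succeeds, while inputs such as `[1,]` or `[1 2]` are rejected. Strings, objects, fractions and the literals `true`, `false` and `null` would each need another branch in `parse_value`.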
I found several libraries that can do this work:
Jsoncpp, JsonValue, cppCMS, JsonSpirit and Jansson. JsonValue is the easiest one; it consists of just one .h file and one .cpp file.

Implementing a C preprocessor

Much has been written over the years on implementing parsers, but the C preprocessor is not quite the same as any of the stages of a typical parser, and implementation thereof doubtless has its share of particular pitfalls to watch out for. Does anyone know of anything written on the topic of implementing a C preprocessor?
Hartmut Kaiser, the author of Boost Wave, wrote a nice article on CodeProject http://www.codeproject.com/KB/recipes/wave_preprocessor.aspx about the Boost Wave project. You can use Boost Wave to make your own C preprocessor with custom extensions.
I found a useful discussion in the document mcpp-summary at http://mcpp.sourceforge.net/
I've based mine on the GNU internals.

What is the simplest parsing algorithm that can parse C code?

Does anyone know what the weakest family of widely-used parsing algorithms is that can parse C code? That is, is the C grammar LL(1), LR(0), LALR(1), etc.? I'm curious because as a side project I'm interested in writing a parser generator for one of these families and would like to ultimately be able to parse C code for another side project.
It seems that Bison generates LALR(1) parsers. LALR parsers are generally more powerful than LL parsers, but are also more complex. From this I suspect that LALR(1) is probably the weakest parsing algorithm that can parse C code.
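Unless you're really set on rolling your own recognizer, ANTLR would probably be your best bet to do this. ANTLR uses an LL(*) algorithm, which is a top-down strategy with arbitrary lookahead; it is not the same as LALR, but in practice it covers a similar range of grammars.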
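One wrinkle to keep in mind whichever family you pick: in C the same token sequence can be a declaration or an expression, depending on whether a name currently denotes a typedef. A grammar-driven parser therefore usually needs the lexer to consult the symbol table. The sketch below only illustrates the ambiguity; the function names are made up for this example.

```c
typedef int T;

/* Here T names a type, so "T * x;" declares x as a pointer to int. */
int as_declaration(void) {
    int value = 42;
    T * x;
    x = &value;
    return *x;
}

/* Here a local variable hides the typedef, so the very same tokens
 * "T * x;" form a multiplication whose result is discarded
 * (a compiler may warn about the unused value). */
int as_expression(void) {
    int T = 6, x = 7;
    T * x;
    return T * x;
}
```

Both functions compile, yet the line `T * x;` needs a different parse tree in each. No amount of lookahead settles that; only knowing what `T` was declared as does.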

Recursive Descent Parser for C

I'm looking for a parser for C. Here is what I need:
Written in C (not C++).
Handwritten (not generated).
BSD or similarly permissive license.
Capable of nontrivially parsing itself (can be a subset of C).
It can be part of a project as long as it's decoupled so that I can pull out the parser.
Is there an existing parser that fulfills these requirements?
If you don't need C99, then lcc is a slam dunk:
It is documented in a very clear, well-written book.
Techniques used for recursive-descent parsing of operators with precedence are well documented in an article and technical report by Dave Hanson.
Clear, handwritten ANSI C code.
One potential downside is that the lcc parser does not build an abstract-syntax tree; it goes straight from parsing to intermediate code.
If you must have C99 then I think tinycc (tcc) is your best bet.
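For a feel of what recursive-descent parsing of operators with precedence looks like, here is a minimal sketch in C. It is not taken from lcc or from Hanson's article; it is a generic precedence-climbing evaluator, and the names `precedence`, `parse_primary`, `parse_expr` and `eval` are assumptions made for this example. It handles `+ - * /`, parentheses and integers, and does almost no error checking.

```c
#include <ctype.h>
#include <stdlib.h>

static const char *cur;  /* cursor into the expression being parsed */

static void skip_spaces(void) {
    while (isspace((unsigned char)*cur)) cur++;
}

/* Binding strength of a binary operator; 0 means "not an operator". */
static int precedence(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

static long parse_expr(int min_prec);

/* primary := integer | '(' expr ')' */
static long parse_primary(void) {
    skip_spaces();
    if (*cur == '(') {
        cur++;                          /* consume '(' */
        long inner = parse_expr(1);
        skip_spaces();
        if (*cur == ')') cur++;         /* consume ')' if present */
        return inner;
    }
    char *end;
    long v = strtol(cur, &end, 10);     /* yields 0 if no digits are found */
    cur = end;
    return v;
}

/* One function covers every precedence level: it keeps consuming
 * operators that bind at least as tightly as min_prec. */
static long parse_expr(int min_prec) {
    long lhs = parse_primary();
    for (;;) {
        skip_spaces();
        char op = *cur;
        int prec = precedence(op);
        if (prec == 0 || prec < min_prec) return lhs;
        cur++;                          /* consume the operator */
        /* prec + 1 makes operators of equal precedence left-associative */
        long rhs = parse_expr(prec + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs = rhs ? lhs / rhs : 0; break;  /* sketch: x/0 gives 0 */
        }
    }
}

/* Evaluates an arithmetic expression held in a NUL-terminated string. */
long eval(const char *text) {
    cur = text;
    return parse_expr(1);
}
```

With this, `eval("1 + 2 * 3")` gives 7 and `eval("10 - 4 - 3")` gives 3. Adding an operator means adding one row to `precedence` and one case to the switch, rather than writing one function per precedence level.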
How about Sparse?
You could try TCC. It's licensed under the Lesser GPL.
It seems that nwcc sufficiently agrees with your requirements.
A good C compiler, simple and accessible, is available at this location:
https://github.com/rui314/8cc
GCC has one in gcc/c-parser.c.
Check elsa, it uses the Generalized LR algorithm.
Its main use is for C++, but it also parses C code.
Check on its page, on the section called "How much C can Elsa parse?" which says it can parse most C programs, including the Linux kernel.
It's released under a BSD license.
Here is a recursive descent parser I ported to C:
http://www.gabotronics.com/resources/recursive-descent-parser.htm
