Parsing XML in Pure C - c

What is the preferred library for parsing XML data in Pure C?

The canonical XML parsing library for C is libxml2.

Two popular choices are expat and libxml2.

Here is a list of libraries for multiple languages, including C:
http://www.xml.com/pub/rg/XML_Parsers

Not 'the preferred library', but there's also http://www.minixml.org/.
Mini-XML is a small XML library that you can use to read and write XML and XML-like data files in your application without requiring large non-standard libraries. Mini-XML only requires an ANSI C compatible compiler (GCC works, as do most vendors' ANSI C compilers) and a 'make' program. Mini-XML supports reading of UTF-8 and UTF-16 and writing of UTF-8 encoded XML files and strings. Data is stored in a linked-list tree structure, preserving the XML data hierarchy, and arbitrary element names, attributes, and attribute values are supported with no preset limits, just available memory.
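
To picture what a "linked-list tree structure" means in practice, here is a rough sketch in standard C. The type struct xnode and the functions xnode_new, xnode_find and xnode_free are hypothetical names made up for illustration; they are not Mini-XML's own types or API, and attributes and text content are left out to keep it short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type, not Mini-XML's own: each element links to its
 * parent, its first child, and its next sibling. */
struct xnode {
    const char   *name;     /* element name (pointer is stored, not copied) */
    struct xnode *parent;
    struct xnode *child;    /* first child, or NULL */
    struct xnode *next;     /* next sibling, or NULL */
};

/* Allocate a node and append it to parent's child list (parent may be
 * NULL for the root). Returns NULL if allocation fails. */
static struct xnode *xnode_new(struct xnode *parent, const char *name)
{
    struct xnode *n = malloc(sizeof *n);

    assert(name != NULL);
    if (n == NULL)
        return NULL;
    n->name = name;
    n->parent = parent;
    n->child = NULL;
    n->next = NULL;
    if (parent != NULL) {
        struct xnode **slot = &parent->child;
        while (*slot != NULL)           /* walk to the end of the list */
            slot = &(*slot)->next;
        *slot = n;
    }
    return n;
}

/* Return the first direct child of parent called name, or NULL. */
static struct xnode *xnode_find(const struct xnode *parent, const char *name)
{
    struct xnode *c;

    assert(parent != NULL && name != NULL);
    for (c = parent->child; c != NULL; c = c->next)
        if (strcmp(c->name, name) == 0)
            return c;
    return NULL;
}

/* Free a node and everything below it. */
static void xnode_free(struct xnode *n)
{
    while (n != NULL && n->child != NULL) {
        struct xnode *c = n->child;
        n->child = c->next;             /* unlink the child, then free it */
        xnode_free(c);
    }
    free(n);
}
```

Because every node keeps a parent pointer and a sibling list, the nesting of the XML document is preserved and the only size limit is available memory, which is the property the description above refers to.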

VTD-XML is the one you should look into if you want a combination of ease of use, performance, and efficiency.

You can consider miniML-Parser, a simple and tiny XML parser library in C. It was developed specifically with embedded applications in mind.
It is extremely easy to use: you need to call only one API to parse your XML data.
It has a very small footprint: the parser uses only 1.8 kB of code memory, so you can use it in very small embedded applications.
It is a validating XML parser.
It also extracts the content of XML data and converts it to its specified data type.
It comes with a tool that generates the source code from an XML schema file, so you do not have to write the XML tree structure in C by hand.
Disclosure: I'm the author of miniML-Parser.

Related

Converting protocol buffers to/from JSON in C, without generating C code

I need to use .desc files to enable the reading of serialized protocol-buffer messages and their conversion to JSON (using jansson).
This is because the protocol-buffer message formats will change much more frequently than the C code. The .desc files will be a runtime input to the executable.
I've found https://github.com/Sannis/protobuf2json-c but my reading of this is that it needs C code to be generated. In particular the ProtobufCMessage needs to exist for the message being decoded, and I cannot see a way of making a ProtobufCMessage (from /usr/include/google/protobuf-c/protobuf-c.h) without generating C code.
Have I missed something here, or will I need to write new code?
I'm not familiar with the .desc extension but I'm guessing from the context that it is a file containing a protobuf FileDescriptorProto, defined in google/protobuf/descriptor.proto.
To do what you want, you will most likely need to use the Protobuf C++ or Java library, each of which defines a class DynamicMessage which has the ability to emulate arbitrary message types based on descriptors. You can then combine this with any Protobuf-JSON library that is based on the standard Protobuf reflection interfaces. (You can also write your own JSON converter pretty easily; use the TextFormat class (found in both the C++ and Java Protobuf libs) as a template.)
My understanding is that protobuf-c does not currently contain an equivalent to DynamicMessage.

We have C structures in header files and we want to have an XML schema generated from the header files

I have a twenty-year-old legacy application and want to connect it to a web front end. I need to pass a rather large, deeply nested data structure that is defined in C structs. We are currently planning to do that in XML. The total number of struct definitions is around 150. These all nest into one huge data structure. I would like to find a program that would scan the header files and generate an XML Schema that I could then tailor to my needs. Does anyone know of such a tool?
SWIG (swig.org) has an XML target (-xml) that may do what you want.
There is a tool called GCC-XML, which transforms the internal representation of a program compiled by GCC into XML, but it is no longer maintained.
Another possibility is to use the plugin support in GCC 4.6, that is, to write a plugin (in C) for GCC that processes the Tree (the internal AST) of the structure declarations. You can also use GCC MELT, a higher-level domain-specific language for extending GCC. In either case, you'll need to understand GCC's Tree (and Gimple) internal representations, and it might not be worth it if you have just 150 structures. However, if your legacy application is large enough, learning these (and using MELT) might be worthwhile, because the skills of extending GCC can be reused for other tasks on that legacy application.
Finally, you might also look into the tools related to RPC-XDR (rather small by today's standards); they contain a parser for C-like struct declarations.

File format to store certain configurations

I would like to know which file format I can use to store (and easily parse and read) certain configuration items and their values. One option is an INI file. Is there any other option, like a .opt file?
EDIT:
I am using the C language.
Look into XML. It's got implementations in many languages and is pretty easy to parse and create.
But a lot of it has to do with what language you're using.
http://www.w3schools.com/xml/default.asp
Personally, I like to use XML files for configuration options. Most non-power users can understand them fairly easily, and there are many libraries out there that make them very easy to parse.
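
For comparison, the INI option mentioned in the question needs no library at all in C. The following is a minimal sketch, assuming a simple dialect: [section] headers, key = value lines, comments starting with ; or #, and lines shorter than 256 characters. The names trim and ini_get are made up for this example and do not come from any existing library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Look up key in [section] of an INI stream (hypothetical helper).
 * Copies the value into out, truncating if needed, and returns 1 if
 * the key was found, 0 otherwise. Malformed lines are skipped. */
static int ini_get(FILE *fp, const char *section, const char *key,
                   char *out, size_t outsz)
{
    char line[256], current[64] = "";

    assert(fp != NULL && out != NULL && outsz > 0);
    rewind(fp);                             /* always scan from the top */
    while (fgets(line, sizeof line, fp) != NULL) {
        char *s = trim(line);

        if (*s == '\0' || *s == ';' || *s == '#')
            continue;                       /* blank line or comment */
        if (*s == '[') {                    /* [section] header */
            char *close = strchr(s, ']');
            if (close == NULL)
                continue;
            *close = '\0';
            snprintf(current, sizeof current, "%s", trim(s + 1));
        } else {                            /* key = value */
            char *eq = strchr(s, '=');
            if (eq == NULL)
                continue;
            *eq = '\0';
            if (strcmp(current, section) == 0 && strcmp(trim(s), key) == 0) {
                snprintf(out, outsz, "%s", trim(eq + 1));
                return 1;
            }
        }
    }
    return 0;
}
```

To use it, open the configuration file with fopen in read mode and call ini_get once per setting. Each call rescans the file, which is fine for a handful of options; for many lookups it would be better to read everything into a table once.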

Parsing C header files to extract information about data types, functions and function arguments

I have a C header file. I want to parse it and extract information about data types, functions and function arguments. Who can help me? I need some example in C.
Thank you very much.
You could try Clang, in particular its Lexer and Preprocessor library.
Use ANTLR. There's a decent grammar for C already written for you, and ANTLR will generate C code (or some other languages if you prefer), which you can then traverse to get what you want.
There is also srcML.
It is similar to c2xml, but it works on the source code directly, whereas c2xml starts from preprocessor output.
Assuming good C coding rules (as opposed to arbitrary use of preprocessing), this has been an advantage for my re-engineering tasks, as it preserves the names of #defines and makes it possible to process selected macros in a specific way.
The DMS Software Reengineering Toolkit with its C Front End can do this.
DMS provides general-purpose parsing, symbol table construction, flow analysis, and program transformations, parameterized by a language definition. Using DMS's C front end, DMS will parse any of a variety of C dialects, build ASTs for the code elements, and build full symbol tables with complete name and type resolution of all symbols (including parameter lists in function headers); you can stop there and dump those out. DMS can also do control and data flow analysis on the C code; you can use other DMS facilities to further analyze or transform the code. (The C front end has a full C preprocessor built in.)
The EDG front end can also be used for parsing and symbol tables, but does not have the other capabilities of DMS.
Yet another option is to use the c2xml tool from "sparse". Its C parser isn't 100% standard-compliant (e.g. it won't parse K&R-style declarations), but for reasonably modern C code it works quite well.
If you need human-readable output (e.g. in HTML or PDF), you can use doxygen/doxywizard. In doxywizard, "All entities" has to be selected.

XML -> C parser generator

I have a C program that gets its settings from an XML file. Currently I'm using Xerces to traverse the data, but it's getting quite tedious to map each XML value to a variable.
The same XML is also read by a Java program, which is much more convenient due to JAXB creating all the necessary classes and such in Java. I'm looking for something similar that can create a "structure of structs" or some such. It's important that I get C structs, and not C++ classes, because this code will run on GPUs.
I found "XML Booster", and am currently reading its docs. Do you know of other options? It needs to be usable on Linux.
I use the libxml library. You still have to traverse the XML, but you get a linked list with elements, attributes, nodes and child nodes, which you can follow.
link: http://xmlsoft.org/index.html
Given that your XML files have a common pattern, you can use Bison+Flex or simply ANTLR (C runtime) to construct a grammar and extract the values from the XML files into variables. Those produce parsers in pure C, so you have nothing to worry about.
If you have an XML schema, check out CodeSynthesis XSD. It generates nice C++ objects for your XSD and you don't need to deal with Xerces directly:
http://www.codesynthesis.com/products/xsd/
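
Whichever parser does the traversal, the tedious part described in the question (mapping each XML value to a variable) can be made table-driven in plain C. This is not a code generator like JAXB, only a hand-written sketch: struct settings, its fields, the element names and the function settings_set are all hypothetical, and the caller is assumed to already have the element name and its text from the parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings struct: plain C, no C++ classes. */
struct settings {
    int    width;
    int    height;
    double scale;
};

enum field_type { FIELD_INT, FIELD_DOUBLE };

/* One row per XML element name: its type and where it lives in the struct. */
struct field_map {
    const char     *name;
    enum field_type type;
    size_t          offset;
};

static const struct field_map settings_map[] = {
    { "width",  FIELD_INT,    offsetof(struct settings, width)  },
    { "height", FIELD_INT,    offsetof(struct settings, height) },
    { "scale",  FIELD_DOUBLE, offsetof(struct settings, scale)  },
};

/* Store the text of element name into the matching struct field.
 * Returns 1 if the name is in the table, 0 otherwise. */
static int settings_set(struct settings *s, const char *name, const char *text)
{
    size_t i;

    assert(s != NULL && name != NULL && text != NULL);
    for (i = 0; i < sizeof settings_map / sizeof settings_map[0]; i++) {
        const struct field_map *f = &settings_map[i];
        char *dst = (char *)s + f->offset;  /* address of the target field */

        if (strcmp(f->name, name) != 0)
            continue;
        if (f->type == FIELD_INT)
            *(int *)dst = (int)strtol(text, NULL, 10);
        else
            *(double *)dst = strtod(text, NULL);
        return 1;
    }
    return 0;
}
```

The traversal code then shrinks to one call of settings_set per element it visits, and adding a setting means adding one struct member and one table row. Nested structs could be handled by giving each nested type its own table, though that is beyond this sketch.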

Resources