C Library to read configuration files with syntax based on curly brackets - c

For my C projects I'd like to use curly brackets based configuration files like:
account {
name = "test@test.com";
password = "test";
autoconnect = true;
}
etc. or some variations.
I'm trying to find some nice C libraries to suit my needs. Can you please advise?

Your desired syntax is nearly identical to Lua, which would look like this:
account = {
name = "test@test.com",
password = "test",
autoconnect = true,
}
If that suits you, I highly recommend Lua, as it's designed to be embeddable in C programs as a configuration or scripting facility. You can either use the raw Lua C API, or if you prefer C++ there are things like Luabind to make certain things prettier in that language.
Here is a trivial example using the pure C Lua API to retrieve values from a buffer which contains a Lua "chunk": http://lua-users.org/wiki/GettingValuesFromLua . You can basically read (or mmap) your configuration file in C, pass the pointer to the text to Lua, have Lua execute it, and then retrieve the bits and pieces one by one. An alternative is "binding" (for which there is also an example on the Lua wiki). With binding, you set up C structures to represent your configuration data, bind them to Lua, and let the Lua configuration script populate (construct) a configuration object which is then accessible from C. Depending on your exact needs this may be better or worse, but in pure C (as opposed to C++) the learning curve may be steeper than with the "get values" approach.

I would suggest using a lexer and parser for doing this, either the lex/yacc combo or flex/bison.
You basically write code in a .l and .y file to describe the layout and the lexer/parser generator creates C code that will process the file for you, calling functions to deliver the data to you.
Lexical analysis and parsing are a pain to do unless you're well versed in the art. Tools like those I've mentioned make the job a lot easier.
In the lexer, you get it to recognise the lexical elements like
e_account (account)
e_openbrace ({)
e_name (name)
e_string ("[^"]*")
e_semicolon (;)
and so on.
The lexer is used by the parser to detect the lexical elements and the parser has the higher level rules for deciding what constructs are valid. Things like an account section being e_account, e_openbrace, zero or more of e_stanza then finally e_closebrace. And also detecting e_stanza as being (among others) e_name, e_equals, e_string then e_semicolon.
Most of the intelligence is under the covers (and pretty ugly looking code at least for lex/yacc) but it's better than trying to write it yourself :-)
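To make the lexer/parser split concrete, here is a minimal hand-written sketch in C. It is not lex/yacc input and not what flex/bison would generate; the token names, next_token and parse_section are illustrative assumptions, and it only accepts a single section made of "word = value;" stanzas.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token classes, loosely following the lexical elements listed above. */
typedef enum {
    T_WORD,        /* account, name, true, ... */
    T_STRING,      /* "..." */
    T_OPENBRACE,   /* { */
    T_CLOSEBRACE,  /* } */
    T_EQUALS,      /* = */
    T_SEMICOLON,   /* ; */
    T_END,         /* end of input */
    T_ERROR        /* unexpected character or unterminated string */
} token_kind;

typedef struct {
    token_kind kind;
    const char *start;  /* points into the input buffer */
    size_t len;         /* token length (for strings: without the quotes) */
} token;

/* Lexer: read one token at *cursor and advance *cursor past it. */
token next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;
    token t = { T_END, p, 0 };

    while (isspace((unsigned char)*p))
        p++;
    t.start = p;

    if (*p == '\0') {
        t.kind = T_END;
    } else if (*p == '{') {
        t.kind = T_OPENBRACE; t.len = 1; p++;
    } else if (*p == '}') {
        t.kind = T_CLOSEBRACE; t.len = 1; p++;
    } else if (*p == '=') {
        t.kind = T_EQUALS; t.len = 1; p++;
    } else if (*p == ';') {
        t.kind = T_SEMICOLON; t.len = 1; p++;
    } else if (*p == '"') {
        const char *q = p + 1;
        while (*q != '\0' && *q != '"')
            q++;
        if (*q == '"') {
            t.kind = T_STRING; t.start = p + 1;
            t.len = (size_t)(q - (p + 1)); p = q + 1;
        } else {
            t.kind = T_ERROR; t.len = (size_t)(q - p); p = q;
        }
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        const char *q = p;
        while (isalnum((unsigned char)*q) || *q == '_')
            q++;
        t.kind = T_WORD; t.len = (size_t)(q - p); p = q;
    } else {
        t.kind = T_ERROR; t.len = 1; p++;
    }
    *cursor = p;
    return t;
}

/* Parser rules:
 *   section := WORD '{' stanza* '}'
 *   stanza  := WORD '=' (STRING | WORD) ';'
 * Returns the number of stanzas, or -1 on a syntax error. */
int parse_section(const char *text)
{
    const char *cur = text;
    int stanzas = 0;

    if (next_token(&cur).kind != T_WORD) return -1;
    if (next_token(&cur).kind != T_OPENBRACE) return -1;

    for (;;) {
        token t = next_token(&cur);
        if (t.kind == T_CLOSEBRACE) return stanzas;
        if (t.kind != T_WORD) return -1;
        if (next_token(&cur).kind != T_EQUALS) return -1;
        t = next_token(&cur);
        if (t.kind != T_STRING && t.kind != T_WORD) return -1;
        if (next_token(&cur).kind != T_SEMICOLON) return -1;
        stanzas++;
    }
}
```

Calling parse_section on the account block from the question would count its three stanzas. A generated parser does the same job from a declarative grammar, and it is much easier to extend once the format grows nested sections or lists.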

A variant of what you described would be JSON:
{
"account": {
"name": "test@test.com",
"password": "test",
"autoconnect": true
}
}
http://www.json.org/
lists ~100 libraries to read and write JSON for every conceivable platform and language. There are seven libraries for C alone. The nice thing about JSON is, of course, interoperability and having a widely accepted data format (it even has an RFC: RFC 4627).

libconfuse has nearly the syntax you require:
/*
* This is a C-style multi-line comment
*/
BackLog = 2147483647
bookmark heimdal {
login = "anonymous"
password = ${ANONPASS:-anonymous@} # environment variable substitution
}

Related

C struct introspection at runtime

Is there a facility for the C language that allows run-time struct introspection?
The context is this:
I've got a daemon that responds to external events, and for each event we carry around an execution context struct (the "context"). The context is big and messy, and contains references to all sorts of state.
Once the event has been handled, I would like to be able to run the context through a filter, and if it matches some set of criteria, drop a log message to help with debugging. However, since I hope to use this for field debugging, I won't know what criteria will be useful to filter on until run time.
My ideal solution would allow the user to, essentially, write a C-style boolean expression and have the program use that. Something like:
activate_filter context.response_time > 4.2 && context.event.event_type == foo_event
Ideas that have been tossed around so far include:
Providing a limited set of fields that we know how to access.
Wrapping all the relevant structs in some sort of macro that generates introspection tools at run time.
Writing a python script that knows where (versioned) headers live, generates C code and compiles it to a dll, which the daemon then loads and uses as a filter. Obviously this approach has some extra security considerations.
Before I start in on some crazy design goose chase, does anyone know of examples of this sort of thing in the wild? I've done some googling but haven't come up with much.
I would also suggest tackling this issue from another angle. The key words in your question are:
The context is big and messy
And that's where the issue is. Once you clean this up, you'll probably be able to come up with a clean logging facility.
Consider redefining all the fields in your context struct in some easy, pliable format, like XML: a simple XML schema that lists all the members of the struct, their types, and maybe some other metadata, even a comment that documents each field.
Then, throw together a quick and dirty stylesheet that reads the XML file and generates a compilable C struct, that your code actually uses. Then, a different stylesheet that cranks out robo-generated code that enumerates each field in the struct, and generates the code to convert each field into a string.
From that, bolting on a logging facility of some kind, with a user-provided filtering string becomes an easier task. You do have to come up with some way of parsing an arbitrary filtering string. Knowledge of lex and yacc would come in handy.
Things of this nature have been done before.
The XCB library is a C client library for the X11 protocol. The protocol defines various kinds of binary messages which are essentially simple structs that the client and the server toss to each other, over a socket. The way that libxcb is implemented, is that all X11 messages and all datatypes inside them are described in an XML definition, and a stylesheet robo-generates C struct definitions, and the code to parse them out, and provide a fairly clean C API to parse and generate X11 messages.
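As a lighter-weight variant of generating per-field code from a single description, the field list can also live in a C macro (the "X-macro" idiom) instead of an XML file. The sketch below is only an illustration: the struct fields and the function names context_field_to_string and context_field_as_number are hypothetical, and nested structs such as context.event are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Single description of the struct: type, name, printf format.
 * The fields are hypothetical stand-ins for a real context. */
#define CONTEXT_FIELDS(X)            \
    X(double, response_time, "%g")   \
    X(int,    event_type,    "%d")   \
    X(int,    retries,       "%d")

/* Expansion 1: the struct definition itself. */
#define AS_MEMBER(type, name, fmt) type name;
typedef struct { CONTEXT_FIELDS(AS_MEMBER) } context;

/* Expansion 2: look a field up by name and format it as text.
 * Returns what snprintf returns, or -1 if the field is unknown. */
#define AS_PRINT(type, name, fmt) \
    if (strcmp(field, #name) == 0) \
        return snprintf(out, out_size, fmt, ctx->name);

int context_field_to_string(const context *ctx, const char *field,
                            char *out, size_t out_size)
{
    assert(ctx != NULL && field != NULL && out != NULL);
    CONTEXT_FIELDS(AS_PRINT)
    return -1;
}

/* Expansion 3: read a field as a double, e.g. for a numeric filter.
 * Returns 1 and stores the value if the field exists, 0 otherwise. */
#define AS_NUMBER(type, name, fmt) \
    if (strcmp(field, #name) == 0) { *value = (double)ctx->name; return 1; }

int context_field_as_number(const context *ctx, const char *field,
                            double *value)
{
    assert(ctx != NULL && field != NULL && value != NULL);
    CONTEXT_FIELDS(AS_NUMBER)
    return 0;
}
```

A filter such as response_time > 4.2 then reduces to parsing the expression, calling context_field_as_number with the field name, and comparing the result. Adding a field to CONTEXT_FIELDS updates the struct and both lookup functions at once.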
You are probably approaching this problem from the wrong side.
Logging is typically used to facilitate debugging. The program writes all sorts of events to a log file. To extract interesting entries filtering is applied to the log file.
Sometimes a program generates just too many events; logging libraries usually address this issue by offering verbosity control. Basically, a logging function takes an additional parameter giving the verbosity level of the current message. If the value is above the globally configured threshold, the message is discarded. Some libraries even allow controlling the verbosity level on a per-module basis (e.g. google log).
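The threshold check described above fits in a few lines of C. This is a minimal sketch rather than the API of any particular logging library; the level names and the functions log_set_threshold and log_message are made up for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical verbosity levels: higher numbers mean chattier messages. */
enum { VERB_ERROR = 0, VERB_INFO = 1, VERB_DEBUG = 2 };

/* Globally configured threshold. */
static int log_threshold = VERB_INFO;

void log_set_threshold(int level)
{
    log_threshold = level;
}

/* Write a printf-style message to stderr unless its level is above the
 * threshold. Returns 1 if the message was written, 0 if it was discarded. */
int log_message(int level, const char *fmt, ...)
{
    assert(fmt != NULL);
    if (level > log_threshold)
        return 0;

    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    fputc('\n', stderr);
    return 1;
}
```

With the default threshold, log_message(VERB_DEBUG, ...) is dropped and log_message(VERB_ERROR, ...) is printed. Per-module control would replace the single global with one threshold per module.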
Another possible approach is to leverage the power of a debugger since the debugger has access to all sorts of meta information. One can create a conditional breakpoint testing variables in scope for arbitrary conditions. Once the program stops any information could be extracted from the scope. This can be automated using scripting facilities provided by a debugger (gdb has great ones).
Finally there are tools generating glue code to use C libraries from scripting languages. One example is SWIG. It analyzes a header file and generates code allowing a scripting language to invoke functions, access structure fields, etc.
Your filter expression will become a program in, say, Lua (other scripting languages are supported as well). You invoke this program, passing in the pointer to the execution context struct (the "context"). Thanks to the accessors generated by SWIG, the Lua program can examine any field in the structure.
I generated introspection metadata with a SWIG-CSV parser.
Suppose the C++ code contains a class like the following:
class Bike {
public:
int color; // color of the bike
int gearCount; // number of configurable gear
Bike() {
// bla bla
}
~Bike() {
// bla bla
}
void operate() {
// bla bla
}
};
Then it will generate the following CSV metadata,
Bike|color|int|variable|public|
Bike|gearCount|int|variable|public|
Bike|operate|void|function|public|f().
Now it is easy to parse the CSV file with python or C/C++ if needed.
import csv
with open('bike.csv', newline='') as csvfile:
    bike_metadata = csv.reader(csvfile, delimiter='|')
    for row in bike_metadata:
        pass  # do your thing with each row
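Reading the same records from C takes little more. The sketch below assumes the pipe-delimited layout shown above with no quoting or escaping; split_record and MAX_FIELDS are hypothetical names. It uses strchr rather than strtok because strtok would silently skip the empty trailing field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 8

/* Split one record on '|' in place, keeping empty fields.
 * Stores pointers to the fields and returns how many were found;
 * anything beyond max_fields is ignored. */
size_t split_record(char *line, char *fields[], size_t max_fields)
{
    assert(line != NULL && fields != NULL);
    size_t count = 0;

    line[strcspn(line, "\r\n")] = '\0';  /* drop a trailing newline */

    char *p = line;
    while (count < max_fields) {
        fields[count++] = p;
        char *bar = strchr(p, '|');
        if (bar == NULL)
            break;
        *bar = '\0';   /* terminate this field */
        p = bar + 1;   /* the next field starts after the separator */
    }
    return count;
}
```

Each line read with fgets can be passed to split_record; for the metadata above, fields[0] is the class name, fields[1] the member name and fields[2] its type.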

Turning strings into code?

So let's say I have a string containing some code in C, predictably read from a file that has other things in it besides normal C code. How would I turn this string into code usable by the program? Do I have to write an entire interpreter, or is there a library that already does this for me? The code in question may call subroutines that I declared in my actual C file, so one that only accounts for stock C commands may not work.
Whoo. With C this is actually pretty hard.
You've basically got a couple of options:
interpret the code
To do this, you'll have to write an interpreter, and interpreting C is a fairly hard problem. There have been C interpreters available in the past, but I haven't read about one recently. In any case, unless you really, really need this, writing your own interpreter is a big project.
Googling does show a couple of open-source (partial) C interpreters, like picoc
compile and dynamically load
If you can capture the code and wrap it so it makes a syntactically complete C source file, then you can compile it into a dynamically loadable library: a DLL on Windows, or a .so on most variants of UNIX. Then you could load the result at runtime.
Now, what normally would lead someone to do this is a need to be able to express some complicated scripting functions. Have you considered the possibility of using a different language? Python, Scheme (guile) and Lua are easily available to add as a scripting language to a C application.
C has nothing of this nature. That's because C is compiled: the compiler has to do a lot of work building the code before the program starts running (and hence before it can receive a string as input), so the program can't easily be changed on the fly. Compiled languages have a rigidity to them, while interpreted languages have flexibility.
You're thinking of Perl, Python, PHP, etc., the so-called "fourth generation languages." I'm sure there's a technical term in computer science for this flexibility, but C doesn't have it. You'll need to switch to one of these languages (and give up some performance) if your task requires a lot of this sort of string use. Check out Perl's /e flag with regexes, for instance.
In C, you'll need to design your application so that you don't need to do this. This is generally quite doable: for all its non-OO-ness and other deficiencies, many huge, complex applications run on well-written C just fine.

Tool to produce self-referential programs?

Many results in computability theory (such as Kleene's second recursion theorem) ensure that it is possible to construct programs that can operate over their own source code. For example, in Michael Sipser's "Introduction to the Theory of Computation," he proves a special case of the Recursion Theorem, which states that any program representing a function that accepts two strings and produces a string can be converted into an equivalent program where the second argument is equal to the program's own source code. Moreover, this process can be done automatically.
The construction that one uses to produce programs with access to their own source code is well-known (most theory of computation books contain it) and is often used to generate quines. My question is whether someone has written a general-purpose tool that accepts as input a program in some language (perhaps C, for example) that contains some placeholder for the source of the program, then processes the program to produce a new program with access to its own source code. This would make it possible, for example, to generate quines automatically, or to write programs that can introspect on their syntax trees (possibly enabling reflection in languages that don't already support it). If not, I was planning on writing my own version of such a tool, but I don't want to reinvent the wheel if this has already been done.
EDIT: Based on @Henning Makholm's suggestion, I decided to just sit down and implement such a program. The resulting program (which I've dubbed "kleene") accepts as input a C++ program and produces a new C++ program that can access its own source code by calling the function kleene::MySource(). This means that you could transform this very simple program into a quine using the kleene program:
#include <iostream>
int main() {
std::cout << kleene::MySource() << std::endl;
}
If you're curious to check it out, it's available here on my website.
Lots of examples at the Wikipedia article and links therefrom. After looking at one or two, it should be obvious how to build a quine generator for a given language that takes an arbitrary piece of payload code as input.
One problem with your reflection idea is that the program cannot, in general, know that what it has constructed is its own source code.
Our DMS Software Reengineering Toolkit is a program transformation system, that will accept programs in arbitrary syntax (described to DMS in an explicit parameter called a "domain description"), parse them to ASTs, carry out analyses and transformations of the ASTs, and can regenerate revised program text from the modified version.
DMS is of course coded in a language (actually a set of domain-specific languages) for which there are already DMS domain descriptions. So, DMS can read itself, and we use that capability to bootstrap additional DMS capabilities and optimize its performance.
So while we aren't producing quines, we are building programs with self-enhancing code.
And yes, your observation about such a tool providing reflection for arbitrary languages is smack on. Most reflection facilities provided in languages allow access only to those things the language-compiler folks thought of paramount importance to access at runtime, such as "method names". Things they weren't interested in, of course, aren't accessible; ever seen a reflection mechanism that will tell you what's in an expression? In a comment?
DMS provides complete access to all the details of the source code, by virtue of inspecting the code from outside, using general-purpose, complete mechanisms. If your language doesn't have reflection, DMS is the way to access the code and reason arbitrarily about it. Even if your language has reflection, DMS can reason about programs in your language in ways that your language cannot, because the language can't get access to its own detailed structure.

Parsing C header files to extract information about data types, functions and function arguments

I have a C header file. I want to parse it and extract information about data types, functions and functions arguments. Who can help me? I need some example in C.
Thank you very much.
You could try Clang, in particular its lexer and preprocessor library.
Use ANTLR. There's a decent grammar for C already written for you, and ANTLR will generate a parser in C (or in another language if you prefer) that builds a tree you can then traverse to get what you want.
There is also srcml.
It is similar to c2xml, but it works on the source code directly, whereas c2xml starts from preprocessor output.
Assuming good C coding rules (as opposed to arbitrary use of preprocessing), this has been an advantage for my re-engineering tasks, as it preserves the names of #defines and makes it possible to process selected macros in a specific way.
The DMS Software Reengineering Toolkit with its C Front End can do this.
DMS provides general-purpose parsing, symbol table construction, flow analysis, and program transformations, parameterized by a language definition. Using its C front end, DMS will parse any of a variety of C dialects, build ASTs for the code elements, and build full symbol tables with complete name and type resolution of all symbols (including parameter lists in function headers); you can stop there and dump those out. DMS can also do control and data flow analysis on the C code; you can use other DMS facilities to further analyze or transform the code. (The C front end has a full C preprocessor built in.)
The EDG front end can also be used for parsing and symbol tables, but does not have the other capabilities of DMS.
Yet another option is to use the c2xml tool from "sparse". Its C parser isn't 100% standard-compliant (e.g. it won't parse K&R-style declarations), but for reasonably modern C code it works quite well.
If you need human-readable output (e.g. in HTML or PDF), you can use Doxygen/doxywizard. In doxywizard, "All entities" has to be selected.

How to write own Configformat

I've developed my own file format for configuration files (plain text and line based: each EOL ends one configuration entry) for an application. This format is nothing special, and the only reason I'm doing this is to learn something! The reader and writer functions will be implemented in C (with GLib, because it should be a UTF-8 encoded file).
So now I'm thinking about how to implement this format in C code. Which steps do I have to take to get error messages that are as good as possible? I've heard something about lexers, parsers, ... but never gone too deep into them; I’ve only a very abstract idea of them. So which steps do I need to take to get a clean reader written in C for the format, one that is also maintainable for future changes? What are the topics to learn/think about?
And yes, I know: C is a pain, there are a lot of different "sexy" formats for this purpose, and so on. I want to learn something!
Cheers,
Gregor
Additional information
The reader/writer/parser (or whatever it's called) should depend as little as possible on third-party programs/components. The application around this config part already uses GLib, so that's why GLib is also used for UTF-8.
One cool way of creating a config format is to embed a scripting language.
This gives you the parser for free and gives you the possibility to generate data on the fly or define variables that are being reused:
Consider these examples of xml vs an ugly pseudo scripting language:
<InputPoints>
<Point>
<x>1.0</x>
<y>1.0</y>
</Point>
<Point>
<x>1.0</x>
<y>2.0</y>
</Point>
<Point>
<x>1.0</x>
<y>3.0</y>
</Point>
<Point>
<x>1.0</x>
<y>4.0</y>
</Point>
</InputPoints>
vs:
for(i = 1; i <= 4; ++i) {
InputPoint(1, i);
}
or perhaps
<Username>allanballan</Username>
<Accountname>allanballan</Accountname>
<HomeDirectory>/home/allanballan</HomeDirectory>
vs
user = "allanballan";
Username = user;
Accountname = user;
HomeDirectory = "/home/"+user;
The first example compresses a list of points into a few statements; the second example shows how to remove lots of redundant data using a temporary variable.
A popular language for this kind of situation is Lua. Exactly how to map a scripting language to configuration is up to the integrator, but it's really powerful and it comes with parsing and type checking for free.
You might want to look at the libconfig source code. It has a lightweight parser you could use as a starting point and that will probably help you in figuring out what a parser for your own format would have to look like.
Though, if you really want to learn about parsers and lexers, it would probably be better to implement a simple compiler. There's an MIT course you could follow.
Depending on how deep you'd like to dive into learning the matter, you should think about not writing your parser manually. You can do so of course, but it will be a great deal more complicated and adding new features to your language will burden you with the problems of always adapting lexer and parser code.
The good thing is, there are lots of tools out there that enable you to generate this stuff from a high-level description of your input and its structure. Standard *nix tools to do so are Lex and Yacc (or their descendants Flex and Bison), but I'd like to point you to ANTLR (http://www.antlr.org) instead. One of its nice features is that it provides backends for many different languages (C/C++ as well as Java, Python, Ruby, C#, ...), so learning how to work with it will also help you if you want to switch languages at a later point.
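For a strictly line-based format, a hand-written reader stays small, and good error messages mostly come down to tracking the line number and saying what was expected. The question does not show the actual format, so the sketch below assumes "key = value" lines with "#" comments; parse_line, line_kind and the message wording are hypothetical, and it uses plain C strings without GLib or UTF-8 validation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef enum { LINE_BLANK, LINE_ENTRY, LINE_ERROR } line_kind;

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Parse one line of the assumed "key = value" format, in place.
 * LINE_BLANK: empty line or comment, nothing stored.
 * LINE_ENTRY: *key and *value point into the line buffer.
 * LINE_ERROR: err holds a message prefixed with the line number. */
line_kind parse_line(char *line, int lineno, char **key, char **value,
                     char *err, size_t err_size)
{
    assert(line != NULL && key != NULL && value != NULL && err != NULL);
    char *text = trim(line);

    if (*text == '\0' || *text == '#')
        return LINE_BLANK;

    char *eq = strchr(text, '=');
    if (eq == NULL) {
        snprintf(err, err_size, "line %d: expected '=' after key", lineno);
        return LINE_ERROR;
    }

    *eq = '\0';             /* split the line at the first '=' */
    *key = trim(text);
    *value = trim(eq + 1);

    if (**key == '\0') {
        snprintf(err, err_size, "line %d: missing key before '='", lineno);
        return LINE_ERROR;
    }
    return LINE_ENTRY;
}
```

The caller reads the file with fgets, increments a line counter, and passes each buffer to parse_line. Once the format needs nesting or multi-line values, this approach stops scaling and a generated lexer and parser become the better fit.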
