File format to store certain configurations - file

I would like to know which file format I can use to store (and easily parse and read) certain configuration items and their values. One option is an INI file. Is there any other option, like a .opt file?
EDIT:
I am using the C language.

Look into XML. It's got implementations in many languages and is pretty easy to parse and create.
But a lot of it has to do with what language you're using.
http://www.w3schools.com/xml/default.asp

Personally, I like to use XML files for configuration options. Most non-power users can understand them relatively easily, and there are many libraries out there that make them super easy to parse.
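Since the question mentions INI files and C, here is a minimal sketch of what parsing one INI-style line could look like without any library. The names ini_kind and ini_parse_line are made up for this example, and the rules (";" or "#" starts a comment, "[name]" is a section, "key = value" is a pair) are common INI conventions I am assuming, not a formal standard.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result type for this sketch, not taken from any library. */
typedef enum { INI_BLANK, INI_SECTION, INI_PAIR, INI_ERROR } ini_kind;

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    char *end;
    while (isspace((unsigned char)*s)) s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1])) end--;
    *end = '\0';
    return s;
}

/*
 * Classify one line of an INI-style file. The line is modified in place.
 * For INI_SECTION, *key points at the section name.
 * For INI_PAIR, *key and *value point at the trimmed key and value.
 */
static ini_kind ini_parse_line(char *line, char **key, char **value)
{
    char *s, *eq, *close;
    assert(line != NULL && key != NULL && value != NULL);
    *key = *value = NULL;
    s = trim(line);
    if (*s == '\0' || *s == ';' || *s == '#')
        return INI_BLANK;                 /* empty line or comment */
    if (*s == '[') {
        close = strchr(s, ']');
        if (close == NULL) return INI_ERROR;
        *close = '\0';
        *key = trim(s + 1);
        return INI_SECTION;
    }
    eq = strchr(s, '=');
    if (eq == NULL) return INI_ERROR;     /* neither section nor pair */
    *eq = '\0';
    *key = trim(s);
    *value = trim(eq + 1);
    return **key ? INI_PAIR : INI_ERROR;  /* reject an empty key */
}
```

You would call this once per line read with fgets and copy the key and value out before reading the next line, because the returned pointers refer into the line buffer.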

Related

Ocaml - Files and parsing

How do I read the contents of a file in OCaml? Specifically, how do I parse them?
Example :
Suppose file contains (a,b,c);(b,c,d)| (a,b,c,d);(b,c,d,e)|
then after reading this, I want two lists containing l1 = [(a,b,c);(b,c,d)] and l2 = [(a,b,c,d);(b,c,d,e)]
Is there any good tutorial for parsing?
This is a good use case for the menhir parser generator (successor to ocamlyacc). You might want to use ocamllex for lexing. All have good documentation.
You could also use camlp4 or camlp5 stream parsing abilities.
Also read the Wikipedia pages on lexing and parsing.
I'd be inclined to use Aurochs, a PEG parser, for something like this. There is example code in the repo there.
If you want to specify a grammar and have OCaml generate lexers and parsers for you, check out these ocamllex and ocamlyacc tutorials. I recommend doing it this way. If you really only have one type of token in your file format, then ocamlyacc might be overkill, since you can just use the lexer to split the file up into tokens that are considered valid by the grammar.

Json string parser using C

I was referring to a site called "joys of programming" for a JSON parser in C. The site seems to be down and I am not able to get information regarding the JSON parser. It would be great if someone could guide me. I want to know how to create a JSON array. Thanks in advance.
If you want to make your own JSON parser, you have to look at the language grammar, which is LL(1): one character of lookahead is enough to decide which rule applies. Writing such an LL parser is almost trivial and kind of fun; a regex library can save you some precious time on the tokenizing.
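To give an idea of what a hand-written LL parser looks like in C, here is a small recursive-descent sketch. It is deliberately not a full JSON parser: it only accepts (possibly nested) arrays of integers and counts the integers it finds. The names json_cursor and json_count_integers are invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical parser state for this sketch. */
typedef struct {
    const char *p;   /* current position in the input */
    int count;       /* number of integers seen so far */
} json_cursor;

static void skip_ws(json_cursor *c)
{
    while (isspace((unsigned char)*c->p)) c->p++;
}

static int parse_value(json_cursor *c);

/* integer := '-'? digit+   (simplified: leading zeros are accepted) */
static int parse_integer(json_cursor *c)
{
    if (*c->p == '-') c->p++;
    if (!isdigit((unsigned char)*c->p)) return 0;
    while (isdigit((unsigned char)*c->p)) c->p++;
    c->count++;
    return 1;
}

/* array := '[' ( value ( ',' value )* )? ']' */
static int parse_array(json_cursor *c)
{
    c->p++;                                   /* consume '[' */
    skip_ws(c);
    if (*c->p == ']') { c->p++; return 1; }   /* empty array */
    for (;;) {
        if (!parse_value(c)) return 0;
        skip_ws(c);
        if (*c->p == ',') { c->p++; continue; }
        if (*c->p == ']') { c->p++; return 1; }
        return 0;                             /* unexpected character */
    }
}

/* value := array | integer; one character of lookahead picks the rule. */
static int parse_value(json_cursor *c)
{
    skip_ws(c);
    return (*c->p == '[') ? parse_array(c) : parse_integer(c);
}

/* Returns the number of integers in the text, or -1 on a syntax error. */
int json_count_integers(const char *text)
{
    json_cursor c;
    assert(text != NULL);
    c.p = text;
    c.count = 0;
    if (!parse_value(&c)) return -1;
    skip_ws(&c);
    return (*c.p == '\0') ? c.count : -1;     /* reject trailing garbage */
}
```

Each grammar rule becomes one function, which is the whole trick. Strings, objects, booleans and fractions would each need their own rule, and a real parser should also limit the nesting depth, since this one recurses once per nested array.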
If you're looking for a library to deal with Json data, here is the second result Google gave me.
I found several libraries that can do this work:
Jsoncpp, JsonValue, cppCMS, JsonSpirit and Jansson. JsonValue is the easiest one; it consists of just one .h file and one .cpp file. Note that of these, Jansson is written in C, while the others are C++.

XML -> C parser generator

I have a C program that gets its settings from an XML file. Currently I'm using Xerces to traverse the data, but it's getting quite tedious to map each XML value to a variable.
The same XML is also read by a Java program, which is much more convenient because JAXB creates all the necessary classes and such in Java. I'm looking for something similar that can create a "structure of structs" or some such. It's important that I get C structs, and not C++ classes, because this code will run on GPUs.
I found "XML Booster", and am currently reading its docs. Do you know of other options? It needs to be usable on Linux.
I use the libxml library. You still have to traverse the XML, but you get a linked list with elements, attributes, nodes and child nodes, which you can follow.
link: http://xmlsoft.org/index.html
Given that your XML files follow a common pattern, you can use Bison+Flex or simply ANTLR (C runtime) to write a grammar and extract the values from the XML files into variables. Those produce parsers in pure C, so you have nothing to worry about.
If you have an XML schema, check out CodeSynthesis XSD. It generates nice C++ objects for your XSD and you don't need to deal with Xerces directly:
http://www.codesynthesis.com/products/xsd/

How to write own Configformat

I've developed my own file format for configuration files (plain text and line based, so one line = one configuration entry) for an application. This format is nothing very special and the only reason I'm doing this is to learn something! The reader and writer functions will be implemented in C (with GLib, because it should be a UTF-8 encoded file).
So now I'm thinking about how to implement this format in C code. Which steps do I have to take to get error messages that are as good as possible? I've heard something about lexers, parsers, ... but never gone too deep into them. I've only a very abstract idea of them. So which steps do I need to take to get a clean reader written in C for the format, which is also maintainable for future changes? What are the topics to learn/think about?
And yes, I know: C is a pain, there are a lot of different "sexy" formats for this purpose, and so on. I want to learn something!
Cheers,
Gregor
Additional information
The reader/writer/parser (or whatever it's called) should depend as little as possible on third-party programs/components. The application around this config part already uses GLib, so that's why GLib is also used for UTF-8.
One cool way of creating a config format is to embed a scripting language.
This gives you the parser for free and gives you the possibility to generate data on the fly or define variables that are being reused:
Consider these examples of xml vs an ugly pseudo scripting language:
<InputPoints>
<Point>
<x>1.0</x>
<y>1.0</y>
</Point>
<Point>
<x>1.0</x>
<y>2.0</y>
</Point>
<Point>
<x>1.0</x>
<y>3.0</y>
</Point>
<Point>
<x>1.0</x>
<y>4.0</y>
</Point>
</InputPoints>
vs:
for(i = 1; i <= 4; ++i) {
InputPoint(1, i);
}
or perhaps
<Username>allanballan</Username>
<Accountname>allanballan</Accountname>
<HomeDirectory>/home/allanballan</HomeDirectory>
vs
user = "allanballan";
Username = user;
Accountname = user;
HomeDirectory = "/home/"+user;
The first example compresses a list of points into a few statements; the second example shows how to remove lots of redundant data by using a temporary variable.
A popular language for this kind of situation is Lua. Exactly how to map a scripting language to configuration is up to the integrator, but it's really powerful and it comes with parsing and type checking for free.
You might want to look at the libconfig source code. It has a lightweight parser you could use as a starting point and that will probably help you in figuring out what a parser for your own format would have to look like.
Though, if you really want to learn about parsers and lexers, it would probably be better to implement a simple compiler. There's an MIT course you could follow.
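On the question of good error messages: the main thing is to track where you are while lexing, so every error can name a line and a column. Below is a rough sketch for an assumed "name = value" line format (the actual format wasn't posted, so this is only a guess at its shape). cfg_error and cfg_check_line are names invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical error record: where the problem is and what was expected. */
typedef struct {
    int line;          /* 1-based line number */
    int column;        /* 1-based byte column of the offending position */
    char message[64];
} cfg_error;

static int fail(cfg_error *err, int line, int column, const char *msg)
{
    err->line = line;
    err->column = column;
    snprintf(err->message, sizeof err->message, "%s", msg);
    return 0;
}

/*
 * Check one line of an assumed "name = value" format.
 * Returns 1 if the line is well formed (or empty), 0 with *err filled in.
 */
int cfg_check_line(const char *text, int line, cfg_error *err)
{
    const char *p = text;
    assert(text != NULL && err != NULL);

    while (*p == ' ' || *p == '\t') p++;
    if (*p == '\0' || *p == '\n') return 1;           /* empty line */

    /* Token 1: the name, a letter followed by letters, digits or '_'. */
    if (!isalpha((unsigned char)*p))
        return fail(err, line, (int)(p - text) + 1, "expected a name");
    while (isalnum((unsigned char)*p) || *p == '_') p++;

    /* Token 2: the '=' sign. */
    while (*p == ' ' || *p == '\t') p++;
    if (*p != '=')
        return fail(err, line, (int)(p - text) + 1, "expected '='");
    p++;

    /* Token 3: the value, which must not be empty. */
    while (*p == ' ' || *p == '\t') p++;
    if (*p == '\0' || *p == '\n')
        return fail(err, line, (int)(p - text) + 1, "expected a value");
    return 1;
}
```

The caller can then print something like "config:7:7: expected '='". The column here is a byte offset; for UTF-8 text with GLib you could convert it to a character offset with g_utf8_pointer_to_offset before printing.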
Depending on how deep you'd like to dive into learning the matter, you should think about not writing your parser manually. You can do so of course, but it will be a great deal more complicated and adding new features to your language will burden you with the problems of always adapting lexer and parser code.
The good thing is, there are lots of tools out there that enable you to generate this stuff from a high-level description of your input and its structure. Standard *nix tools to do so are Lex and Yacc (or their descendants Flex and Bison), but I'd like to point you to ANTLR (http://www.antlr.org) instead. One of its nice features is that it provides backends for many different languages (C/C++ as well as Java, Python, Ruby, C#, ...), so learning how to work with it will also help you if you want to switch languages at a later point.

Are there any naming conventions when creating your own file suffix?

I'm working on a little game and figured I'd pack images and other resources into my own files to keep the directories neat. Then I thought, is there any convention to what I should call my file, or can I simply call it what ever without anyone ever caring? There's probably not a lot of strong opinions about this rather innocent subject, but I thought I'd better have some kind of response on it.
The most obvious would be to not use reserved characters.
< > : " / \ | ? *
Those are the ones for windows. Anyone care to add what characters are reserved on other systems?
There are some standard suffixes that I'm guessing shouldn't be used, unless the file actually conforms to the suffix's standard.
.bat .exe .dll .lib .cmd
And then there's all the image file types and what not, but those are about as obvious. What more?
What is your opinion? Should I name my suffix as uniquely as possible, say .maf (My Awesome File) or whatever... or should I be more informative and stick to a known suffix that might reveal what my file is actually doing there? Or perhaps a bland old .dat or .bin?
If you want to create something that you want to associate with your program, you do, of course, want it to be as unique as possible. When you come up with an extension, check with FILExt to see if it's conflicting with anything major.
If you just want to convey "this is a binary file, don't try to open it in notepad or tamper with it", I'd go with something like .bin, yes.
Unix platforms don't have filename restrictions (other than the NUL byte and forward slash), so don't worry about any characters other than what Windows doesn't like.
You can worry about using an extension that hasn't been used before, but unless you want to use a really long one, I'd say don't bother, you can always go with something generic like .dat or .bin. You don't actually need to even use an extension, which (imo) is just as good, unless you will be distributing some of these files other than with the program (for example, user made maps, you will want to have an extension since users will be distributing the files).
Another minor point you might want to consider is that MS-DOS extensions can be at most three characters after the dot (the 8.3 naming scheme). Being DOS compatible isn't a huge issue (not an issue at all really), but that's why you'll see that a lot of extensions are three characters.
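If you ever let users name these files themselves, the Windows reserved characters listed in the question are easy to check for in code. A minimal sketch, with a made-up function name; it only covers those nine characters and nothing else Windows may object to:

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if `name` is non-empty and avoids the characters Windows
 * reserves in file names (< > : " / \ | ? *), 0 otherwise.
 * The function name is invented for this sketch.
 */
int is_portable_filename(const char *name)
{
    static const char reserved[] = "<>:\"/\\|?*";
    const char *p;
    assert(name != NULL);
    if (*name == '\0') return 0;
    for (p = name; *p != '\0'; p++) {
        if (strchr(reserved, *p) != NULL)
            return 0;                 /* found a reserved character */
    }
    return 1;
}
```

Because the list already contains the forward slash, a name that passes this check is also fine on Unix.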
Use what makes sense to you. As you proposed, I would avoid well-known extensions, so that your files are not accidentally opened by another application.
Most applications/games will give an extension that is related to the application name or use (.doc, .psd etc...).
Unless users are going to double-click on the files from explorer, having a nice informative, unique extension is not important, so you might want to go with .bin or .dat. However there exist good mechanisms for packing files together (.zip or .7z) so you might want to go for a standard packer, with a standard extension.

Resources