OCaml - Files and parsing - file

How do I read the contents of a file in OCaml? Specifically, how do I parse them?
Example:
Suppose the file contains (a,b,c);(b,c,d)| (a,b,c,d);(b,c,d,e)|
After reading this, I want two lists: l1 = [(a,b,c);(b,c,d)] and l2 = [(a,b,c,d);(b,c,d,e)]
Is there a good tutorial on parsing?

This is a good use case for the Menhir parser generator (a successor to ocamlyacc). You might want to use ocamllex for lexing. All have good documentation.
You could also use the stream parsing abilities of camlp4 or camlp5.
Also read the Wikipedia pages on lexing and parsing.

I'd be inclined to use Aurochs, a PEG parser, for something like this. There is example code in its repo.

If you want to specify a grammar and have OCaml generate lexers and parsers for you, check out these ocamllex and ocamlyacc tutorials. I recommend doing it this way. If your file format really has only one type of token, ocamlyacc might be overkill: you can just use the lexer to split the file into tokens that the grammar considers valid.

Related

C Language. How to use a string value as delimiter in SSCANF

Is there a way to use a string as a delimiter?
We can use characters as delimiters with sscanf().
Example:
I have
char url[]="username=jack&pwd=jack123&email=jack#example.com";
and I can use
char username[100],pwd[100],email[100];
sscanf(url, "username=%[^&]&pwd=%[^&]&email=%[^\n]", username,pwd,email);
It works fine for this string, but for
url="username=jack&jill&pwd=jack&123&email=jack#example.com"
it can't be used. The goal is to prevent SQL injection, but I want to learn a trick for using
&pwd and &email as delimiters, not necessarily with sscanf.
Update: The solution doesn't necessarily need to be in C. I only want to know a way to use a string as a delimiter.
Just code your own parsing. In many cases, representing in memory the AST you have parsed is useful. But do specify and document your input language (perhaps using EBNF notation).
Your input language (which you have not defined in your question) seems to be similar to the MIME type application/x-www-form-urlencoded used in HTTP POST requests. So you might look, at least for inspiration, into the source code of free software libraries related to HTTP server processing (like libonion) and HTTP client processing (like libcurl).
You could read an entire line with getline (or perhaps fgets), then parse it appropriately. sscanf with %n, or strtok, might be useful, but you can also parse the line "manually" (consider writing, e.g., your own recursive descent parser). You might also use strchr or strstr.
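As a minimal sketch of the strstr approach for the string in the question: the helper below copies whatever lies between two marker strings. The name extract_between and its return convention are made up for this illustration, and it assumes the markers themselves (such as "&pwd=") never occur inside a value.

```c
#include <string.h>

/* Hypothetical helper: copy the text found between the marker strings
 * `start` and `end` in `src` into `out` (at most outsz - 1 bytes).
 * If `end` is NULL, copy everything up to the end of `src`.
 * Returns 0 on success, -1 if a marker is missing or `out` is too small. */
static int extract_between(const char *src, const char *start,
                           const char *end, char *out, size_t outsz)
{
    const char *p = strstr(src, start);
    if (p == NULL)
        return -1;
    p += strlen(start);                 /* skip past the opening marker */

    const char *q = (end != NULL) ? strstr(p, end) : p + strlen(p);
    if (q == NULL)
        return -1;

    size_t len = (size_t)(q - p);
    if (len >= outsz)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

Calling it with ("username=", "&pwd="), then ("&pwd=", "&email="), then ("&email=", NULL) would pull out the three fields even when a value contains a bare '&'.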
BTW, in many cases, using common textual representations like JSON, YAML, XML can be helpful, and you can easily find many libraries to handle them.
Notice also that strings can be processed as FILE* by using fmemopen and/or open_memstream.
You could use parser generators such as bison (with flex).
In some cases, regular expressions could be useful. See regcomp and friends.
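A rough sketch of the regcomp route, assuming a POSIX system and the exact field order from the question. The function name parse_with_regex and the shared buffer size parameter are invented for the example; the pattern only works as intended while "&pwd=" and "&email=" each appear once.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: split "username=...&pwd=...&email=..." with a POSIX
 * extended regular expression. Each output buffer must hold `sz` bytes.
 * Returns 0 on success, -1 if the input does not match or a field is too long. */
static int parse_with_regex(const char *s, char *user, char *pwd,
                            char *email, size_t sz)
{
    regex_t re;
    regmatch_t m[4];            /* m[0] is the whole match, m[1..3] the groups */
    int rc = -1;

    if (regcomp(&re, "^username=(.*)&pwd=(.*)&email=(.*)$", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, s, 4, m, 0) == 0) {
        char *dst[3] = { user, pwd, email };
        rc = 0;
        for (int i = 0; i < 3; i++) {
            size_t len = (size_t)(m[i + 1].rm_eo - m[i + 1].rm_so);
            if (len >= sz) {
                rc = -1;
                break;
            }
            memcpy(dst[i], s + m[i + 1].rm_so, len);
            dst[i][len] = '\0';
        }
    }
    regfree(&re);
    return rc;
}
```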
So what you want to achieve is quite easy to do and is standard practice. But you need more than just sscanf, and you may want to combine several things.
Many external libraries (e.g. glib from GTK) provide some parsing. And you should care about UTF-8 (today, you have UTF-8 everywhere).
On Linux, if permitted to do so, you might use GNU readline instead of getline when you want interactive input (with editing abilities and autocompletion). Then take inspiration from the source code of GNU bash (or of RefPerSys, if you are interested in C++).
If you are unfamiliar with usual parsing techniques, read a good book such as the Dragon Book. Most large programs deal somewhere with parsing, so you need to know how that can be done.

Tips for parsing an iCal file

I'm trying to parse an iCal input file according to RFC 5545.
Specifically, each line consists of:
-Property name
-Optional parameters, each starting with semicolon ";" and possibly having multiple comma-separated values (parameter values may be double-quoted in which case they could contain colons, semicolons, and commas)
-Colon ":"
-Property value
Example line:
> ORGANIZER;CN=Obi-WanKenobi;SENTBY="mailto:obiwan#padawan.com":mailto:laowaion#padawan.com
In this case the line would be read into a buffer and parsed (currently using strtok) like this:
ORGANIZER is the property name;
CN=Obi-WanKenobi and SENTBY="mailto:obiwan#padawan.com" are parameters; mailto:laowaion#padawan.com is the property value.
I have no idea where to start. The different input cases are almost infinite and I haven't been able to figure out an effective algorithm to cover all of said cases. Is strtok the way to go? or is there another C library that has a more intelligent parser? Need someone to put me on the right track.
I'd suggest that you start by looking at existing implementations:
in C: libical
in C#: dday.ical
The answers above address your immediate question, but you might hit other issues as you progress through the RFC 5545 standard, and looking at what others have done may be helpful.
You can use flex (a free implementation of lex, not actually a GNU project) to write a lexical analyser that is tailored to your task. Ragel is another good tool for this problem.
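If you would rather hand-roll it than use strtok or a generated lexer, the core trick is to scan for ';' and ':' while tracking whether you are inside double quotes. The following is only a sketch: struct content_line, MAX_PARAMS, the buffer sizes and the function names are all made up for illustration, and it does not handle line unfolding or splitting comma-separated parameter values.

```c
#include <string.h>

#define MAX_PARAMS 8   /* arbitrary limit chosen for this sketch */

struct content_line {
    char name[64];
    char params[MAX_PARAMS][128];   /* each entry is a raw "NAME=value" string */
    int  nparams;
    char value[256];
};

/* First character from `delims` in `s` that lies outside double quotes, or NULL. */
static const char *find_unquoted(const char *s, const char *delims)
{
    int in_quotes = 0;
    for (; *s != '\0'; s++) {
        if (*s == '"')
            in_quotes = !in_quotes;
        else if (!in_quotes && strchr(delims, *s) != NULL)
            return s;
    }
    return NULL;
}

/* Copy [begin, end) into dst as a NUL-terminated string; -1 if it does not fit. */
static int copy_span(char *dst, size_t dstsz, const char *begin, const char *end)
{
    size_t len = (size_t)(end - begin);
    if (len >= dstsz)
        return -1;
    memcpy(dst, begin, len);
    dst[len] = '\0';
    return 0;
}

/* Split "NAME;PARAM=..;PARAM=..:VALUE" into its parts. Returns 0 or -1. */
static int parse_content_line(const char *line, struct content_line *out)
{
    const char *p = find_unquoted(line, ";:");
    if (p == NULL || copy_span(out->name, sizeof out->name, line, p) != 0)
        return -1;

    out->nparams = 0;
    while (*p == ';') {                       /* one parameter per iteration */
        const char *start = p + 1;
        p = find_unquoted(start, ";:");
        if (p == NULL || out->nparams == MAX_PARAMS)
            return -1;
        if (copy_span(out->params[out->nparams], sizeof out->params[0],
                      start, p) != 0)
            return -1;
        out->nparams++;
    }
    /* p now points at the unquoted ':'; the rest of the line is the value. */
    return copy_span(out->value, sizeof out->value, p + 1, p + 1 + strlen(p + 1));
}
```

On the ORGANIZER line from the question this keeps the colon inside the quoted SENTBY value as part of the parameter, which is where a plain strtok on ":" goes wrong.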

Json string parser using C

I was referring to a site called "joys of programming" for a JSON parser in C. The site seems to be down and I am not able to get information about the JSON parser. It would be great if someone could guide me. I want to know how to create a JSON array. Thanks in advance.
If you want to write your own JSON parser, you have to look at the language grammar, which is probably LL. Writing such an LL parser is almost trivial and kind of fun; use a regex library to save precious time.
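To give an idea of what such a hand-written LL parser looks like, here is a sketch of a recursive descent parser for a tiny subset of JSON: arrays, possibly nested, containing only integers. It is written by hand rather than with a regex library, all function names are made up for the example, and it just adds up the integers instead of building a tree. It is also a bit more lenient than real JSON, because strtol accepts a leading '+'.

```c
#include <ctype.h>
#include <stdlib.h>

static void skip_ws(const char **p)
{
    while (isspace((unsigned char)**p))
        (*p)++;
}

static int parse_value(const char **p, long *sum);

/* array := '[' ws ( value ( ',' value )* )? ']' */
static int parse_array(const char **p, long *sum)
{
    if (**p != '[')
        return -1;
    (*p)++;
    skip_ws(p);
    if (**p == ']') {           /* empty array */
        (*p)++;
        return 0;
    }
    for (;;) {
        if (parse_value(p, sum) != 0)
            return -1;
        if (**p == ',') {       /* another element follows */
            (*p)++;
            continue;
        }
        if (**p == ']') {       /* end of this array */
            (*p)++;
            return 0;
        }
        return -1;              /* anything else is a syntax error */
    }
}

/* value := ws ( array | integer ) ws */
static int parse_value(const char **p, long *sum)
{
    skip_ws(p);
    if (**p == '[') {
        if (parse_array(p, sum) != 0)
            return -1;
    } else {
        char *end;
        long v = strtol(*p, &end, 10);
        if (end == *p)          /* no digits consumed */
            return -1;
        *sum += v;
        *p = end;
    }
    skip_ws(p);
    return 0;
}

/* Parse `text` and store the sum of all integers in *sum. Returns 0 or -1. */
static int sum_json_ints(const char *text, long *sum)
{
    const char *p = text;
    *sum = 0;
    if (parse_value(&p, sum) != 0)
        return -1;
    return (*p == '\0') ? 0 : -1;   /* reject trailing garbage */
}
```

Each grammar rule becomes one function, and one character of lookahead is enough to decide which rule applies. Strings, objects, and the other JSON value types would be further branches in parse_value.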
If you're looking for a library to deal with JSON data, here is the second result Google gave me.
I found several libraries that can do this.
Jsoncpp, JsonValue, cppCMS, JsonSpirit and Jansson. JsonValue is the easiest one: it consists of just one .h file and one .cpp file.

File format to store certain configurations

I would like to know which file format I can use to store (and easily parse and read) certain configuration items and their values. One option is an INI file. Is there any other option, like a .opt file?
EDIT:
I am using C language.
Look into XML. It's got implementations in many languages and is pretty easy to parse and create.
But a lot of it has to do with what language you're using.
http://www.w3schools.com/xml/default.asp
Personally, I like to use XML files for configuration options. Most non-power users can understand them easily enough, and there are many libraries out there that make them super easy to parse.
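For comparison, the INI option mentioned in the question needs very little code in plain C. This is only a sketch of a line parser for "key = value" pairs: the function name parse_ini_line and the buffer sizes (64 bytes for the key, 256 for the value) are assumptions made for the example, section headers are recognised but ignored, and trailing spaces in the value are not trimmed.

```c
#include <stdio.h>

/* Hypothetical helper: parse one line of an INI-style file.
 * `key` must hold at least 64 bytes and `value` at least 256 bytes.
 * Returns 1 if a key/value pair was stored,
 *         0 for blank lines, comments (';' or '#') and [section] headers,
 *        -1 for anything else. */
static int parse_ini_line(const char *line, char *key, char *value)
{
    char first;

    /* " %c" skips leading whitespace and reads the first real character. */
    if (sscanf(line, " %c", &first) != 1 ||
        first == ';' || first == '#' || first == '[')
        return 0;

    /* key: up to 63 chars that are not '=', space or tab; value: rest of line. */
    if (sscanf(line, " %63[^= \t] = %255[^\r\n]", key, value) == 2)
        return 1;

    return -1;
}
```

You would call it on each line returned by fgets and store the pairs wherever your program keeps its settings.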

Program for documenting a C struct?

If you have a binary file format (or packet format) which is described as a C structure, are there any programs which will parse the structure and turn it into neat documentation on your protocol?
The struct would of course contain arrays, other structures, etc., as necessary to describe the format. The documentation would probably need to include things like packing, endianness, etc.
Maybe you should think about this a different way.
"Can I create a documentation format for my packet for which I can generate a C struct?"
Consider, for example, using XML to define the packet structure and adding elements for comments and so forth. It will be fairly easy to write a simple program that transforms it into an actual C structure.
Doxygen is a commonly-used documentation generator. However, if you want to get useful documentation, you'll probably have to mark up your structure definitions with doc comments.
If you know Perl, you can try playing with Jeeves:
https://www.rodhughes.com/perl/advprog/examples/Jeeves/
(This source is there; I assume it's all right to use. ;) )
I'm trying to work out something similar to what you need: a parser for structured binary data. I'm looking to Jeeves to output parsing classes in C++ from a meta format. The default parser for Jeeves allows for adding additional tags to each member of a class definition. This would let you automatically include information about endianness, alignment, etc. in comments within your classes (and, of course, implement them in your code).
