Lemon power or not? - c

For grammar parsing, I used to "play" with Bison, which has its pros and cons.
Last week, I noticed on the SQLite site that the engine is built with another parser generator: Lemon.
It sounds great after reading the thin documentation.
Do you have some feedback about this parser?
I cannot find much pertinent information on Google or Wikipedia (just a few examples and the same tutorials). It doesn't seem very popular (there is no lemon tag on Stack Overflow [ed: there is now :P]).

The reasons we are using Lemon in our firmware project are:
Small size of generated code and memory footprint. It produces the smallest parser I found (I compared parsers of similar complexity generated by flex, bison, ANTLR, and Lemon);
Excellent support for embedded systems: Lemon doesn't depend on the standard library, you can specify external memory management functions, and debug logging is removable.
Public domain license. There is a separate fork of Lemon licensed under GPLv2, which is not suitable for our needs because of the viral license. So we get the latest SQLite sources and compile Lemon out of them (it consists of only two files);
Push-parsing: the caller feeds tokens to the parser one call at a time. It makes the code more straightforward to understand and maintain than Flex/Bison parsing code. Thread-safety is an additional bonus I admire.
Simple integration with tokenizers. The nature of our project requires tokenizing a binary stream with variable token sizes. It was quite easy to implement a tokenizer and integrate it with the parser API, which has only 3 functions and one feedback context variable. We investigated ways of integrating Lemon with re2c and Ragel and found them also quite easy to implement.
Very simple syntax that is fast to learn.
Lemon explicitly separates development of the tokenizer (lexical analyzer) from that of the parser. My development flow starts with designing the parser grammar. At this first stage I can check complex rules by feeding a hand-written token sequence through several Parser(...) calls. The tokenizer is implemented afterwards.
Surely Lemon is not a silver bullet; it has a limited area of application. Among the disadvantages:
Lemon requires writing more rules than Bison because of its simplified syntax: no repetitions or optionals, one action per rule, etc.
The complete set of LALR(1) parser limitations.
It generates parsers in C only.
Weigh the pros and cons before making your choice. I've done mine ;-)
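To make the "3 functions and one feedback context variable" shape concrete, here is a minimal hand-written stand-in, not Lemon output. It only imitates the allocate / feed-one-token / free pattern that Lemon's documentation describes for ParseAlloc, Parse and ParseFree. All names (SumAlloc, Sum, SumFree, SumContext, sum_tokens) and the tiny NUM (PLUS NUM)* grammar are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Token codes; 0 marks end of input, the convention Lemon parsers use. */
enum { TK_END = 0, TK_NUM = 1, TK_PLUS = 2 };

/* Hypothetical feedback context shared between caller and parser. */
typedef struct {
    long result;  /* valid once done is set */
    int  done;    /* input accepted */
    int  error;   /* syntax error seen */
} SumContext;

/* Parser state lives in a caller-owned object: no globals. */
typedef struct {
    int  expect_num;  /* 1: the next token must be a number */
    long acc;         /* running sum */
} SumParser;

/* The caller supplies the allocator, as in embedded-friendly designs. */
void *SumAlloc(void *(*alloc_fn)(size_t))
{
    SumParser *p = alloc_fn(sizeof *p);
    if (p != NULL) {
        p->expect_num = 1;
        p->acc = 0;
    }
    return p;
}

/* Accept exactly one token per call. */
void Sum(void *parser, int token, long value, SumContext *ctx)
{
    SumParser *p = parser;
    assert(p != NULL && ctx != NULL);
    if (ctx->error || ctx->done) {
        return;  /* ignore tokens after an error or after acceptance */
    }
    if (p->expect_num) {
        if (token == TK_NUM) { p->acc += value; p->expect_num = 0; }
        else                 { ctx->error = 1; }
    } else if (token == TK_PLUS) {
        p->expect_num = 1;
    } else if (token == TK_END) {
        ctx->result = p->acc;
        ctx->done = 1;
    } else {
        ctx->error = 1;
    }
}

void SumFree(void *parser, void (*free_fn)(void *))
{
    free_fn(parser);
}

/* Drive the parser from a ready-made token array: no tokenizer needed yet. */
long sum_tokens(const int *tokens, const long *values, size_t count, int *ok)
{
    SumContext ctx = { 0, 0, 0 };
    void *parser = SumAlloc(malloc);
    size_t i;

    *ok = 0;
    if (parser == NULL) {
        return 0;
    }
    for (i = 0; i < count; i++) {
        Sum(parser, tokens[i], values[i], &ctx);
    }
    Sum(parser, TK_END, 0, &ctx);
    SumFree(parser, free);
    *ok = (ctx.done && !ctx.error);
    return ctx.result;
}
```

Because the caller owns the loop, a grammar can be exercised with literal token arrays long before a tokenizer exists, and several parser instances can run side by side since nothing is global.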

Interesting find! I haven't actually used it, so this commentary is based on reading the documentation.
The redesign so that the lexical analysis is done separately from the parsing immediately seems to have merit. In particular, it has the potential to simplify operations such as handling multiple or nested source files. The Lex-based yywrap() mechanism is less than ideal. That it avoids all global variables and has careful memory allocation and deallocation control should count in its favour (that it allows the choice of allocator and deallocator greatly helps too - at least for the environments where I work, where memory allocation is always an issue).
The rethinking on how the rules are organized and how the terminals are identified is a good idea.
All in all, it looks like a well thought out redesign of Bison.
It is in the public domain according to the referenced web pages.


MISRA-C coding guidelines for personal use programs?

I am usually a woodworker and not a developer. I'm learning C/C++ for embedded systems while trying to make some of my tools autonomous to save me hours of repetitive work.
For now, it's fun and going well. I have spent maybe a hundred hours coding/learning and have already saved more time than that*.
As I want to keep going, is buying and following the MISRA coding rules a "mandatory good idea"? What does MISRA contain? Only coding rules, or also tips to make the code safer?
Those tools could be dangerous (after all they cut wood and a human body is far less resistant...).
Note: I obviously do my tests in 4 steps:
Just the PIC running with an OSD & SD card logger (one day I'll make an analysis tool and stop reading those logs).
I plug in the tool with nothing on it.
I use soft drill bits/cutters on foam.
I conduct a real test at a good distance with my hand on the emergency stop button.
Also I'm the only employee and no one else has access to my workplace.
*For now I've turned a drill into a kind of 3D wood printer (doing the imprecise part of the work), and a "cutter-board" into an automated one.
Note2: I'm not a native speaker so tools' names are probably off.
MISRA was originally designed for use in the automotive industry, though it has grown well past that at this stage. The stated aims of the MISRA guidelines are:
Ensure safety.
Bring robustness and reliability to the software.
Human safety must take precedence when in conflict with security of property.
Consider both random and systematic faults in system design.
Demonstrate robustness, not just rely on the absence of failures.
Application of safety considerations across the design, manufacture, operation, servicing and disposal of products.
The documents mainly consist of rule-based advisory information for code that tries to meet these aims. MISRA document prices have dropped somewhat over the years; some documents can be bought online from MISRA for as little as GBP £10 + VAT.
However, as a beginner and amateur coder, I would advise first bolstering your knowledge of C and C++. While in most areas of industry it is often good to follow a pertinent standard, if applicable, the documents are written with the assumption that the reader has a very solid grounding in the languages and also in the concerns and processes governing full-scale commercial applications written in them. If your workshop is for personal use only, and depending on the rules governing workplace safety in your jurisdiction, I can say that a good understanding of the languages, the language tools and the hardware would allow you to start making good choices about how to code things, more so than reading MISRA could at this stage.
As commented above, and it is worth reiterating, MISRA is not some kind of magic wand or concrete way of going about things that will guarantee your code is good, works and is safe. Both good and bad code can meet standards. Following MISRA before having a good and complete grasp of what you are doing might be the same as ensuring every cable in your workshop is neatly tacked in place but then stabbing yourself with a chisel.
MISRA-C is a set of rules which will force you to weed out well-known problems and poorly-defined behavior from a C program. It is a "safe subset" of the C language, banning various forms of dangerous practice through rules that target well-known sources of bugs, such as reliance on poorly-defined behavior or implicit type conversions. C has the advantage of being a very old language, meaning that all the language flaws are well-known.
MISRA-C has a heavy focus on static code analysis to find bugs at compile-time. This is something to keep in mind, as to my knowledge there are no open source static code analysis tools that can check for MISRA-C compliance. The commercial tools tend to be very expensive and are often also full of bugs/false positives. Still, most of them are useful.
MISRA-C is only focused on C programming. It does not address CPU or microcontroller issues, although it does enforce some forms of defensive programming, which is a counter against EMI, run-away code and other forms of unexpected program behavior. (For a list of general tips & tricks beyond C, see this. Not all of these will necessarily apply to your specific machine though.)
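As an illustration of the kind of implicit type conversion such rules are aimed at, here is a small sketch in plain standard C. The function names are made up for this example and no specific MISRA rule numbers are claimed. The first and third functions compile without complaint on many default compiler settings yet give surprising results; the others spell out the intended conversions.

```c
#include <assert.h>
#include <stdint.h>

/* a + b is computed as int (integer promotion), then silently narrowed
   back to uint8_t on assignment, wrapping modulo 256. */
uint8_t add_narrowed(uint8_t a, uint8_t b)
{
    uint8_t sum = a + b;
    return sum;
}

/* The result type is wide enough and every conversion is written out. */
uint16_t add_widened(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Narrowing is explicit and guarded by a checked precondition. */
uint8_t add_checked(uint8_t a, uint8_t b)
{
    uint16_t sum = add_widened(a, b);
    assert(sum <= UINT8_MAX);
    return (uint8_t)sum;
}

/* Mixed comparison: value is converted to unsigned int first, so a
   negative value becomes a huge positive one. */
int is_below_limit_mixed(int value, unsigned int limit)
{
    return value < limit;
}

/* The negative case is handled before any conversion takes place. */
int is_below_limit_checked(int value, unsigned int limit)
{
    if (value < 0) {
        return 1;
    }
    return (unsigned int)value < limit;
}
```

With these definitions, add_narrowed(200, 100) yields 44 rather than 300, and is_below_limit_mixed(-1, 1u) yields 0 even though -1 is mathematically below 1.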
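A minimal sketch of what defensive programming can look like for a machine controller follows. The states, events and the function name next_state are hypothetical and not taken from any MISRA document. The idea is that every switch has a default branch and that any value the code does not recognise, for example a state variable corrupted by EMI, leads to the stopped state instead of being ignored.

```c
typedef enum { TOOL_IDLE = 0, TOOL_RUNNING = 1, TOOL_STOPPED = 2 } ToolState;
typedef enum { EV_START = 0, EV_FINISH = 1, EV_ESTOP = 2 } ToolEvent;

/* Returns the next state; anything unexpected ends in TOOL_STOPPED. */
ToolState next_state(ToolState state, ToolEvent event)
{
    ToolState next = TOOL_STOPPED;  /* fail-safe default */

    switch (state) {
    case TOOL_IDLE:
        if (event == EV_START)       { next = TOOL_RUNNING; }
        else if (event == EV_FINISH) { next = TOOL_IDLE; }
        else                         { next = TOOL_STOPPED; }
        break;
    case TOOL_RUNNING:
        if (event == EV_FINISH)      { next = TOOL_IDLE; }
        else if (event == EV_START)  { next = TOOL_RUNNING; }
        else                         { next = TOOL_STOPPED; }
        break;
    case TOOL_STOPPED:
        next = TOOL_STOPPED;  /* this function never leaves the stopped state */
        break;
    default:
        next = TOOL_STOPPED;  /* state variable holds an impossible value */
        break;
    }
    return next;
}
```

Note that the emergency stop event and an unknown event number take the same branch: the code lists what is allowed and treats everything else as a reason to stop.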
To demonstrate MISRA compliance, you create a "compliance matrix" which shows how you catch every directive/rule of the MISRA-C document: through compiler messages, peer review, static code analysis etc.
Most rules in the document make a lot of sense, but some do not. MISRA-C does however allow deviations from most rules, ranking each rule as one of:
Mandatory. No deviations allowed.
Required. One must invoke a formal deviation procedure if not following the rule.
Advisory. One can deviate from the rule without making a formal deviation.
Typically, MISRA-C compliance is therefore achieved by establishing a company coding standard which addresses all rules. The easiest way to implement it is to write down in this document which rules are followed and which ones are skipped, on a company level. Then set the static code analysis filters accordingly.

NLP libraries for simple POS tagging

I'm a student who's working on a summer project in NLP. I'm fairly new to the field, so I apologize if there's a really obvious solution. The project is in C, both due to my familiarity with it and the computationally intensive nature of the project (my corpus is a plaintext dump of Wikipedia).
I'm working on an approach to relationship extraction, exploiting the consistency principle to try to learn (to within some error threshold) a set of rules dictating which clusters of grammar objects imply a connection between those objects.
One of the first steps in the algorithm involves finding the set of all possible grammar objects a given word can refer to (POS disambiguation is done implicitly by the algorithm at a later step). I've looked at several parsers, but they all seem to do the disambiguation step themselves, which (from my end) is counterproductive. I'm looking for something off the shelf that (ideally) gives me a one-command way to turn up this information.
Does such a thing exist? If not, is there an existent dictionary containing this information that's trivially machine parseable?
Thank you for your help.
Look at CMU Sphinx, an open source NLP project. I think it's in C++, but you can integrate it or at least get an idea of how to go about things.
What about calling an external POS tagger as a shell script, or wrapping it in an HTTP service if you feel frisky?
Java and Python have the vast majority of NLP libraries, so it makes sense to take advantage of that. If you can use NLTK in a script to tag stuff and call this script from C, that makes it much easier.
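Calling a script from C can be sketched with POSIX popen, which runs a command through the shell and lets the C program read its standard output. This is only a sketch: it assumes a POSIX system, run_command is a helper name invented here, and any tagger script you point it at is your own.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask the headers to declare popen/pclose */
#endif
#include <stdio.h>
#include <string.h>

/* Runs cmd through the shell and copies its standard output into out,
   always NUL-terminated and truncated to fit out_size.
   Returns the status reported by pclose, or -1 if nothing could be run. */
int run_command(const char *cmd, char *out, size_t out_size)
{
    FILE *pipe;
    size_t used = 0;

    if (cmd == NULL || out == NULL || out_size == 0) {
        return -1;
    }
    out[0] = '\0';

    pipe = popen(cmd, "r");
    if (pipe == NULL) {
        return -1;
    }
    /* Append line by line until the buffer is full or the output ends. */
    while (used + 1 < out_size &&
           fgets(out + used, (int)(out_size - used), pipe) != NULL) {
        used += strlen(out + used);
    }
    return pclose(pipe);
}
```

A call such as run_command("python3 tag_words.py < words.txt", buf, sizeof buf) would then hand the tagger's output back as text, where tag_words.py is a hypothetical script name. Starting a process per call is slow, so for a Wikipedia-sized corpus it is probably better to let the script process large batches per invocation.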

Interesting examples of Domain Specific Languages

I'm considering doing something with Domain Specific Languages for my undergraduate project. My one problem is I can't really find any interesting examples that I can root around in. Does anyone have any good examples of DSELs (preferably open source)?
Also, one area I would love to look at is solving/addressing concurrency problems (coroutines etc.) with DSELs. Are there any good examples of this that anyone uses? If this is a stupid application of DSELs, please explain why...
Another potential area to explore would be database programming. Again, is this a stupid area to explore with DSELs? For example, would adding some crazy database manipulation syntax to, say, C# be a good project to undertake?
EDIT: General languages I would be looking at implementing in would be Java, Python, Scala, C# etc. Probably not C++ or C.
Linda implementations can be considered as eDSLs. STM implementations like CL-STM are certainly eDSLs.
Unrelated to concurrency, but extremely useful, are embedded Prolog implementations; there are plenty of them for Scheme, Lisp and Clojure. Parsing eDSLs have been mentioned already, and their patriarch Parsec is definitely worth digging into.
EDIT: with your list of implementation languages you're missing the most interesting eDSL opportunities. The most powerful and flexible eDSLs are made with metaprogramming. Scala-style (or even Haskell-style) eDSLs are based on higher-order functions, i.e., on mini-interpreters. They're more complicated in design, much less flexible and limited to the syntax of your host language.
boost::spirit if you're after C++ is an interesting example. Quote:
Spirit is a set of C++ libraries for parsing and output generation implemented as Domain Specific Embedded Languages (DSEL)...
(I have no idea what you mean by "solving concurrency" though. I don't see how you can solve "concurrency problems" in general, or how a DSEL could help.)

How to generate sequence diagram for my Native (C, C++) code?

I would like to know how to generate a sequence diagram for my Native (C, C++) code. I have written my C code using vim editor.
First of all, a sequence diagram is an object oriented concept. It is meant to convey, at a glance, message passing between objects in an object oriented program in a sequential fashion, which is supposed to help in understanding time-dependent interaction between the objects. As such, it does not make much sense to talk about sequence diagrams in the context of a procedural language like C.
When it comes to C++, sequence diagrams are defined in the general sense by the UML specification, which is the same for all object oriented languages. UML is considered a higher-level concept from source code that looks the same for all languages, and the process of converting source code to UML is called code reverse engineering. There are tools that allow you to convert source code of Java, C++ and other languages into UML diagrams that show relationships between classes, like Enterprise Architect, Visual Paradigm and IBM Rational Software Architect.
A sequence diagram, however, is a special kind of a UML diagram and it turns out that reverse engineering a sequence diagram is quite challenging. First, if you wanted to generate a sequence diagram through static analysis, one of the first questions you must answer is whether, given two objects and a message passed between them, a result is ever returned. This means that, given a method, you would have to analyze its algorithm and figure out if it loops forever or it returns. This is known as the halting problem and has been proven to be undecidable in computer science. This means that in order to produce a sequence diagram through static analysis, you would have to sacrifice accuracy. Dynamic analysis works by actually running the code and mapping the interactions between the objects at run time. This presents its own challenges. First, you would have to instrument the code. Then, filtering out the interactions you are interested in from library and system calls and other fluff present in the code would not be doable without user intervention.
This is not to say that creating a tool that would produce usable sequence diagrams is not possible, but the market interest has apparently not been strong enough to justify the effort, and apart from a few research papers on the subject, like CPP2XMI, I'm not aware of any commercially available tools to reverse engineer C++ into sequence diagrams.
Compounding the problem is the fact that C++ is one of the most complex object oriented languages around, so even if somebody devised a good way of reverse engineering sequence diagrams, C++ would be the last language to receive the treatment. Case in point: Visual Paradigm offers rudimentary support for reversing Java code into sequence diagrams, but not for C++.
Even if such a tool existed for C++, the sad truth is that if your C++ code is complex enough that you would rather use a tool to make a sequence diagram for it instead of doing it manually, then it is most likely too complex for the tool to give you anything useful and you would have to fix it up yourself anyways.
You can try CppDepend which provides the Dependency graph and the dependency matrix to explore the dependencies between directories, files and functions.
Have you tried PlantUML? It works really well with Doxygen. I use it at work with the company template and the syntax is really easy, although you have to write the call sequence yourself. There are plenty of examples on the page. If you are working on Linux you can use your native packaging tool to install it, and the same applies to Doxygen (e.g. sudo apt-get install plantuml). Otherwise, if you are using Windows, you can use the installers from the official pages too.
You'll have to do some configuration but it's pretty straightforward. I'll leave you the links to each tool.
Download pages:
Plantuml examples:
You can find the documentation on each page. For PlantUML you use a Java executable (.jar), so you don't have to install anything; you just need to configure Doxygen to find the executable. You can find out how in the Doxygen documentation page:
If you want to configure it without reading the documentation you could also watch this video:
I hope this helps, cheers.
You could explore trace2uml, which works with Doxygen.

What's the easiest way to Create a User-Friendly front end for a C program on Linux platform

I have a small course project that would best have a user-friendly front end.
It's a network sniffer. I coded the program in C on Linux, and now I am hoping to make it more "user-friendly".
In C: getopt
In C++, if relevant: Boost Program Options
Try to behave like other programs (at the very least provide a useful --help message, and print some sort of simple usage description for invalid arguments). I find the easiest way to understand how to use a program is when its manual page, or even --help message gives examples of common usage cases.
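Here is a minimal sketch of that advice using POSIX getopt. The option letters, the SnifferOptions struct and both function names are assumptions invented for a sniffer-like tool, not part of any existing program. Plain getopt only handles short options, so -h stands in for --help here; long options need getopt_long, which is an extension rather than part of POSIX.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask the headers to declare getopt */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical settings for a sniffer-like tool. */
typedef struct {
    const char *interface;  /* -i <name> */
    int packet_count;       /* -c <n>, 0 means unlimited */
    int show_help;          /* -h */
} SnifferOptions;

/* Usage text with an example of a common invocation. */
void print_usage(FILE *out, const char *prog)
{
    fprintf(out, "Usage: %s [-h] [-i interface] [-c count]\n", prog);
    fprintf(out, "Example: %s -i eth0 -c 100\n", prog);
}

/* Fills opts from argv. Returns 0 on success, -1 on invalid arguments. */
int parse_options(int argc, char *argv[], SnifferOptions *opts)
{
    int c;

    opts->interface = NULL;
    opts->packet_count = 0;
    opts->show_help = 0;
    optind = 1;  /* restart scanning so the function can be called again */

    while ((c = getopt(argc, argv, "hi:c:")) != -1) {
        switch (c) {
        case 'h':
            opts->show_help = 1;
            break;
        case 'i':
            opts->interface = optarg;
            break;
        case 'c': {
            char *end = NULL;
            long n = strtol(optarg, &end, 10);
            if (end == optarg || *end != '\0' || n < 0 || n > 1000000L) {
                print_usage(stderr, argv[0]);
                return -1;
            }
            opts->packet_count = (int)n;
            break;
        }
        default:  /* getopt has already printed a diagnostic */
            print_usage(stderr, argv[0]);
            return -1;
        }
    }
    return 0;
}
```

In main, the call would be parse_options(argc, argv, &opts), followed by print_usage(stdout, argv[0]) when show_help is set.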
If by user friendly you mean you want to make a GUI for it, then I would definitely recommend GTK. GTK is one of the more widely used X toolkits and it is written in C. Another plus is that it is written in an object oriented manner. IMO being exposed to how OO programming is accomplished in C is a great thing for all CS students.
If your sniffer has a command line front end, have a look at Eric S. Raymond's The Art of Unix Programming. In chapter 10, there's a whole section on how to name and format your command line arguments. There's also a POSIX standard for utility syntax.
These approaches won't directly make your program user friendly, only research on your users and analysis of your interfaces will help with this. However, providing an interface that works in ways that users expect will certainly help.
I'm no expert in UI design, or anything for that matter, but having taken an interest in the quality of user interface design, I came across Aza Raskin, an interface design expert who is head of design for Mozilla Labs. I have followed some lectures and conference talks that Aza has given on UI design, and he said something that is simple, yet makes more sense than anything I have ever learned about UI design. I may butcher it, but it's along the lines of:
If the user has to think about the design, then it is a bad design.
This may seem like an insult to everybody's intelligence, but it makes sense. Something that is user friendly can't be ambiguous to the user. This means that when a user is performing some task/operation, the UI presented to them should correspond to the current event or situation.
The UI should be designed so that anybody who picks up your software is able to navigate through it. This DOESN'T mean that they should understand the underlying problem domain, but it does mean that if asked to find a certain functional part of the software, they could generally navigate themselves there.
Some things to think about when using your software:
1) Do you ever ask yourself, "Do I go here or here?"
2) Do I use tools like bold fonts and italics to show emphasis?
3) Am I sacrificing anything by making certain features "idiot proof"? (Read below.)
4) Am I trying to do too much anywhere just to save time (programming time)?
These are just some things that can help straighten out some of your design decisions. In no way is this following any pattern. Like I said, my education in this field is minimal, it is just an interest I have followed.
Regarding #3, It is important that you don't sacrifice any feature or design decision when implementing certain accommodations. If you have something where 99% of your users are using a certain feature, but 1% can be expected to make a different decision, then take this into consideration. Don't sacrifice the design for the 99% of the users to accommodate the other 1%. This doesn't mean don't accommodate the other users, I just mean don't sacrifice the integrity of the design.
If you don't need to interact with the app "live" or only need limited interaction as a command line app then you can write a frontend using PyGTK. If you need to access C libraries then you can use Cython to load and call them.
But regardless of what you choose, be sure to find a professional interface designer. A bad interface can destroy the potential popularity of any app.
