converting GSL grammar to GRXML format - vxml

Are there any tools that convert existing GSL grammar files to GRXML grammar files?
I expected such a tool to exist already but couldn't find one by searching. Am I missing something?

The NuEcho NuGram grammar development environment supports this:
The NuGram platform natively supports the W3C's SRGS ABNF format with extensions to support the development of dynamic grammars. The ABNF format was chosen since it is a W3C standard and is without a doubt much more readable and maintainable than the XML format.
Grammars can be translated to and from other formats as well. The currently supported languages are: GrXML (W3C SRGS XML), and Nuance GSL.
The NuGram platform also supports the most widely used semantic tag languages: W3C SISR (including the SISR 2004 Working Draft syntax), Nuance's GSL semantic tags, and Nuance's OSR tags.

I take it you're currently upgrading from Nuance 8.5 or a previous version? If you are upgrading to the Nuance Recognizer 9-10 engines, there is a tool provided for converting between versions. It isn't a perfect conversion and the converted grammars normally need some modification/refinement, but it is a good starting point.

Related

Is there an external parser generator tool used for building Alloy language parser

Have Alloy developers used any parser generator tool (like ANTLR) for parsing alloy specifications, or is its parser built-in and specifically written for the alloy language purpose?
If they used external tool for Alloy parser implementation, how can I access further information regarding this (for example the grammar which is fed into the external parser generator).
Alloy uses a modified version of CUP (which is shipped with the Alloy distribution). You can find the grammar specification files (Alloy.lex and Alloy.cup) inside the edu.mit.csail.sdg.alloy4compiler.parser package. In the same package there are some bash scripts used to generate corresponding lexer/parser classes.
http://alloy.mit.edu/alloy/documentation/book-chapters/alloy-language-reference.pdf
Section B.3 has the grammar.
Can't say anything about the language implementation.

GCC Xml Alternatives

I am looking into GCCXML, which parses a given header file and generates an XML description of the C code's metadata. But GCCXML is open source. Is there a commercial C code parser that works similarly to GCCXML?
Thanks,
Karthick
The obvious replacement for gccxml will be clang, which is licensed under BSD license (so you can freely use it in commercial projects, do whatever you want with the code, etc.). clang used to have an xml AST dumper built-in, but it was removed at some stage. If you only need to extract specific information (such as function prototypes for IDL generation or stuff like this) it is not difficult to write a basic custom clang plugin to do this. Otherwise, you can search around for existing clang plugins which will do the job, such as this one:
https://github.com/sk-havok/clang-extract
Clang plugin tutorial: http://clang.llvm.org/docs/ClangPlugins.html
See our DMS Software Reengineering Toolkit with its C Front End for an equivalent/superset of GCCXML.
The C front end can handle a variety of C dialects (ANSI, GCC, MS). It contains a full preprocessor. It can export ASTs for the complete language (esp. including function bodies, which GCCXML does not do, IIRC) and its symbol table, both in XML format.
Here at SO there is an example dump of the AST from DMS's C++ front end. This uses the same machinery as the C front end uses.

Why aren't regular expressions part of ISO C99

Everyone knows how awesome the C language is and how much it sucks at text processing tasks. Given these facts, regex definitely should be part of ISO C. But it isn't, and I don't understand why. Are there people who think it's not essential?
Regular Expressions don't belong in the C language proper any more than a sound library, a graphics library, or an encryption library does. Doing so would reduce the general purpose nature of the language and greatly inhibit its use as a small and efficient embedded language.
The philosophy of C was to have a very small and efficient language keyword set with standardized libraries for the next layer of functionality. Since things like regex, graphics, sound, encryption, etc. don't have a single platform or standard they don't fit in with the standard C library.
They fit best as user libraries which they currently are.
Regex is defined as part of IEEE Std 1003.1:2001 (POSIX)
Here's a handy list of which headers are in which standard:
http://www.schweikhardt.net/identifiers.html
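For readers who have not used the POSIX interface mentioned above, here is a minimal sketch of matching with `<regex.h>`. It assumes a POSIX system; the helper name `regex_matches` is made up for illustration, and none of this is part of ISO C itself.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches the POSIX extended
   regular expression `pattern`, 0 if it does not match, and -1 if the
   pattern fails to compile. */
int regex_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_EXTENDED selects ERE syntax; REG_NOSUB says we only want a
       yes/no answer, not the positions of subexpression matches. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);   /* always release the compiled pattern */

    return rc == 0 ? 1 : 0;
}
```

A call such as `regex_matches("^[a-z]+[0-9]+$", "abc123")` compiles the pattern, runs it once, and frees it again. Code that matches the same pattern many times would normally keep the compiled `regex_t` around instead.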
Because it is a library feature that would require standardizing on one of the regex languages. Standards bodies are committee driven, so that is not an easy task.
This document explains the rationale of the standard: http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf which might clarify why.
Another reason given in the document is to keep the language simple.
There are quite a few downloads available, just use one.
Because regexes are not essential to a programming language. Handy? Yes, very much so, when you need them. Essential? No way.
Web developers will naturally consider regexes to be an essential feature of a language because they have to validate all that HTML form data. Developers whose experience is always with one of a few big-name relational database servers will consider SQL support to be essential. Those working in the scientific domain will require support for "big numbers" or tensors. GUI developers think a built-in GUI toolkit is essential. Some folks deal with XML all day and consider XML support to be essential.... etc. you get the idea. This list of "essentials" can get pretty big, and languages like Java have certainly taken the "kitchen sink" approach to their massive standard libraries. I appreciate that C is not a kitchen sink language in that sense.
Be careful not to assume that your favorite language feature is an essential feature for everyone else.
The point of C is to be small yet powerful. Since regular expressions are a large and complex topic, they belong in a library. It is too bad, though, that the C committee doesn't "sponsor" some well-written, standard C algorithm and data structure libraries. There is a plethora of them out there. I tend to stick with GNU "sponsored" libs whenever I can since they are available for most platforms even if they aren't necessarily the easiest or most efficient to use. They do strike a nice balance.

Is there a bundled library for regular expressions in MSVC?

If I'm compiling a C program with gcc, I can safely assume that the functions in regex.h are available. Is there a regex library I can assume is there if someone is compiling with Microsoft's C compiler?
C++ only, but may be something you can use (or wrap):
Visual C++ 2010 includes the TR1 regex library support.
http://msdn.microsoft.com/en-us/library/bb982382.aspx
It's also available for VC++ 2008 in a feature pack:
http://www.microsoft.com/downloads/details.aspx?FamilyId=D466226B-8DAB-445F-A7B4-448B326C48E7&displaylang=en
No, I don't think MSVC comes bundled with any regex library.
Regex isn't part of the C/C++ standard library, so you shouldn't rely on any compiler providing such a library by default. It's best to get hold of a separate regex library for C (I'm sure there are tons available) and include it with your code.
Try Boost or wait for release of C++1x...
There's no C/C++ regexp library bundled with MSVC. C++/CLI has access to the .NET regexp classes, though.
Perhaps you can use PCRE
If you want POSIX-compatible regular expression semantics (and the same API too!) then the best regex library is TRE: http://laurikari.net/tre/
Unlike most regex implementations, it follows POSIX exactly with regard to the matches it returns for parenthesized subexpressions, and its matching time is O(n) in the length of the input, whereas backtracking implementations can take O(2^n) time in the worst case.
Google also has a new regex implementation that uses Perl-compatible syntax if you prefer that. You can find a link on the TRE website.
Edit: By the way, TRE seems to come with project files to build it under MSVC.

Where can I get started with Unicode-friendly programming in C?

So, I’m working on a plain-C (ANSI 9899:1999) project, and am trying to figure out where to get started re: Unicode, UTF-8, and all that jazz.
Specifically, it’s a language interpreter project, and I have two primary places where I’ll need to handle Unicode: reading in source files (the language ostensibly supports Unicode identifiers and such), and in ‘string’ objects.
I’m familiar with all the obvious basics about Unicode, UTF-7/8/16/32 & UCS-2/4, so on and so forth… I’m mostly looking for useful, C-specific (that is, please no C++ or C#, which is all that’s been documented here on SO previously) resources as to my ‘next steps’ to implement Unicode-friendly stuff… in C.
Any links, manpages, Wikipedia articles, example code, is all extremely welcome. I’ll also try to maintain a list of such resources here in the original question, for anybody who happens across it later.
A must read before considering anything else, if you’re unfamiliar with Unicode, and what an encoding actually is: http://www.joelonsoftware.com/articles/Unicode.html
The UTF-8 home-page: http://www.utf-8.com/
man 3 iconv (as well as iconv_open and iconvctl)
International Components for Unicode (via Geoff Reedy)
libbasekit, which seems to include light Unicode-handling tools
Glib has some Unicode functions
A basic UTF-8 detector function, by Christoph
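As a starting point for the iconv entry in the list above, here is a minimal sketch of a one-shot conversion from ISO-8859-1 to UTF-8. The helper name `latin1_to_utf8` is made up for illustration, and the sketch assumes the platform's iconv knows both encoding names. On some systems iconv lives in a separate library and needs `-liconv` at link time.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: converts `inlen` bytes of ISO-8859-1 text to UTF-8.
   Returns the number of bytes written to `out`, or (size_t)-1 on failure
   (unknown encoding name or output buffer too small). */
size_t latin1_to_utf8(const char *in, size_t inlen, char *out, size_t outcap)
{
    iconv_t cd;
    char *inp = (char *)in;   /* iconv takes char **, hence the cast */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;
    size_t rc;

    assert(in != NULL && out != NULL);

    /* Argument order is (to, from). */
    cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv advances the pointers and decrements the byte counters. */
    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return outcap - outleft;
}
```

Each non-ASCII ISO-8859-1 byte becomes two bytes in UTF-8, so an output buffer twice the size of the input is always enough for this particular pair of encodings.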
International Components for Unicode provides a portable C library for handling unicode. Here's their elevator pitch for ICU4C:
The C and C++ languages and many operating system environments do not provide full support for Unicode and standards-compliant text handling services. Even though some platforms do provide good Unicode text handling services, portable application code can not make use of them. The ICU4C libraries fills in this gap. ICU4C provides an open, flexible, portable foundation for applications to use for their software globalization requirements. ICU4C closely tracks industry standards, including Unicode and CLDR (Common Locale Data Repository).
GLib has some Unicode functions and is a pretty lightweight library. It's not near the same level of functionality that ICU provides, but it might be good enough for some applications. The other features of GLib are good to have for portable C programs too.
GTK+ is built on top of GLib. GLib provides the fundamental algorithmic language constructs commonly duplicated in applications. This library has features such as (this list is not a comprehensive list):
Object and type system
Main loop
Dynamic loading of modules (i.e. plug-ins)
Thread support
Timer support
Memory allocator
Threaded Queues (synchronous and asynchronous)
Lists (singly linked, doubly linked, double ended)
Hash tables
Arrays
Trees (N-ary and binary balanced)
String utilities and charset handling
Lexical scanner and XML parser
Base64 (encoding & decoding)
I think one of the interesting questions is: what should your canonical internal format for strings be? The two obvious choices (to me at least) are
a) UTF-8 in vanilla C strings
b) UTF-16 in unsigned short arrays
In previous projects I have always chosen UTF-8. Why? Because it's the path of least resistance in the C world. Everything you are interfacing with (stdio, string.h, etc.) will work fine.
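One caveat with UTF-8 in plain C strings is that the byte-oriented functions count bytes, not characters. The sketch below (the function name is made up, and the input is assumed to be valid UTF-8) counts code points by skipping continuation bytes, which always have the bit pattern 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts code points in a NUL-terminated string.
   ASSUMPTION: the input is valid UTF-8; no validation is done here. */
size_t utf8_codepoint_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);

    for (; *s != '\0'; ++s) {
        /* Every byte that is not a continuation byte (10xxxxxx)
           starts a new code point. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the five-byte string `"caf\xC3\xA9"`, `strlen` returns 5 while this function returns 4. An interpreter's string object may want to track both lengths.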
Next comes the question of file format. The problem here is that it's visible to your users (unless you provide the only editor for your language). Here I guess you have to take what they give you and try to guess the encoding by peeking (byte order marks help).
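Peeking for a byte order mark could look roughly like this. It is a sketch only: the enum and function names are made up, and a file without a BOM comes back as unknown, so a real loader needs a fallback (for example, assuming UTF-8).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for the BOM check. */
typedef enum {
    ENC_UNKNOWN,
    ENC_UTF8,
    ENC_UTF16_LE,
    ENC_UTF16_BE,
    ENC_UTF32_LE,
    ENC_UTF32_BE
} bom_encoding;

/* Inspects the first bytes of a file that were read into `buf`. */
bom_encoding detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    /* UTF-32 LE must be tested before UTF-16 LE, because its BOM
       (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE). */
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00)
        return ENC_UTF32_LE;
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF)
        return ENC_UTF32_BE;
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16_LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16_BE;

    return ENC_UNKNOWN;
}
```

The ordering means a UTF-16 LE file whose first character after the BOM is U+0000 would be reported as UTF-32 LE. That ambiguity is inherent in BOM sniffing, not specific to this sketch.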

Resources