For our CMIS server-side implementation, I am looking to build a parser for the query statements passed as input to the query method. CMIS defines a BNF grammar for these query statements. What would be the best way to generate a parser from this BNF grammar?
Our implementation is in C#. CMIS queries are based on SQL syntax plus some predicates defined by CMIS.
Apache Chemistry OpenCMIS uses ANTLR on the server side to parse, validate, and interpret the cmisQL syntax.
You can try reusing the ANTLR grammar defined in the Apache Chemistry OpenCMIS implementation to generate your own C# parser with ANTLR 3.
The OpenCMIS grammar files are available here (under the Apache License).
We want to use the language-specific analyzers provided by Azure Search, but add the html_strip char filter from Lucene. Our idea was to build a custom analyzer that uses the same components (tokenizer, filters) as, for example, the en.microsoft analyzer, and then add the char filter on top.
Sadly we can't find any documentation on what exactly constitutes the en.microsoft analyzer or any other Microsoft analyzer. We do not know which tokenizers or filters to use to get the same result with a custom analyzer.
Can anyone point us to the right documentation?
The documentation says that the en.microsoft analyzer performs lemmatization instead of stemming, but I can't find any tokenizer or filter that claims to use lemmatization, only stemmers.
To create a customized version of a Microsoft analyzer, start with the Microsoft tokenizer for a given language (we have a stemming and non-stemming version), and add token filters from the set of available token filters to customize the output token stream. Note that the stemming tokenizer also does lemmatization, depending on the language.
In most cases, a Microsoft language analyzer is a Microsoft tokenizer plus a stopwords token filter and a lowercase token filter, but this varies depending on the language. In some cases we also do language-specific character normalization.
We recommend using the above as a starting point. The Analyze API can then be used for testing your configuration to see if it gives you the results you want.
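As a rough sketch of that starting point, the snippet below builds the analysis part of an index definition: the Microsoft stemming tokenizer for one language, the predefined `lowercase` and `stopwords` token filters, and the predefined `html_strip` char filter. The names `my_ms_tokenizer` and `my_html_en_analyzer` and the helper function are made up for illustration, and the filter list is only an approximation of en.microsoft, since its exact composition is not published. Verify the result with the Analyze API.

```python
import json


def build_index_analysis(language="english"):
    """Return the tokenizer/analyzer part of an Azure Search index definition."""
    tokenizer_name = "my_ms_tokenizer"      # hypothetical name
    analyzer_name = "my_html_en_analyzer"   # hypothetical name
    return {
        "tokenizers": [
            {
                "name": tokenizer_name,
                "@odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer",
                "language": language,
                "isSearchTokenizer": False,
            }
        ],
        "analyzers": [
            {
                "name": analyzer_name,
                "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
                # Strip HTML markup before the text reaches the tokenizer.
                "charFilters": ["html_strip"],
                "tokenizer": tokenizer_name,
                # Assumed approximation of what the built-in analyzer does.
                "tokenFilters": ["lowercase", "stopwords"],
            }
        ],
    }


if __name__ == "__main__":
    # Merge this fragment into the index definition you send to the service.
    print(json.dumps(build_index_analysis(), indent=2))
```

The fragment is meant to be merged into a full index definition, with the analyzer name referenced from the `analyzer` property of each field that should use it.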
I have found a spell checker for English:
spellchecker english
Is there also one for German?
Try JavaScript SpellCheck.
It supports a wide variety of languages including German.
In most of the applications I have developed, I used a key-value JS file for each required language. I either received the translations from the business, or I translated the English text into the target language with Google Translate and had the business verify the result. Once you have these files, it is very easy to apply them in the application using the Angular translation service.
I've implemented a full-text search for one of our products. I'm using the CONTAINS/CONTAINSTABLE keywords for searching. The problem is that I often get syntax errors because of malformed search conditions in the user input.
Is there a simple way to make the full-text search end-user friendly, or do I have to build my own pre-parser that runs before the search is executed? For example, parsing the input with a shunting-yard algorithm and building a completely new search string?
It would be nice if there were a simpler way, like the one SAP Sybase SQL Anywhere provides (it has a nice, robust full-text search/index).
Thank you!
I've previously used Michael Coles's article "A Google-like Full Text Search" to help me create a user-friendly ASP.NET front end to SQL Server FTS. The article goes into a good level of detail about how he uses Roman Ivantsov's Irony .NET Compiler Construction Kit to 'compile' a modified Google search syntax into a SQL Server FTS CONTAINS query. You don't have to understand it all, though: there's a sample download that will get you going as long as you've got basic C#/.NET skills.
I was very pleased with the result and the users were very pleased that they could do full text searches using a syntax they were already familiar with.
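If a full grammar is more than you need, a much cruder pre-parser may already remove most syntax errors. The sketch below is not the approach from the article: it simply splits the input into words and quoted phrases, strips everything except word characters and spaces, and joins the quoted terms with AND. The function name `to_contains_condition` is made up for this example, and operators such as OR or NEAR typed by the user are treated as plain words.

```python
import re

# Matches either a double-quoted phrase or a run of non-space characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def to_contains_condition(user_input):
    """Turn free-form user input into a CONTAINS search condition.

    Every word or quoted phrase becomes one quoted term and the terms are
    joined with AND. Returns None when nothing searchable is left.
    """
    terms = []
    for phrase, word in _TOKEN.findall(user_input):
        raw = phrase or word
        # Keep only word characters and spaces, so stray quotes,
        # parentheses, or operators cannot break the CONTAINS syntax.
        cleaned = re.sub(r"[^\w\s]", " ", raw)
        cleaned = " ".join(cleaned.split())
        if cleaned:
            terms.append('"%s"' % cleaned)
    return " AND ".join(terms) if terms else None


if __name__ == "__main__":
    print(to_contains_condition('john "tom jones" ('))
```

The resulting string should still be passed to the query as a parameter, for example `CONTAINS(col, @condition)`, rather than concatenated into the SQL text. Skip the search entirely when the function returns None.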
Hope this helps,
Rhys
I am rewriting our company's search functionality to use Solr instead of Compass. Our old code is using CompassQueryBuilder.CompassQueryStringBuilder to build a query out of a list of keywords. The keywords may have spaces in them: for example: "john smith", "tom jones".
Is there an existing facility I can use in Solr to replicate this functionality?
The closest thing I know for SolrJ is the solrj-criteria project. It seems to be currently unmaintained though.
Solr offers a wide variety of querying and indexing options. Fields whose keywords contain spaces can be supported by defining a custom field type in the configuration file (see here). Queries with keywords that contain spaces can be supported by specifying a custom QueryParser (see here).
Solr itself doesn't offer a QueryStringBuilder in an API. In fact, Solr itself doesn't offer any API classes at all, since all interaction is done by posting messages over HTTP. There are client libraries for Java, .NET, PHP, and so on. In the SolrNet API there is a SolrMultipleCriteriaQuery, which is quite similar to the CompassQueryStringBuilder.
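Because the query is ultimately just a string sent over HTTP, a small helper of your own can stand in for the builder. The following is a minimal sketch, assuming the standard query parser, where a double-quoted string is treated as a phrase. The function names and the `name` field are hypothetical, and whether a phrase matches still depends on how the field is analyzed.

```python
def quote_phrase(keyword):
    """Wrap a keyword in double quotes so it is parsed as a single phrase."""
    # Inside a quoted phrase, backslashes and double quotes must be escaped.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def build_keyword_query(keywords, field=None, operator="OR"):
    """Build a query string from a list of keywords that may contain spaces."""
    clauses = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # ignore empty entries
        clause = quote_phrase(keyword)
        if field:
            clause = "%s:%s" % (field, clause)
        clauses.append(clause)
    return (" %s " % operator).join(clauses)


if __name__ == "__main__":
    print(build_keyword_query(["john smith", "tom jones"], field="name"))
```

The returned string can be passed as the `q` parameter through whichever client library you use.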
See question. Also, any links to example code showing how to validate an XML file against multiple schemas would be helpful.
EDIT: Sorry, I forgot to mention that this is for Linux.
libxml2 is portable, pure C and implements XML Schema. It is also open-source (MIT license) and has an active developer community.
If you're targeting Windows platforms, you can use MSXML, but your use case has to allow COM.
EDIT: Apache has an open-source XML library called Xerces-C++ that supports schema validation. It's not C-compatible, but if you can get away with using C++, it should do what you need.