Custom UIMA annotators in IBM Watson Retrieve&Rank

Is it possible to use custom UIMA annotators in the Retrieve&Rank service?
How can I upload my custom annotator (packaged as a JAR file) to the service?
I need to create an entity annotator to discover my custom domain entities.

I don't think there is an obvious, straightforward way to use a custom UIMA annotator in R&R.
If you want to try integrating the two anyway, here are some possible approaches:
Use a UIMA pipeline to annotate your documents before storing them in R&R, or as you query R&R for them. I haven't tried this myself, but I've seen references to this sort of thing (e.g. http://wiki.apache.org/solr/SolrUIMA), so there might be some value in trying it.
Use the annotations from your UIMA pipeline to generate additional feature scores that the ranker you train can include in its training. For example, if your annotator detects the presence or absence of a particular custom domain entity, it could turn this into a score that contributes to the feature scores for a search result. For an example of contributing custom feature scorers to R&R, see https://github.com/watson-developer-cloud/answer-retrieval
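To make the second approach more concrete, here is a minimal Python sketch of turning entity annotations into a single feature score. It assumes a toy dictionary lookup standing in for a real UIMA annotator; the DOMAIN_ENTITIES set and the annotate/entity_feature_score names are hypothetical, and nothing here uses an R&R or answer-retrieval API. How the score is actually handed to the ranker depends on the scorer interface in the answer-retrieval project.

```python
import re

# Hypothetical stand-in for a custom UIMA entity annotator: a fixed set of
# domain entity surface forms (assumption, for illustration only).
DOMAIN_ENTITIES = {"heat exchanger", "pressure valve", "flow meter"}


def annotate(text):
    """Return the domain entities found in text (case-insensitive, whole words)."""
    lowered = text.lower()
    found = []
    for entity in sorted(DOMAIN_ENTITIES):
        # \b anchors avoid matching inside longer words
        if re.search(r"\b" + re.escape(entity) + r"\b", lowered):
            found.append(entity)
    return found


def entity_feature_score(query, document):
    """Fraction of the query's domain entities that also appear in the document.

    Returns 0.0 when the query mentions no domain entities, so the feature
    stays neutral for queries outside the custom domain.
    """
    query_entities = set(annotate(query))
    if not query_entities:
        return 0.0
    doc_entities = set(annotate(document))
    return len(query_entities & doc_entities) / len(query_entities)


if __name__ == "__main__":
    q = "Why does the pressure valve leak near the flow meter?"
    d = "Replace the pressure valve gasket."
    print(entity_feature_score(q, d))  # one of two query entities matches: 0.5
```

The same annotate output could also be stored in an extra field of each document before indexing, which corresponds to the first approach (annotating before storing in R&R).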

Related

Apache Solr: Can Apache Solr be used as a third-party system for indexing and searching documents from different websites?

I am working on a research web application (portal) that integrates different research portals and websites using an open-source platform called search kit. The web application will act as a central point of access to research publications on different research portals. To do this, I also need to implement a third-party system that does the following:
Search for documents on the other research portals based on the user's query, and display the results to the users in my web application.
Index the documents.
Let system administrators configure the web application, i.e. add, remove, or modify the URLs of the websites Solr pulls documents from.
Display the results to the user in one standard format.
My question is: can Apache Solr be used to implement this third-party system? If not, what open-source platform or approach would you recommend?
In general, Solr seems like a good fit here, but you might need some custom code (apart from configuration) here and there. To go through the points:
Querying is one of the main features of Solr, so this is definitely possible.
Indexing is handled by Solr.
There used to be a component for Solr called the "Data Import Handler" that supported indexing from URLs (see the docs). However, it was removed from the main Solr distribution and moved to a separate package. That package doesn't seem to be actively maintained, though, so you will probably run into some problems if you decide to use it. The alternative is to write the document-pulling code yourself.
Solr can return results in multiple formats (e.g. JSON, XML, CSV), but none of them may be exactly the format you want. In that case, you need to build your own transformation on top of the results Solr returns.
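As a rough illustration of that last point, here is a Python sketch that maps a Solr JSON select response (the response.numFound and response.docs structure returned with wt=json) onto one uniform result format. The document field names id, title, url and source, and the output keys, are assumptions about your schema and your application, not anything Solr defines.

```python
import json


def first(value, default=""):
    """Solr returns multi-valued fields as lists; take the first element."""
    if isinstance(value, list):
        return value[0] if value else default
    return value if value is not None else default


def to_standard_results(solr_json):
    """Map a Solr JSON select response onto one uniform result format.

    The field names id/title/url/source are assumptions about your schema.
    """
    response = json.loads(solr_json)["response"]
    results = []
    for doc in response["docs"]:
        results.append({
            "id": first(doc.get("id")),
            "title": first(doc.get("title"), "(untitled)"),
            "link": first(doc.get("url")),
            "portal": first(doc.get("source"), "unknown"),
        })
    return {"total": response["numFound"], "results": results}
```

Because every portal's documents pass through the same mapping, the web application only ever has to render one shape of result, regardless of which site the document was pulled from.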

Stemming and stopwords on IBM Watson Conversation service

My app's domain has a lot of specific words with synonyms.
I need to configure the Conversation service to understand these custom words and their synonyms.
Are word stemming and a stop-word dictionary available in the Conversation service? Do I need a custom dictionary for word synonyms? How can I build it?
Stop-word handling is a built-in feature of the Conversation service.
Right now, the best way to approach synonyms is to add all of them to the intent examples, e.g. a #greeting intent with examples such as Hi, Hello, Wassup, Howdy, etc.
Alternatively, you can define a custom entity and list all the synonyms there (the UI supports this). E.g. the entity @facility has a value of pool with synonyms like swimming pool, place to swim, etc.
So yes, right now it is best to build a custom list of synonyms using the intents and entities of the Conversation service.
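Here is a Python sketch of the #greeting intent and the @facility entity from above. The dictionary is modeled on the shape of a Conversation workspace JSON (intents with examples, entities with values and synonyms), but treat the field names as an assumption and check them against the service's API reference before importing anything. The small lookup function only illustrates the idea that every synonym resolves to one canonical value; it is not the service's own matching logic.

```python
# Intents and entities shaped like a Conversation workspace JSON export
# (assumption: verify the exact field names against the API reference).
workspace = {
    "intents": [
        {"intent": "greeting",
         "examples": [{"text": t} for t in ["Hi", "Hello", "Wassup", "Howdy"]]},
    ],
    "entities": [
        {"entity": "facility",
         "values": [{"value": "pool",
                     "synonyms": ["swimming pool", "place to swim"]}]},
    ],
}


def build_synonym_index(ws):
    """Map every lower-cased value and synonym to (entity, canonical value)."""
    index = {}
    for entity in ws["entities"]:
        for value in entity["values"]:
            for surface in [value["value"], *value.get("synonyms", [])]:
                index[surface.lower()] = (entity["entity"], value["value"])
    return index


def lookup(phrase, index):
    """Return (entity, canonical value) for a phrase, or None if unknown."""
    return index.get(phrase.strip().lower())
```

Keeping the synonym list in one structure like this also makes it easy to generate the intent examples and entity values from a single domain vocabulary file, rather than maintaining them by hand in the UI.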

How can I download the model file that Watson Concept Insights uses?

I'm using the Watson API to do some concept annotations.
I'd like to then run word2vec on the returned concepts so that I can measure the distance/similarity between concepts. For that I need to work against the same model. Where can I download the model file Watson is using here?
To be more precise, I'm using the default one, which is wikipedia/en-20120601.
You can't download the models. That part of the service is not exposed.

Does Solr have an equivalent to CompassQueryBuilder?

I am rewriting our company's search functionality to use Solr instead of Compass. Our old code uses CompassQueryBuilder.CompassQueryStringBuilder to build a query out of a list of keywords. The keywords may contain spaces, for example "john smith" or "tom jones".
Is there an existing facility I can use in Solr to replicate this functionality?
The closest thing I know of for SolrJ is the solrj-criteria project. It seems to be unmaintained at the moment, though.
Solr offers a wide variety of querying and indexing options. Fields that contain keywords with spaces can be supported by defining a custom field type in the configuration file (see here), and queries containing multi-word keywords can be supported by specifying a custom QueryParser (see here).
Solr itself doesn't offer a QueryStringBuilder API. In fact, Solr itself doesn't offer any client API classes at all, since all interaction happens by sending requests over HTTP. There are client libraries for Java, .NET, PHP, etc. In the SolrNet API there is a SolrMultipleCriteriaQuery, which is quite similar to CompassQueryStringBuilder.
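Since the query ultimately travels as a string over HTTP, a replacement for CompassQueryStringBuilder can be quite small. Here is a minimal Python sketch, assuming the string is sent as the q parameter to Solr's standard (Lucene) query parser; the phrase/build_query names and the OR default are hypothetical and not part of SolrJ or SolrNet. Each keyword is wrapped in double quotes so a multi-word keyword like "john smith" is matched as a phrase.

```python
import re

# In the standard Lucene/Solr query syntax, inside a double-quoted phrase only
# the backslash and the double quote itself have to be escaped.
_PHRASE_SPECIALS = re.compile(r'(["\\])')


def phrase(keyword):
    """Quote a keyword so embedded spaces are matched as a phrase."""
    return '"' + _PHRASE_SPECIALS.sub(r"\\\1", keyword.strip()) + '"'


def build_query(keywords, field=None, operator="OR"):
    """Combine keywords into one query string, e.g. "john smith" OR "tom jones".

    Blank keywords are skipped; if field is given, every clause is
    restricted to that field (field:"...").
    """
    clauses = [phrase(k) for k in keywords if k.strip()]
    if field:
        clauses = [f"{field}:{c}" for c in clauses]
    return f" {operator} ".join(clauses)


if __name__ == "__main__":
    print(build_query(["john smith", "tom jones"]))
    # "john smith" OR "tom jones"
```

Escaping only inside quotes keeps the builder simple; if you also need unquoted terms, the full set of query-syntax special characters would have to be escaped as well.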

JAXB Naming Collision Salesforce Integration

I'm attempting to integrate with Salesforce using MyEclipse. The wizard fails because of a naming collision on a complex type "DescribeLayout". I need to write a JAXB binding file to ensure that the two interfaces that are created by the xjc compiler are in different packages, but I have absolutely no idea how to do this.
I do not have the URIs of the schemas that make up the WSDL, only the URNs.
This blog post shows how to append a suffix to type names to avoid the collision. I'm not a JAXB expert, but presumably JAXB can also be configured to put the types in a different package instead of adding a suffix.
http://blog.teamlazerbeez.com/2009/05/23/salesforcecom-partner-soap-api-jax-ws-tutorial-part-1/
