I am trying to extract entities from text input. Is there any option to train Alchemy so that I can customize the entities to my needs?
You can't train the entity extraction from AlchemyLanguage but there are other Watson APIs that you can use to extract entities or concepts from text.
Relationship Extraction: Performs linguistic analysis of the input text. It then finds spans of text and clusters them together to form entities, before finally extracting the relationships between them.
Concept Insights: Helps you annotate concepts and identify conceptual associations from text.
You can quickly test this API with Swagger: https://watson-api-explorer.mybluemix.net/apis/concept-insights-v2
We are trying to use an ML model in Vespa, and we have textual data stored in Vespa. Can somebody help us with the questions below?
Is there an example of an ONNX model trained with scikit-learn being used in Vespa?
Where should preprocessing steps go before model training and before prediction with an ONNX model in Vespa? An example would help.
This is a very broad question and the answer very much depends on what your goals are. In general, the documentation for using an ONNX model in Vespa can be found here:
https://docs.vespa.ai/documentation/onnx.html
An example that uses an ONNX BERT model for ranking can be found in the Transformers sample application:
https://github.com/vespa-engine/sample-apps/tree/master/transformers
Note that both these links assume that you have an existing model. In general, Vespa is a serving platform and not usually used in the model training process. As such Vespa doesn't really care where your model comes from, be that scikit-learn, pytorch or any other system. ONNX is a general format for ML model exchange between various systems.
However, there are some foundational ideas that might clarify things. Vespa currently considers all ML models to have numeric inputs and outputs, in the form of tensors. This means you can't feed text directly into your model and have text come out on the other side. Most textual data these days is encoded to some form of numeric representation such as embedding vectors, or, as the BERT example above shows, text is tokenized such that each token gets its own vector representation. After model computation, embedding vectors or token-set representations can be decoded back to text.
Vespa currently handles the computational part; the (pre-)processing of encoding and decoding text to embeddings or other representations is currently up to the user. Vespa does offer a rich set of features to help out in this regard in the form of document and query processors. So you can create a document processor that encodes the text of each incoming document to some representation before storing it. Likewise, a searcher (query processor) can be created that encodes incoming textual queries to a compatible representation before documents are scored against it.
So, in general, you would train your models outside of Vespa using whatever embedding or tokenization strategies are necessary for your model. When deploying the Vespa application you add the models with any required custom processing code, which is used when feeding or querying Vespa.
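To make the encoding step concrete, here is a minimal sketch of turning text into a fixed-length list of integer token ids, which is the kind of numeric input a model expects. It is plain Python and not Vespa code: the `VOCAB` table and the `encode` function are hypothetical names for illustration, and a real model would use the tokenizer and vocabulary it was trained with. In Vespa, logic like this would live in your own document processor or searcher.

```python
import re

# Hypothetical toy vocabulary; a real model ships its own (e.g. a BERT vocab file).
VOCAB = {"[PAD]": 0, "[UNK]": 1, "vespa": 2, "serves": 3, "onnx": 4, "models": 5}


def encode(text, max_len=8):
    """Map text to a fixed-length list of integer token ids."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Unknown words fall back to the [UNK] id; long inputs are truncated.
    ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens[:max_len]]
    # Pad so every input has the same length, as a fixed-shape tensor input requires.
    return ids + [VOCAB["[PAD]"]] * (max_len - len(ids))


doc_ids = encode("Vespa serves ONNX models")        # applied when feeding a document
query_ids = encode("fast onnx serving", max_len=4)  # applied to an incoming query
```

The important point is that the same encoding has to be applied at training time, at feed time, and at query time, otherwise the ids the model sees will not match what it was trained on.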
If you have a more concrete example of what you are trying to achieve I could be more specific.
I have a need to extract entities from word and pdf documents. Documents can be in the range of 10 to 20 pages. Are there scalable library/APIs available that we can plug into our processing pipeline? Any comparative study of different solutions will be helpful.
Take a look at Watson Natural Language Understanding (you'll need to get an IBM ID and then log in to see this content; don't worry, the cost is $0). With Watson Natural Language Understanding, you will want to look at the API Explorer to find the correct API syntax for the results that you are looking for.
I also noticed that you mention Word/PDF documents. You will need to convert those using the Watson Discovery service, and then you can pass the converted documents to Watson Natural Language Understanding, which takes in JSON, text or HTML inputs.
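As a rough sketch of what an entity-extraction request might look like, the snippet below only builds the JSON body and makes no network call. It assumes the request takes a `text` field plus a `features` object with an `entities` entry; please confirm the exact field names, endpoint, and authentication in the API Explorer. `build_entities_request` is a hypothetical helper name.

```python
import json


def build_entities_request(text, limit=10):
    """Build the JSON body for an entity-extraction request (assumed shape)."""
    return {
        "text": text,
        # "entities" asks for entity extraction; "limit" caps how many are returned.
        "features": {"entities": {"limit": limit}},
    }


# Serialize the body; sending it (URL, credentials) is left out on purpose.
body = json.dumps(build_entities_request("IBM is headquartered in Armonk, New York."))
```

For 10 to 20 page documents you would run the conversion step first and then send the extracted text through a body like this one.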
I am currently exploring three services for identifying whether a person's tweets or Facebook posts are helpful or not:
Personality Insights
Natural Language Understanding
Discovery
Will I need to write my own wrapper around these services to identify the helpfulness characteristic, or is there another way to just query and get a result?
Can anyone please advise which service I should use for this task?
Thanks
As Neil says, it all depends on how you define helpfulness.
Discovery:
If you want to use Discovery, you need a data source to work from, and you can then narrow the data down to what you want with filters. Discovery combines data analysis with cognitive intuition to take your unstructured data and enrich it so you can discover the information you need.
Personality:
If you want to use Personality Insights, it helps you understand personality characteristics, needs, and values in written text. The service uses linguistic analytics to infer individuals' intrinsic personality characteristics, including Big Five, Needs, and Values, from digital communications such as email, text messages, tweets, and forum posts.
Watson Knowledge Studio:
If you want to work with models for tweets, you can use WKS (Watson Knowledge Studio). This service provides easy-to-use tools for annotating unstructured domain literature and uses those annotations to create a custom machine-learning model that understands the language of the domain. The accuracy of the model improves through iterative testing, ultimately resulting in an algorithm that can learn from the patterns that it sees and recognize those patterns in large collections of new documents. For example, if you want to learn about cars, you can simply build a car-domain model in WKS.
It all depends on how you define helpfulness: helpful in general, helpful in answering a question, and so on.
For Personality Insights, have a look at https://www.ibm.com/watson/developercloud/doc/personality-insights/models.html which has all the traits, as well as what they mean. The closest trait to helpfulness is probably Conscientiousness.
Neil
When will IBM make its Watson Q&A API capable of accepting a custom corpus?
Is there a roadmap I can see?
The Question and Answer service currently doesn't provide a way to use your own data. However, you can get similar or better results by combining Document Conversion and Retrieve and Rank.
You will use Document Conversion to convert your corpus documents (PDF, docx, html) to answer units that will be indexed by the Retrieve and Rank service.
The Retrieve and Rank service is built on top of Apache Solr, and once you load your data into the Solr index, you can create and train a Ranker (machine learning model that knows how to sort results).
To expand on German's answer, also take a look at the Watson Natural Language Classifier (NLC) and Dialog services, which are additional building blocks for creating a custom Question and Answer application. NLC classifies text and allows you to trigger an action, and Dialog allows you to create and manage virtual conversations with your users.
Here is a great blog with an introduction to both NLC and Dialog. And another good blog that introduces the Watson Document Conversion and Retrieve and Rank services.
I'm trying to understand the concept of Documents on Google App Engine's Search API. The concept I'm having trouble with is the idea behind storing documents. So for example, say in my database I have this:
class Business(ndb.Model):
    name = ndb...
    description = ndb...
For each business, I am storing a document so I can do full-text searches on the name and description.
My questions are:
Is this right? Does this mean we are essentially storing each entity TWICE, in two different places, just to make it searchable?
If the answer to the above is yes, is there a better way to do it?
And again, if the answer to number 1 is yes, where do the documents get stored? In the high-replication datastore?
I just want to make sure I am thinking about this concept correctly. Storing entities in docs means I have to maintain each entity in two separate places... doesn't seem very optimal just to keep it searchable.
You have it worked out already.
Full Text Search Overview
The Search API allows your application to perform Google-like searches
over structured data. You can search across several different types of
data (plain text, HTML, atom, numbers, dates, and geographic
locations). Searches return a sorted list of matching text. You can
customize the sorting and presentation of results.
Since you can't search "inside" the contents of the models in the datastore, the Search API provides the ability to do that for text and HTML.
So to link a searchable text document (e.g. a product description) to a model in the datastore (e.g. that product's price), you have to "manually" make that link between the documents and the datastore objects they relate to. You can also use the Search API and the datastore totally independently of each other, so you have to build that link in yourself. AFAIK there is no automatic linkage between them.
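One common way to build that link is to give each search document the same id as the entity's key, so a search hit can be turned straight into a datastore lookup. The sketch below shows the pattern with plain in-memory dicts standing in for the two stores; it does not use the App Engine APIs, the substring match is only a placeholder for real full-text search, and `save_business`, `search_businesses`, and the `rating` field are hypothetical names for illustration.

```python
# In-memory stand-ins for the two stores (NOT the App Engine APIs).
datastore = {}     # entity key -> full entity
search_index = {}  # document id -> searchable text only


def save_business(key, name, description, rating):
    """Write the entity, then index its text under the same id."""
    datastore[key] = {"name": name, "description": description, "rating": rating}
    # Only the searchable text is duplicated; the document id is the entity key.
    search_index[key] = (name + " " + description).lower()


def search_businesses(term):
    """Find matching document ids, then load the full entities by key."""
    term = term.lower()
    # Substring match is a placeholder for a real full-text query.
    hits = [doc_id for doc_id, text in search_index.items() if term in text]
    return [datastore[doc_id] for doc_id in sorted(hits)]


save_business("biz-1", "Blue Cafe", "Espresso and pastries", 4.5)
save_business("biz-2", "Red Garage", "Car repair and tires", 4.0)
results = search_businesses("espresso")
```

With this layout only the searchable fields live in both places, and every write path that changes `name` or `description` has to update both stores.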