I'm using the Watson Concept Insights API. I am able to retrieve the concepts, but I'd also like to get their vector representation so that I can measure the distance between different concepts. How can I achieve this?
The API has a method to measure the distance between concepts, but it doesn't provide the vector representation of a concept. See the API specification.
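If the goal is just a numeric relatedness/distance figure, you can ask the graph for it directly instead of working with vectors. The Python sketch below is only an illustration: the base URL, graph ID, resource path and response shape are assumptions based on the API reference, so check the specification for the exact form your service version expects.

import json
import requests

# Illustrative only: the relation-scores path, parameter name and graph ID are
# assumptions taken from the API reference; verify them against the spec.
BASE = "https://gateway.watsonplatform.net/concept-insights/api/v2"
GRAPH = "/graphs/wikipedia/en-latest"
AUTH = ("userid", "password")  # your service credentials

other_concepts = [GRAPH + "/concepts/Database", GRAPH + "/concepts/Data_corruption"]

# Ask the graph how strongly "MySQL" relates to the other concepts.
resp = requests.get(
    BASE + GRAPH + "/concepts/MySQL/relation_scores",
    params={"concepts": json.dumps(other_concepts)},
    auth=AUTH,
)
resp.raise_for_status()
print(resp.json())  # inspect the returned scores; the exact shape may differ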
Can anyone tell me what algorithm is used to classify intents and understand entities in Watson Assistant? Have they published any papers or articles regarding this?
Yes, they published this paper explaining, in general terms, how Watson works. For more information you should learn about Cognitive Systems, but be aware that it is not just one algorithm; many approaches are combined to get the desired result.
Another area worth studying, if this interests you, is the computer science field of Information Retrieval, in which many techniques are combined to understand what the user wants and return the needed information. The book Modern Information Retrieval is a good starting point.
According to IBM Developer Answers:
"Intents are classified using an SVM, with some pre training by IBM. entities use a fuzzy matching algorithm."
https://developer.ibm.com/answers/questions/387916/watson-conversation-algorithm/
Support Vector Machine (SVM) is a supervised machine learning algorithm.
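To get a feel for what an SVM-based intent classifier does, here is a minimal, self-contained sketch using scikit-learn. It is not IBM's implementation (their training data, features and pre-training are proprietary); it only illustrates the general technique of turning utterances into features and fitting a linear SVM.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data: (utterance, intent) pairs chosen purely for illustration.
utterances = [
    "turn on the lights", "switch the lamp on",
    "what's the weather today", "will it rain tomorrow",
    "play some jazz", "put on my workout playlist",
]
intents = ["lights_on", "lights_on", "weather", "weather", "play_music", "play_music"]

# TF-IDF features fed into a linear support vector machine.
classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
classifier.fit(utterances, intents)

print(classifier.predict(["could you switch on the light"]))  # -> ['lights_on']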
I am currently exploring three services for identifying whether a person's tweets or Facebook posts are helpful or not:
Personality Insights
Natural Language Understanding
Discovery
Will I need to write my own wrapper around these services to identify the helpfulness characteristic, or is there another way to just query and get the result?
Can anyone please advise which service I should use for this task?
Thanks
As Neil says, it all depends on how you define helpfulness.
Discovery:
If you want to use Discovery, you need some base of data to work with, and you can narrow it to what you want with filters. Discovery uses data analysis combined with cognitive intuition to take your unstructured data and enrich it so you can discover the information you need. A rough query sketch follows below.
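As a rough sketch of what a filtered Discovery query looks like (the environment ID, collection ID, credentials and version date are placeholders, and the parameter names follow the Discovery v1 query API, so adapt them to your instance):

import requests

# Placeholders: substitute your own environment ID, collection ID and credentials.
url = ("https://gateway.watsonplatform.net/discovery/api/v1/environments/"
       "YOUR_ENV_ID/collections/YOUR_COLLECTION_ID/query")
params = {
    "version": "2017-11-07",  # example API version date
    "natural_language_query": "helpful posts about customer support",
    "filter": "enriched_text.sentiment.document.label::positive",  # example filter
    "count": 5,
}
resp = requests.get(url, params=params, auth=("username", "password"))
print(resp.json())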
Personality:
If you want to use Personality Insights, it helps you understand personality characteristics, needs, and values in written text. The service uses linguistic analytics to infer individuals' intrinsic personality characteristics, including Big Five, Needs, and Values, from digital communications such as email, text messages, tweets, and forum posts.
Watson Knowledge Studio:
If you want to work with models for tweets, you can use WKS (Watson Knowledge Studio). This service provides easy-to-use tools for annotating unstructured domain literature and uses those annotations to create a custom machine-learning model that understands the language of the domain. The accuracy of the model improves through iterative testing, ultimately resulting in an algorithm that can learn from the patterns it sees and recognize those patterns in large collections of new documents. For example, if you want a model that understands cars, you can simply provide some annotated examples to WKS.
It all depends on how you define helpfulness: whether it is helpfulness in general, helpfulness in answering a question, and so on.
For Personality Insights, have a look at https://www.ibm.com/watson/developercloud/doc/personality-insights/models.html which has all the traits, as well as what they mean. The closest trait to helpfulness is probably Conscientiousness.
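If it helps, here is a rough sketch of calling Personality Insights and reading back the Conscientiousness percentile. The credentials, input file and version date are placeholders; check the service documentation for the values your instance expects.

import requests

# Placeholders: use your own credentials and a reasonably large text sample.
url = "https://gateway.watsonplatform.net/personality-insights/api/v3/profile"
text = open("tweets.txt", encoding="utf-8").read()

resp = requests.post(
    url,
    params={"version": "2017-10-13"},  # example version date
    headers={"Content-Type": "text/plain", "Accept": "application/json"},
    data=text.encode("utf-8"),
    auth=("username", "password"),
)
profile = resp.json()
for trait in profile.get("personality", []):
    if trait.get("name") == "Conscientiousness":
        print("Conscientiousness percentile:", trait.get("percentile"))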
Neil
We are trying to implement a natural language search function using the IBM Watson Concept Insights (CI) service. We want the user to be able to type in a question using natural language and then return the appropriate document(s) from a CI corpus. We are using CI rather than the Watson QA service to avoid the need for training and to keep Watson infrastructure costs down (i.e. avoid the need for a dedicated instance of Watson for each corpus/use case).
We are able to build the necessary corpus through the CI API but we are not sure which APIs to use in what order to accomplish the most precise/accurate query possible.
Our initial thought was to:
Accept the user’s natural language question and Post that text string to the “Identifies concepts in a piece of text” API (listed 6th from the bottom in the CI API Reference document) to get a list of concepts related to the question.
Then do a GET using the “Performs a conceptual search within a corpus” API (listed 3rd from the bottom in the CI API Reference document) to get a list of related documents back from the corpus.
The first question - is this the right way to go about achieving our objective described in the first paragraph of this post? Should we be combining the CI APIs differently or using multiple Watson services together to achieve the objective?
If our initial approach is the right one, then we are finding that when we submit a simple question (e.g. “How can I repair MySQL database corruption”) to the “Identifies concepts in a piece of text” API we are not getting a comprehensive list of associated concepts back. For example:
curl -u userid:password -k -d "How can I repair MySQL database corruption" https://gateway.watsonplatform.net/concept-insights-beta/api/v1/graph/wikipedia/en-20120601?func=annotateText
returns:
[{"concept":"/graph/wikipedia/en-20120601/MySQL","coords":[[17,22]],"weight":0.85504603}]
Yet clearly there are other concepts associated with the example question (repair, corruption, database, etc.).
In another example we just submitted the text “repair” to the “Identifies concepts in a piece of text” API:
curl -u userid:password -k -d "repair" https://gateway.watsonplatform.net/concept-insights-beta/api/v1/graph/wikipedia/en-20120601?func=annotateText
and it returned:
[{"concept":"/graph/wikipedia/en-20120601/Repair","coords":[[0,6]],"weight":0.65392953}]
It seems that we should have gotten back the "Repair" concept from the first example as well. Why would the API return the "repair" concept when we submit "repair" but not when we submit the text "How can I repair MySQL database corruption", which also includes the word "repair"?
Please advise as to the best way to implement a natural language search function based on the Watson Concept Insights service (perhaps in combination with other services if appropriate).
Thank you very much for your question and my apologies for being so late in answering it.
The first question - is this the right way to go about achieving our objective described in the first paragraph of this post? Should we be combining the CI APIs differently or using multiple Watson services together to achieve the objective?
Doing the steps above would be a natural way to accomplish what you want to do. Please note, however, that the "annotate text" API currently uses exactly the same technology that the system uses for connecting documents in your corpus to concepts in the core knowledge graph, and as such it is more "paragraph" oriented than individual-question oriented. To be more precise, extracting concepts from a smaller piece of text is generally harder than from a larger piece of text, because the larger piece provides more context that can be used to make the right choices. Given this observation, and given its paragraph focus, the annotate text API takes the more conservative route.
Having said that, the /v2 API that we now have does improve the speed and quality of the concept extraction technology, so you may have more success using it to extract topics from natural language questions. Here's what I would do/watch out for:
1) Clearly display to the user what CI extracted from the natural language in the input. Our APIs give you a way to retrieve a little abstract per concept which can be used to explain to a user what a concept means - do use that.
2) Give the user the ability to eliminate a concept from the extracted concept list (strike it out)
3) Since the concepts in concept insights currently correspond roughly to the notion of "topics", there is no way to deduce more abstract intent (for example, if the key to the meaning of a question is on a verb or an adjective as opposed to a noun, concept insights would be a poor way to deduce it). Watson does have technology oriented towards question answering as you pointed out before (the natural language classifier being one component of that), so I would take a look at that.
Yet clearly there are other concepts associated with the example question (repair, corruption, database, etc.).
The answer to this and the rest of the posted question is, in a sense, above - our intention was to provide a technology first for "larger text", which, as I explained, is an easier task. Since this question was first posted we have introduced new annotation technology (/v2), so I would encourage the reader to see whether it performs a little better.
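To make the annotate-then-search flow concrete, here is a rough Python sketch against the /v2 API. The resource paths, parameter names and response fields are assumptions based on the API reference, so verify them against the current specification before relying on this.

import json
import requests

# Assumed /v2 paths; verify against the Concept Insights API reference.
BASE = "https://gateway.watsonplatform.net/concept-insights/api/v2"
AUTH = ("userid", "password")
GRAPH = "/graphs/wikipedia/en-latest"
CORPUS = "/corpora/YOUR_ACCOUNT_ID/YOUR_CORPUS"  # your own corpus

question = "How can I repair MySQL database corruption"

# Step 1: annotate the question text to extract concepts.
annotate = requests.post(
    BASE + GRAPH + "/annotate_text",
    headers={"Content-Type": "text/plain"},
    data=question,
    auth=AUTH,
)
annotations = annotate.json().get("annotations", [])
concept_ids = [a["concept"]["id"] for a in annotations]
print("Extracted concepts:", concept_ids)

# Step 2: feed those concept IDs into a conceptual search over the corpus.
results = requests.get(
    BASE + CORPUS + "/conceptual_search",
    params={"ids": json.dumps(concept_ids), "limit": 10},
    auth=AUTH,
)
print(results.json())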
For the longer term, we do intend to give the user a formal way to specify context for a general application so that the chances of extracting relevant concepts increase. We also plan to let the user specify custom concepts, as it has been observed that some topics of interest to users are impossible to match in our current design because they are not in Wikipedia.
I am trying to understand the structure of PostGIS, but the more I read the more confused I get. For starters, I am a complete newbie in geospatial databases. I would like to understand the architecture of such a database, the most commonly used designs for such databases, and their design implications. The real basic stuff, as I have literally no clue about such things. Can someone point me in the right direction or to such material?
While looking into PostGIS I also came across the libraries GEOS (Geometry Engine) and Proj.4. I know what PostGIS uses them for, but I am not sure I understand it correctly. As far as I know, GEOS provides the geometric data types, a way of indexing them, and a way to query the data. Calculations on geometry types are based on planes and are usually fast. There is also a geography data type, whose calculations are based on a spheroid and are much slower than their geometric counterparts.
Then come the various projections. I don't fully understand what they are for. I believe a projection converts the globe into a 2D plane so that geometric calculations can be used instead of geographic ones; since geographic calculations are more expensive, projection is probably a necessary evil. If so, are the resulting calculations accurate? I don't know if this is a correct understanding of the concepts. It would really help if someone could direct me towards any valuable material for understanding geospatial database design, PostGIS, and the libraries involved.
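To make my question concrete, here is the kind of comparison I have in mind, written as a small Python/psycopg2 sketch against a PostGIS-enabled database (the connection settings and coordinates are just for illustration): planar distance on a projected geometry versus spheroidal distance on a geography.

import psycopg2

# Sketch only: assumes a PostGIS-enabled database is reachable with these settings.
conn = psycopg2.connect("dbname=gis user=postgres password=postgres host=localhost")
cur = conn.cursor()

# Two points in lon/lat (WGS 84, SRID 4326): roughly London and Paris.
# geometry: planar distance after projecting to Web Mercator (SRID 3857).
# geography: distance computed on the spheroid, returned in metres.
cur.execute("""
    SELECT
      ST_Distance(
        ST_Transform(ST_SetSRID(ST_MakePoint(-0.1276, 51.5072), 4326), 3857),
        ST_Transform(ST_SetSRID(ST_MakePoint(2.3522, 48.8566), 4326), 3857)
      ) AS planar_web_mercator,
      ST_Distance(
        ST_SetSRID(ST_MakePoint(-0.1276, 51.5072), 4326)::geography,
        ST_SetSRID(ST_MakePoint(2.3522, 48.8566), 4326)::geography
      ) AS spheroid_metres
""")
print(cur.fetchone())  # the projected figure is distorted at this latitude; the geography one is not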
Thank You!
PS - I am not looking for how to use PostGIS; I want to know more about how it is implemented, the thought process behind implementing certain features, the use of these libraries, and so on. Something along those lines. Using PostGIS seems simple enough; I am not interested in that. :)
I'm pulling time-series data from an MS SQL database using REST. I've found that the floating-point precision drops from a value like 0.00166667 to 0.002 when I retrieve data over REST, but when I use the DB designer's own tools the precision is maintained.
Is this a limitation of the REST method, or is it something specific to the implementation?
Just to clarify -- my workplace uses a proprietary database that has MS SQL as its backbone. It's not open source, so I can't poke around and see how requests are being handled.
A SOAP method is offered, which I'm going to try for comparison, but I'm mainly concerned with whether or not this is a REST problem.
Representational State Transfer (REST) is just a general style of client-server architecture. It doesn't specify anything nearly as detailed as the appropriate handling of floating-point values. The only constraints it imposes are things like the communication being "stateless". The concepts of REST exist at a higher level of abstraction, so the issue you are seeing must be something specific to the implementation of the service that is providing the floating-point values.
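As a quick sanity check that the rounding is not inherent to a REST-style transport, note that a plain JSON round trip keeps the value intact; the loss has to happen wherever the service formats or rounds the number. A small Python example (standard library only):

import json

value = 0.00166667
payload = json.dumps({"reading": value})         # what a REST service could legitimately send
print(payload)                                   # {"reading": 0.00166667}
print(json.loads(payload)["reading"] == value)   # True: nothing lost in transit

# If the service itself rounds before serialising, the loss happens in that
# implementation, not in "REST":
print(json.dumps({"reading": round(value, 3)}))  # {"reading": 0.002}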