I'm new in machine learning I see some services on Google Cloud platform related to A.I I think these are easy to use.
Here is what I need I have around 20K paragraphs (3 or 4 line) I need to find the most matching paragraph according to user question. User ask any question or type any sentence I need to find the most similar paragraph related to this user sentence how can I do that. What Services I need to achieve this I want to use Google Cloud platform. is it possible in gcloud if yes then how.
I think you can start your journey looking through the GCP AI & ML product list. Narrowing down the initial request and seeking for a best match affording your custom use scenario, I would advice to get more details about GCP AutoML products, offering a variety of complete solution for a generic machine learning models such as AutoML Natural Language model specifically designed for a document and text analysis tasks.
I would encourage you to start with AutoML Natural Language beginner's guide to get more context, having look at features and capabilities like of Classification, Entity extraction, Sentiment analysis training approaches.
As from the developers perspective, Cloud AutoML Natural Language supports a client libraries for most known programming languages and offers good REST API documentation though.
Related
I am currently exploring three services for identifying person's tweets or facebook post's are helpfulness or not:
Personality Insights
Natural Language Understanding
Discovery
will I need to write my on wrapper on these services to identify the helpfulness characteristic or is there any other way to just query & get result.
can anyone please guide which service I need to use for this task
Thanks
According to Neil, sure, all depends on how you define helpfulness.
Discovery:
If you want use Discovery you need some base to get the data, you can filter the data about you want with filter. By using data analysis combined with cognitive intuition to take your unstructured data and enrich it so you can discover the information you need.
Personality:
If you want use Personality, understand personality characteristics, needs, and values in written text. The service uses linguistic analytics to infer individuals' intrinsic personality characteristics, including Big Five, Needs, and Values, from digital communications such as email, text messages, tweets, and forum posts.
Watson Knowledge Studio:
If you want to work with models for tweets, you can use WKS (Watson knowledge Studio), this service provides easy-to-use tools for annotating unstructured domain literature and uses those annotations to create a custom machine-learning model that understands the language of the domain. The accuracy of the model improves through iterative testing, ultimately resulting in an algorithm that can learn from the patterns that it sees and recognize those patterns in large collections of new documents. For example, if you want learn about car, you can simple give some models to WKS.
It all depends on how you define helpfulness. Whether it is in general, or helpful to answering a question etc.
For Personality Insights, have a look at https://www.ibm.com/watson/developercloud/doc/personality-insights/models.html which has all the traits, as well as what they mean. The closest trait to helpfulness is probably Conscientiousness.
Neil
We are trying to implement a natural language search function using the IBM Watson Cognitive Insights (CI) service. We want the user to be able to type in a question using natural language and then return the appropriate document(s) from a CI corpus. We are using CI rather than the Watson QA service to avoid the need for training and to keep Watson infrastructure costs down (i.e. avoid the need for a dedicated instance of Watson for each corpus/use case).
We are able to build the necessary corpus through the CI API but we are not sure which APIs to use in what order to accomplish the most precise/accurate query possible.
Our initial thought was to:
Accept the user’s natural language question and Post that text string to the “Identifies concepts in a piece of text” API (listed 6th from the bottom in the CI API Reference document) to get a list of concepts related to the question.
Then do a GET using the “Performs a conceptual search within a corpus” API (listed 3rd from the bottom in the CI API Reference document) to get a list of related documents back from the corpus.
The first question - is this the right way to go about achieving our objective described in the first paragraph of this post? Should we be combining the CI APIs differently or using multiple Watson services together to achieve the objective?
If our initial approach is the right one, then we are finding that when we submit a simple question (e.g. “How can I repair MySQL database corruption”) to the “Identifies concepts in a piece of text” API we are not getting a comprehensive list of associated concepts back. For example:
curl -u userid:password -k -d "How can I repair MySQL database corruption" https://gateway.watsonplatform.net/concept-insights-beta/api/v1/graph/wikipedia/en-20120601?func=annotateText
returns:
[{"concept":"/graph/wikipedia/en-20120601/MySQL","coords":[[17,22]],"weight":0.85504603}]
Yet clearly there are other concepts associated with the example question (repair, corruption, database, etc.).
In another example we just submitted the text “repair” to the “Identifies concepts in a piece of text” API:
curl -u userid:password -k -d "repair" https://gateway.watsonplatform.net/concept-insights-beta/api/v1/graph/wikipedia/en-20120601?func=annotateText
and it returned:
[{"concept":"/graph/wikipedia/en-20120601/Repair","coords":[[0,6]],"weight":0.65392953}]
It seems that we should have gotten back the “Repair” concept from the first example also. Why would the API return the “repair” concept when we submit "repair" but not when we submit the text “How can I repair MySQL database corruption” which also includes the word “repair.”
Please advise as to the best way to implement a natural language search function based on the Watson Concept Insights service (perhaps in combination with other services if appropriate).
Thank you very much for your question and my apologies for being so late in answering it.
The first question - is this the right way to go about achieving our objective >described in the first paragraph of this post? Should we be combining the CI >APIs differently or using multiple Watson services together to achieve the objective?
Doing the steps above would be a natural way to accomplish what you want to do. Please note however that the "annotate text" API uses currently exactly the same technology that the system has for connecting documents in your corpus to concepts in the core knowledge graph and as such, it is more "paragraph" oriented rather than individual question oriented. To be more precise, the problem of extracting concepts in a smaller piece of text is generally more difficult than in a larger piece of text because in the latter there is more context that can be used to make the right choices. Given this observation, the annotate text API goes the more conservative route again given its paragraph focus.
Having said that, the /v2 API that we now have does improve the speed and quality of the concept extraction technology, so it is possible that you would be more successful in using it in order to extract topics from natural language questions. Here's what I would do/watch out for:
1) Clearly display to the user what CI extracted from the natural language in the input. Our APIs give you a way to retrieve a little abstract per concept which can be used to explain to a user what a concept means - do use that.
2) Give the user the ability to eliminate a concept from the extracted concept list (strike it out)
3) Since the concepts in concept insights currently correspond roughly to the notion of "topics", there is no way to deduce more abstract intent (for example, if the key to the meaning of a question is on a verb or an adjective as opposed to a noun, concept insights would be a poor way to deduce it). Watson does have technology oriented towards question answering as you pointed out before (the natural language classifier being one component of that), so I would take a look at that.
Yet clearly there are other concepts associated with the example question >(repair, corruption, database, etc.).
The answer for this and the rest of the posted question is in a sense above - our intention was to provide a technology first for "larger text" which as I explained is an easier task. Since this question was first posted and today, we did introduce new annotation technology (/v2) so I would encourage the reader to see whether it performs a little better.
For the longer term, we do have the intention to give the user a formal way to specify context for a general application so that the chances of extraction of relevant concepts increase. We also have a plan to have the user be able to specify custom concepts, as it has been observed in the past that some topics of interest to users are impossible to match in our current design because they are not in wikipedia.
I want to design a Semantic Search engine for my final year Master's degree. I have been doing a fair amount of reading both casually on the web and academic papers so I am not a total noob in this field.
My aim is to build a semantic search engine, which parses out the HTML content into its equivatlent RDF triples,stores the triples in a triplestore, through which the engine will try to respond to the query fired using SPARQL. I want to do something out of the box unlike the other students . So, I decided to build a semantic search engine.
Right now, I had a running search engine using Solr which performs keyword search, what I want to do is the semantic search. I know some open source tools regarding Web 3.0 but not sure whether they will be compatible with Solr or not.
So, can you please provide me some help for building the same.
Thanks.
Regards
Although it sounds hard, but you will not be able to capture everything.
You need a lot of data. Of course, there already is a lot of data arranged in formats like owl and rdf which you may use (e.g. WordNet, Yago, GeoNames etc), but although they are of huge size, they only focus on very small portions of a possible discourse universe.
Developing a good semantic search takes a lot of resources and brain power. Projects, like for example KompParse at the German Research Center for Artificial Intelligence, which only focus on a small part of human conversation (gossip or buying furniture) have been running for several years with several employees by now and are still just "ok".
Understanding semantics has already been implemented in different search engines, take google for example, or wolfram alpha. So this topic might not even be as much "out of the box" as you think.
So I will go with user723630 and strongly advise you, to focus on a smaller topic. You will still achieve a lot, but you will not get frustrated.
When creating a AI talking bot what kind of methods of design should I use? Should it be one function, multiple modules, should it have classes?
Understanding language is complicated, so the goal you need to determine first is what aspect of language you want to understand.
An AI must be able to understand what the person says to it, then relate it to what it already knows, and then generate a legitimate response.
These three steps can all be thought of as nearly independent, so you need to address each on its own.
The brain, the world's best language processor, uses a Neural Network, but that's not likely to work well for you.
A logic-based proof solving system, where facts that follow from facts are derived would probably work best, and I know of at least one system that uses it fairly effectively.
I'd start with an existing AI program (like the famous Eliza) and run its output through a speech synthesizer.
Some source for Eliza is available here. One open source speech synthisizer is FreeTTS.
If you're using a language other than Java, there are similar candidates AI bots and text-to-speech code out there.
I've started to do some work in this space using this open source project called Talkify:
https://github.com/manthanhd/talkify
It is a bot framework intended to help orchestrate flow of information between bot providers like Microsoft (Skype), Facebook (Messenger) etc and your backend services. The framework doesn't really provide implementation for the bot providers yet but does provide hooks into its natural language recognition engine.
The built in natural language recognition library can be used to classify sentences to topics which you can then map to skill functions.
Give it a try! I'd really like people's input to see if how it can be improved.
It is known that google has best searching & indexing algorithm.
The also have good relevancy.
They are also quicker in getting down the latest results.
All that's fine.
What programming language (c, c++, java, etc...) & database (oracle, MySQL, etc...) have they used in achieving this (since they have to manipulate with volume of data quickly and effectively)?.
Though I'm not looking for their in-depth architecture (if in case violates their company policies) an overview of all such things could be useful.
Anybody please add you valuable suggestions and insight on this?
Google internally use C++, Java and Python. See Rhino on Rails:
One of the (hundreds of) cool things
about working for Google is that they
let teams experiment, as long as it's
done within certain broad and
well-defined boundaries. One of the
fences in this big playground is your
choice of programming language. You
have to play inside the fence defined
by C++, Java, Python, and JavaScript.
Google's search algorithm is essentially MapReduce, which stems from functional programming techniques, implemented in C++.
Google has its own storage mechanism for this called the Google File System.
Mainly pigeons:
PigeonRank's success relies primarily on the superior trainability of the domestic pigeon (Columba livia) and its unique capacity to recognize objects regardless of spatial orientation. The common gray pigeon can easily distinguish among items displaying only the minutest differences, an ability that enables it to select relevant web sites from among thousands of similar pages.
Relevance of search results is governed by quality of information retrieval algorithms they use, not the programming language.
But C++ is what most of their backend code is written in (for most services).
They don't use any off-the-shelf RDBMS products for data storage. All of that is written in-house.
Check it out, the Bigtable.