Watson assistant algorithm used - ibm-watson

Can anyone tell me what algorithm is used to classify intents and understand entities in Watson assistant? Have they published any papers or articles regarding this?

Yes, they published this paper explaining in a manner how the Watson Work, and for more information you should learn about Cognitive Systems, but in advance it's not just one algorithm used, but many approaches that combined are capable of getting the desired result.
Another material you should learn if this is your interest is the computer science area "Information Retrieval", in which many subjects are used to comprehend what the user wants and give the needed information. The book Modern Information Retrieval is a good start point.

According to IBM Developer Answers:
"Intents are classified using an SVM, with some pre training by IBM. entities use a fuzzy matching algorithm."
https://developer.ibm.com/answers/questions/387916/watson-conversation-algorithm/
Support Vector Machine (SVM) is a supervised machine learning algorithm.

Related

Find topics in Machine learning in gcloud

I'm new in machine learning I see some services on Google Cloud platform related to A.I I think these are easy to use.
Here is what I need I have around 20K paragraphs (3 or 4 line) I need to find the most matching paragraph according to user question. User ask any question or type any sentence I need to find the most similar paragraph related to this user sentence how can I do that. What Services I need to achieve this I want to use Google Cloud platform. is it possible in gcloud if yes then how.
I think you can start your journey looking through the GCP AI & ML product list. Narrowing down the initial request and seeking for a best match affording your custom use scenario, I would advice to get more details about GCP AutoML products, offering a variety of complete solution for a generic machine learning models such as AutoML Natural Language model specifically designed for a document and text analysis tasks.
I would encourage you to start with AutoML Natural Language beginner's guide to get more context, having look at features and capabilities like of Classification, Entity extraction, Sentiment analysis training approaches.
As from the developers perspective, Cloud AutoML Natural Language supports a client libraries for most known programming languages and offers good REST API documentation though.

Silverlight demo for bioinformatics

I am a beginner Silverlight programmer preparing for the interview in medical research company. Job sounds damn interesting and I would like to get there.
To show my skills and interest, I want to write a program related to the topic.
What would you suggest?
First ideas: simple statistical analysis of input data, image collections (for example, find HD DNA image and put it in Silverlight Deep Zoom), lab inventory program..
Have a look at http://research.microsoft.com/en-us/projects/bio/default.aspx the Microsoft Biology Foundation, part of Microsoft Research. Its code is OpenSource (sic) and you will find many applications there. The apps cover most of the basics, sequences, etc. and have some nice display tools.
Collection/Maintenance/Retrieval of data is very important for any organization. Try this tutorial:
Silverlight tutorial: Creating a data centric web app with DataGrid, LINQ, and WCF Web Service
You placed a "bioinformatics" tag on your question. Many bioinformatics companies consider Perl programming to be quite important.
I suggest that you perform a search on "bioinformatics Perl" and take a look at books and sites that are retrieved. Perhaps you could park yourself at a local bookstore and peruse some of those titles. Free Perl interpreters are available.
You do have a basic understanding of genetics, yes? Be very familiar with some of the terminology, so you won't have to stare off into space while you pick from genotype/phenotype or mRNA/RNA/DNA or recall what a codon is.
It wouldn't hurt to nose around PubMed and get a basic understanding of what genomes are out there and what statistical tests can be performed on them.
I like your statistics idea. Perhaps you could write a program that tells you whether to accept or reject a null hypothesis based on numbers you read in from a file. Or perhaps you could figure out how to use the statistics portion of Entrez Gene in PubMed.
Best wishes for the interview.

looking for a good project to work on as my graduation project in the university that involves Ai / Machine Learning, please help me

I need help to chose a project to work on for my master graduation, The project must involve Ai / Machine learning or Business intelegence.. but if there is any other suggestion out of these topics it is Ok, please help me.
One of the most rapid growing areas in AI today is Computer Vision. There are many practical needs where the results of your Master Thesis can be helpful. You can try research something like Emotion Detection, Eye-Tracking, etc.
An appropriate work for a MS in CS in any good University can highlight the current status of research on this field, compare different approaches and algorithms. As a practical part, it makes also a lot of fun when your program recognizes your mood properly :)
Netflix
If you want to work more on non trivial datasets (not google size, but not trivial either and with real application), with an objective measure of success, why not working on the netflix challenge (the first one) ? You can get all the data for free, you have many papers on it, as well as pretty good way to compare your results vs other peoples (since everyone used exactly the same dataset, and it was not so easy to "cheat", contrary to what happens quite often in the academic literature). While not trivial in size, you can work on it with only one computer (assuming it is recent enough), and depending on the type of algorithms you are using, you can implement them in a language which is not C/C++, at least for prototyping (for example, I could get decent results doing things entirely in python).
Bonus point, it passes the "family" test: easy to tell your parents what you are working on, which is always a pain in my experience :)
Music-related tasks
A bit more original: something that is both cool, not trivial but not too complicated in data handling is anything around music, like music genre recognition (classical / electronic / jazz / etc...). You would need to know about signal processing as well, though - I would not advise it if you cannot get easy access to professors who know about the topic.
I can use the same answer I used on a previous, similar question:
Russ Greiner has a great list of project topics for his machine learning course, so that's a great place to start.
Both GAs and ANNs are learners/classifiers. So I ask you the question, what is an interesting "thing" to learn? Maybe it's:
Detecting cancer
Predicting the outcome between two sports teams
Filtering spam
Detecting faces
Reading text (OCR)
Playing a game
The sky is the limit, really!
Since it has a business tie in - given some input set determine probable business fraud from the input (something the SEC seems challenged in doing). We now have several examples (Madoff and others). Or a system to estimate investment risk (there are lots of such systems apparently but were any accurate in the case of Lehman for example).
A starting point might be the Chen book Genetic Algorithms and Genetic Programming in Computational Finance.
Here's an AAAI writeup of an award to the National Association of Securities Dealers for a system thatmonitors NASDAQ insider trading.
Many great answers posted already, but I wanted to add my 2 cents.There is one hot topic in which big companies all around are investing lots of resources into, and is still a very challenging topic with lots of potential: Automated detection of fake news.
This is even more relevant nowadays where most of us are connecting though social media and there's a huge crisis looming over.
Fake news, content removal, source reliability... The problem is huge and very exciting. It is as I said challenging as it can be seen from many perspectives (from analising images to detect fakes using adversarial netwotks to detecting fake written news based on text content (NLP) or using graph theory to find sources) and the possbilities for a research proyect are endless.
I suggest you read some general articles (e.g this or this) or have a look at research articles from the last couple of years (a quick google seach will throw you a lot of related stuff).
I wish I had the opportunity of starting over a project based on this topic. I think it's going to be of the upmost relevance in the next few years.

What programming language is used to IMPLEMENT google algorithm?

It is known that google has best searching & indexing algorithm.
The also have good relevancy.
They are also quicker in getting down the latest results.
All that's fine.
What programming language (c, c++, java, etc...) & database (oracle, MySQL, etc...) have they used in achieving this (since they have to manipulate with volume of data quickly and effectively)?.
Though I'm not looking for their in-depth architecture (if in case violates their company policies) an overview of all such things could be useful.
Anybody please add you valuable suggestions and insight on this?
Google internally use C++, Java and Python. See Rhino on Rails:
One of the (hundreds of) cool things
about working for Google is that they
let teams experiment, as long as it's
done within certain broad and
well-defined boundaries. One of the
fences in this big playground is your
choice of programming language. You
have to play inside the fence defined
by C++, Java, Python, and JavaScript.
Google's search algorithm is essentially MapReduce, which stems from functional programming techniques, implemented in C++.
Google has its own storage mechanism for this called the Google File System.
Mainly pigeons:
PigeonRank's success relies primarily on the superior trainability of the domestic pigeon (Columba livia) and its unique capacity to recognize objects regardless of spatial orientation. The common gray pigeon can easily distinguish among items displaying only the minutest differences, an ability that enables it to select relevant web sites from among thousands of similar pages.
Relevance of search results is governed by quality of information retrieval algorithms they use, not the programming language.
But C++ is what most of their backend code is written in (for most services).
They don't use any off-the-shelf RDBMS products for data storage. All of that is written in-house.
Check it out, the Bigtable.

How to design and verify distributed systems?

I've been working on a project, which is a combination of an application server and an object database, and is currently running on a single machine only. Some time ago I read a paper which describes a distributed relational database, and got some ideas on how to apply the ideas in that paper to my project, so that I could make a high-availability version of it running on a cluster using a shared-nothing architecture.
My problem is, that I don't have experience on designing distributed systems and their protocols - I did not take the advanced CS courses about distributed systems at university. So I'm worried about being able to design a protocol, which does not cause deadlock, starvation, split brain and other problems.
Question: Where can I find good material about designing distributed systems? What methods there are for verifying that a distributed protocol works right? Recommendations of books, academic articles and others are welcome.
I learned a lot by looking at what is published about really huge web-based plattforms, and especially how their systems evolved over time to meet their growth.
Here a some examples I found enlightening:
eBay Architecture: Nice history of their architecture and the issues they had. Obviously they can't use a lot of caching for the auctions and bids, so their story is different in that point from many others. As of 2006, they deployed 100,000 new lines of code every two weeks - and are able to roll back an ongoing deployment if issues arise.
Paper on Google File System: Nice analysis of what they needed, how they implemented it and how it performs in production use. After reading this, I found it less scary to build parts of the infrastructure myself to meet exactly my needs, if necessary, and that such a solution can and probably should be quite simple and straight-forward. There is also a lot of interesting stuff on the net (including YouTube videos) on BigTable and MapReduce, other important parts of Google's architecture.
Inside MySpace: One of the few really huge sites build on the Microsoft stack. You can learn a lot of what not to do with your data layer.
A great start for finding much more resources on this topic is the Real Life Architectures section on the "High Scalability" web site. For example they a good summary on Amazons architecture.
Learning distributed computing isn't easy. Its really a very vast field covering areas on communication, security, reliability, concurrency etc., each of which would take years to master. Understanding will eventually come through a lot of reading and practical experience. You seem to have a challenging project to start with, so heres your chance :)
The two most popular books on distributed computing are, I believe:
1) Distributed Systems: Concepts and Design - George Coulouris et al.
2) Distributed Systems: Principles and Paradigms - A. S. Tanenbaum and M. Van Steen
Both these books give a very good introduction to current approaches (including communication protocols) that are being used to build successful distributed systems. I've personally used the latter mostly and I've found it to be an excellent text. If you think the reviews on Amazon aren't very good, its because most readers compare this book to other books written by A.S. Tanenbaum (who IMO is one of the best authors in the field of Computer Science) which are quite frankly better written.
PS: I really question your need to design and verify a new protocol. If you are working with application servers and databases, what you need is probably already available.
I liked the book Distributed Systems: Principles and Paradigms by Andrew S. Tanenbaum and Maarten van Steen.
At a more abstract and formal level, Communicating and Mobile Systems: The Pi-Calculus by Robin Milner gives a calculus for verifying systems. There are variants of pi-calculus for verifying protocols, such as SPI-calculus (the wikipedia page for which has disappeared since I last looked), and implementations, some of which are also verification tools.
Where can I find good material about designing distributed systems?
I have never been able to finish the famous book from Nancy Lynch. However, I find that the book from Sukumar Ghosh Distributed Systems: An Algorithmic Approach is much easier to read, and it points to the original papers if needed.
It is nevertheless true that I didn't read the books from Gerard Tel and Nicola Santoro. Perhaps they are still easier to read...
What methods there are for verifying that a distributed protocol works right?
In order to survey the possibilities (and also in order to understand the question), I think that it is useful to get an overview of the possible tools from the book Software Specification Methods.
My final decision was to learn TLA+. Why? Even if the language and tools seem better, I really decided to try TLA+ because the guy behind it is Leslie Lamport. That is, not just a prominent figure on distributed systems, but also the author of Latex!
You can get the TLA+ book and several examples for free.
There are many classic papers written by Leslie Lamport :
(http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html) and Edsger Dijkstra
(http://www.cs.utexas.edu/users/EWD/)
for the database side.
A main stream is NoSQL movement,many project are appearing in the market including CouchDb( couchdb.apache.org) , MongoDB ,Cassandra. These all have the promise of scalability and managability (replication, fault tolerance, high-availability).
One good book is Birman's Reliable Distributed Systems, although it has its detractors.
If you want to formally verify your protocol you could look at some of the techniques in Lynch's Distributed Algorithms.
It is likely that whatever protocol you are trying to implement has been designed and analysed before. I'll just plug my own blog, which covers e.g. consensus algorithms.

Resources