Logic determining dialog in Watson Assistant - ibm-watson

I want to improve IBM's Watson Assistant results.
So, I want to know the algorithm that determines a dialog in Watson Assistant's conversations.
Is it an SVM algorithm?
A paper reference is welcome.

There are a number of ML/NLP technologies under the covers of Watson Assistant. So it's not just a single algorithm. Knowing them is not going to help you improve your results.
"I want to improve IBM's Watson Assistant results."
There are a number of ways.
Representative questions.
Focus on getting truly representative questions from the end users. Not only in the language that they use, but if possible from the same medium you plan to use WA on (e.g. mobile device, web, audio).
Manufactured questions are the first factor that reduces accuracy. First, you may build an intent for something a customer will never actually ask (even if you think they will). Second, you will use language and terms with similar patterns. This makes it harder for WA to train.
Total training questions
It's possible to train an intent with one question, but for best results use 10-20 example questions. Where intents are close together, more examples are needed.
Testing
The current process is to run what is called a k-fold cross validation (there is a sample script for this). If your questions are representative, the results should give you an accurate indicator of how well the system is performing.
However, it is possible to overfit the training, so you should also use a blind set. This is 10-20% of all questions (a random sample) that are never used to train WA. Run them against the system: your blind and k-fold results should fall within 5% of each other.
You can look at the results of the k-fold to fix issues, but you should not do this with the blind set. Blind sets can also go stale, so try to create a new one after 2-3 training cycles.
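To make the k-fold idea concrete, here is a minimal sketch, assuming your training data has been exported as one (text, intent) pair per row in a CSV with hypothetical "text" and "intent" columns. The TF-IDF + logistic-regression classifier is only a stand-in for WA's own (undisclosed) model; the point is the evaluation procedure, not the classifier.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical export: one row per training example, columns "text" and "intent".
data = pd.read_csv("intent_training_data.csv")

# Stand-in classifier; the k-fold procedure is what matters here.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, data["text"], data["intent"], cv=kfold)
print("k-fold accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))

# Blind set: hold out 10-20% of questions up front, never train on them,
# and compare blind accuracy against the k-fold estimate (within ~5%).
```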
End user testing.
No matter how well your system is trained, I can guarantee you that new things will pop up when put in front of end users. So you should plan to have users test before you put it into production.
When getting users to test, ensure they understand the general areas it has been trained on. You can do this with user stories, but try not to prime the user into asking a narrowly scoped question.
Example:
"Your phone is not working and you need to get it fixed" - Good. They will ask questions you will never have seen before.
"The wifi on your phone is not working. Ask how you would fix it". - Bad. Very narrow scope and people will mention "wifi" even if they don't know what it means.

Multi-Agent system application idea

I need to implement a multi-agent system for an assignment. I have been brainstorming ideas for what to implement, but I have not come up with anything great yet. I do not want it to be a traffic simulation application, but I need something just as useful.
I once saw an application of multiagent systems for studying/simulating fire evacuation plans in large buildings. Imagine a large building with thousands of people; in case of fire, you want these people to follow some rules for evacuating the building. To evaluate the effectiveness of your evacuation plan rules, you may want to simulate various scenarios using a multiagent system. I think it's a useful and interesting application. If you search the Web, you will find papers and works in this area, from which you might get further inspiration.
A few come to mind:
Exploration and mapping: send a team of agents out into an environment to explore, then assimilate all of their observations into consistent maps (not an easy task!)
Elevator scheduling: how to service call requests during peak capacities considering the number and location of pending requests, car locations, and their capacities (not too far removed from traffic-light scheduling, though)
Air traffic control: consider landing priorities (i.e. fuel, number of passengers, emergency conditions, etc.), airplane position and speed, and landing conditions (i.e. number of runways, etc.). Then develop a set of rules so that each "agent" (i.e. airplane) assumes its place in a landing sequence. Note that this is a harder version of the flocking problem mentioned in another reply.
Not sure what you mean by "useful", but you can always have a look at swarm-based AI (schools of fish, flocks of birds, etc.). Each agent (boid) is very simple in this case. Make the individual agents follow each other, stay away from a predator, and so on.
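For illustration only, here is a minimal numpy sketch of those two rules (steer towards the flock centre, steer away from a predator); every constant in it is arbitrary.

```python
import numpy as np

N, STEPS = 50, 100
rng = np.random.default_rng(0)
pos = rng.uniform(0, 100, size=(N, 2))   # boid positions
vel = rng.uniform(-1, 1, size=(N, 2))    # boid velocities
predator = np.array([50.0, 50.0])

for _ in range(STEPS):
    centre = pos.mean(axis=0)
    cohesion = (centre - pos) * 0.01          # steer towards the flock centre
    away = pos - predator
    dist = np.linalg.norm(away, axis=1, keepdims=True) + 1e-6
    avoid = (away / dist) * (20.0 / dist)     # steer away, stronger when the predator is close
    vel = np.clip(vel + cohesion + avoid, -2, 2)   # cap speed
    pos = pos + vel

print("final flock centre:", pos.mean(axis=0))
```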
It's not quite multi-agent, but have you considered a variation on ant colony optimisation?
http://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms

Looking for a good project to work on as my graduation project at university that involves AI / machine learning, please help me

I need help choosing a project to work on for my master's graduation. The project must involve AI / machine learning or business intelligence, but suggestions outside these topics are also OK. Please help me.
One of the most rapidly growing areas in AI today is Computer Vision. There are many practical needs where the results of your Master's thesis can be helpful. You could try researching something like Emotion Detection, Eye-Tracking, etc.
An appropriate piece of work for an MS in CS at any good university can highlight the current status of research in this field and compare different approaches and algorithms. As a practical part, it is also a lot of fun when your program recognizes your mood properly :)
Netflix
If you want to work on non-trivial datasets (not Google-sized, but not trivial either, and with a real application), with an objective measure of success, why not work on the Netflix challenge (the first one)? You can get all the data for free, there are many papers on it, and you have a pretty good way to compare your results against other people's (since everyone used exactly the same dataset, and it was not so easy to "cheat", contrary to what happens quite often in the academic literature). While not trivial in size, you can work on it with only one computer (assuming it is recent enough), and depending on the type of algorithms you are using, you can implement them in a language other than C/C++, at least for prototyping (for example, I could get decent results doing things entirely in Python).
Bonus point: it passes the "family" test: it is easy to tell your parents what you are working on, which is always a pain in my experience :)
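To give a flavour of the "entirely in Python" route mentioned above, here is a minimal sketch of matrix factorisation trained by stochastic gradient descent on (user, item, rating) triples. The file name, its format, and all hyperparameters are assumptions for illustration, not the actual Netflix Prize layout.

```python
import numpy as np

# Hypothetical ratings file: one "user_id item_id rating" integer triple per line.
ratings = np.loadtxt("ratings.txt", dtype=np.int64)
n_users = ratings[:, 0].max() + 1
n_items = ratings[:, 1].max() + 1

k, lr, reg, epochs = 20, 0.01, 0.05, 10
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))   # user factors
Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors

for epoch in range(epochs):
    rng.shuffle(ratings)                       # shuffle rows each epoch
    sq_err = 0.0
    for u, i, r in ratings:
        pu, qi = P[u].copy(), Q[i].copy()
        err = r - pu @ qi
        sq_err += err * err
        P[u] += lr * (err * qi - reg * pu)     # SGD update on user factors
        Q[i] += lr * (err * pu - reg * qi)     # SGD update on item factors
    print("epoch %d RMSE %.4f" % (epoch, np.sqrt(sq_err / len(ratings))))
```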
Music-related tasks
A bit more original: something that is cool, non-trivial, but not too complicated in data handling is anything around music, like music genre recognition (classical / electronic / jazz / etc.). You would need to know about signal processing as well, though - I would not advise it if you cannot get easy access to professors who know about the topic.
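As a rough illustration of the kind of prototype involved (not a recommendation of any particular pipeline), genre recognition can be bootstrapped by averaging MFCC features per track and feeding them to an off-the-shelf classifier. The audio/<genre>/<track>.wav layout is an assumption.

```python
import glob
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical layout: audio/<genre>/<track>.wav
X, y = [], []
for path in glob.glob("audio/*/*.wav"):
    genre = path.split("/")[-2]
    signal, sr = librosa.load(path, duration=30.0)    # first 30 s of each track
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    X.append(mfcc.mean(axis=1))                       # average MFCCs over time
    y.append(genre)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, np.array(X), np.array(y), cv=5)
print("cross-validated accuracy: %.3f" % scores.mean())
```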
I can use the same answer I used on a previous, similar question:
Russ Greiner has a great list of project topics for his machine learning course, so that's a great place to start.
Both GAs and ANNs are learners/classifiers. So I ask you the question, what is an interesting "thing" to learn? Maybe it's:
Detecting cancer
Predicting the outcome between two sports teams
Filtering spam
Detecting faces
Reading text (OCR)
Playing a game
The sky is the limit, really!
Since it has a business tie-in: given some input set, determine probable business fraud from the input (something the SEC seems challenged in doing). We now have several examples (Madoff and others). Or a system to estimate investment risk (there are apparently lots of such systems, but were any accurate in the case of Lehman, for example?).
A starting point might be the Chen book Genetic Algorithms and Genetic Programming in Computational Finance.
Here's an AAAI writeup of an award to the National Association of Securities Dealers for a system that monitors NASDAQ insider trading.
Many great answers have been posted already, but I wanted to add my 2 cents. There is one hot topic in which big companies all around are investing lots of resources, and it is still a very challenging topic with lots of potential: automated detection of fake news.
This is even more relevant nowadays, when most of us connect through social media and there's a huge crisis looming over.
Fake news, content removal, source reliability... The problem is huge and very exciting. It is, as I said, challenging, as it can be approached from many perspectives (from analysing images to detect fakes using adversarial networks, to detecting fake written news based on text content (NLP), or using graph theory to find sources), and the possibilities for a research project are endless.
I suggest you read some general articles (e.g. this or this) or have a look at research articles from the last couple of years (a quick Google search will turn up a lot of related material).
I wish I had the opportunity to start a project based on this topic. I think it's going to be of the utmost relevance in the next few years.
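As a rough sketch of the NLP angle only, a text-based baseline can be put together with TF-IDF features and logistic regression; the news.csv file and its "text"/"label" columns are hypothetical, and a real research project would need far more than this.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical dataset: columns "text" (article body) and "label" (0 = real, 1 = fake).
df = pd.read_csv("news.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=0, stratify=df["label"])

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=50000),
    LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```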

Testers knowledge on database [closed]

As a tester, how much should one be expected to know about databases?
Is it only writing queries in SQL, or do we need to know about stored procedures, triggers, etc.?
Might want to tag this with a testing tag.
Depending what your product is, a tester may or may not need to know specific things. From my perspective, the professional tester is the last line of defense before a product gets into the wild. Thus, testers should at the root of it be using the product like users (crazy, manic users who are only interested in pain and misery, of course).
That mindset in place, you can use black-box testing (testers use no more information than users), white-box testing (testers use all information, source code, etc.), or something in between (1). In my company we are more black-box, but even then it is helpful to have a detailed understanding of implementations. This is true not necessarily from a development perspective, but to give ideas about where the complexities - and therefore, often, the bugs - lie. Depending on the quality of your developers, the testers will need to be more or less capable of determining this on their own, as it is a rare programmer that can thoroughly test her own code.
Once you have settled what you are testing and what your customers are doing, then you will know if testers need to know SQL, procedures, triggers, etc. If you are delivering, for example, a hosted database solution, then your testers will have to know these things. If you use a traditional, non-custom database server on the back end of your delivered software package and you are a black-box shop, then your testers don't necessarily need to know anything about SQL at all - the software should be handling it. (That's not to say it's not helpful in debugging, test selection, etc., but I'd always rather have a good tester with no knowledge of my field, than an average tester with coincidental domain knowledge.)
Looking at the question again, if you are just asking from a personal skills perspective - then yes, it is always a good thing to learn something else, and it will undoubtedly come in handy eventually. :)
1) I have a strong preference toward black-box testing, but there are plenty of arguments for white-box as well, so it would be worthwhile to review the differences if you're in a position to determine overall testing strategy on a project.
A tester shouldn't need to know much, but if you know how to write queries (you understand the language), you know stored procedures and triggers :)
You don't really need to know anything about DB administration, though.
I've never expected a tester to have any real knowledge of sql... Unless they were testing a database implementation ;)
My feeling is that testers should be on par with end users, plus certain other knowledge. The "other knowledge" is a bit ambiguous and depends on the project's needs, but usually it is one or more of the following:
Has detailed knowledge of the business plan and exactly what the program is supposed to do. In this case they need to work hand in hand with your business people.
Knows how to do edge case checking. For example, entering 1/1/1800 or 12/31/2040 in a birthdate field: is this allowed? Other examples include entering negative amounts or even alpha characters in numeric entry fields (see the sketch after this list).
Has experience doing what the users typically do. This might be simply from on the job training such as sitting next to a user for a week or two every so often.
Has experience or knowledge of tools of the trade. There are a ton of testing tools out there which allow recording and playback of testing scenarios. I would expect a tester to be proficient in those tools.
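To make the edge-case item above concrete, here is a small pytest sketch; validate_birthdate and validate_amount are hypothetical functions standing in for whatever validation your application actually performs.

```python
import datetime

import pytest

# Hypothetical validators under test.
from myapp.validation import validate_amount, validate_birthdate


@pytest.mark.parametrize("value, expected", [
    (datetime.date(1800, 1, 1), False),     # implausibly old
    (datetime.date(2040, 12, 31), False),   # in the future
    (datetime.date(1990, 6, 15), True),     # ordinary case
])
def test_birthdate_edge_cases(value, expected):
    assert validate_birthdate(value) == expected


@pytest.mark.parametrize("amount", [-1, -0.01, "abc"])
def test_numeric_field_rejects_bad_input(amount):
    with pytest.raises(ValueError):
        validate_amount(amount)
```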
Now, sometimes running simple SQL queries is part of the job. For example, you might have to verify a workflow process that kicks off automated systems. In this case, being able to execute SQL and read the results is important.
However, those queries are usually provided to the testing staff by development or a DBA. Again, most people can be trained in less than an hour to copy/paste a query, maybe change a parameter, and execute it while interpreting the results.
Obviously, it depends on your unique situation, as every software shop does business a little differently.
I've worked with testers who knew nothing of SQL. We trained them to simply execute the sproc tests we wrote and notice if they changed. They would write up the logic for the test cases, and we'd do the code implementation.
So I'd say the testers could know nothing of SQL if your situation would allow them to merely be trained to do whatever tasks they're really needed for.
However, in an ideal world, I'd say the testers should have good enough knowledge of code or SQL to write their own tests themselves, though this may introduce layers of argument over interfacing when you have two testing teams (one strictly dev, one strictly testing).
You will be a better tester if you know SQL. You don't need to know it in depth, but it's useful to know for a number of reasons:
You'll be able to do simple queries to check that the data displayed on the screen is correct with respect to the database, without having to go to a developer to check. This saves time (see the sketch after this list).
You can start to analyse a bug without having to go to a developer, for instance whether the data in the database is incorrect or whether it is merely displayed incorrectly. This saves time and helps you target your tests more effectively, which will help you find bugs.
You'll understand (a little) of the job of the developer, and you can start to understand what is possible and what is not. This will help your relationship with the developers.
You can improve the quality of your testing: in the past, I've written simple Excel spreadsheets which calculated values based upon database queries (you are independently checking the developer's work). This helps the overall quality of the product.
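As a small, hypothetical example of the first point, a total shown in the UI can be checked against the database in a few lines; the shop.db file, table, and order id here are made up.

```python
import sqlite3

# Value read off the UI during the test (hypothetical).
displayed_total = 149.90

conn = sqlite3.connect("shop.db")   # hypothetical database file
row = conn.execute(
    "SELECT SUM(quantity * unit_price) FROM order_items WHERE order_id = ?",
    (1042,),
).fetchone()
conn.close()

assert row[0] is not None and abs(row[0] - displayed_total) < 0.005, (
    "UI shows %.2f but the database says %s" % (displayed_total, row[0]))
```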
The principle is that your core competence is testing, but it is always a good idea to know a little of all the surrounding competences: a tester should know a little development, analysis, and project management, but not to the same depth. In the same way, a developer should know a little about how a tester works (and about analysis and project management).
You don't NEED to know SQL, but it's a good idea. The same applies to triggers and stored procedures.

Common web problems where Neural Networks could help

I was wondering if you creative minds out there could think of some situations or applications in the web environment where neural networks would be suitable, or would provide an interesting spin.
Edit: Some great ideas here. I was thinking more web centric. Maybe bot detectors or AI in games.
To name a few:
Any type of recommendation system (whether it's movies, books, or targeted advertisement)
Systems where you want to adapt behaviour to user preferences (spam detection, for example)
Recognition tasks (intrusion detection)
Computer Vision oriented tasks (image classification for search engines and indexers, specific objects detection)
Natural Language Processing tasks (document/article classification, again search engines and the like)
The game located at 20q.net is one of my favorite web-based neural networks. You could adapt this idea to create a learning system that knows how to play a simple game and slowly learns how to beat humans at it. As it plays human opponents, it records data on game situations, the actions taken, and whether or not the NN won the game. Every time it plays, win or lose, it gets a little better. (Note: don't try this with too simple a game, like checkers; an overly simple game can have every possible game/combination of moves pre-computed, which defeats the purpose of using the NN.)
Any sort of classification system based on multiple criteria might be worth looking at. I have heard of some company developing a NN that looks at employee records and determines which ones are the least satisfied or the most likely to quit.
Neural networks are also good for doing certain types of language processing, including OCR or converting text to speech. Try creating a system that can decipher captchas, either from the graphical representation or the audio representation.
If you screen scrape or accept other sites' item sales info for price comparison, a NN can be used to flag possible errors in an item description for a human to then eyeball.
Often, as one example, computer hardware descriptions are wrong in the capacity, speed, or features they portray. Your NN will learn that, generally, a video card should not contain a "Raid 10" string. If there is a trend to add RAID to GPUs, then your NN will learn this over time as the eyeballer accepts such an advert, teaching the NN that this is now a new class of hardware.
This hardware example can be extended to other industries.
Web advertising based on consumer choice prediction
Forecasting the user's Web browsing direction on a micro scale and in the very short term (the current session). This idea is quite similar to the first one, a generalisation of it. A user browsing the Web could be offered suggestions of other potentially interesting websites. The suggestions could be relevance-ranked according to a prediction calculated in real time during the user's activity. For instance, a list of proposed links, categories, or tags could be displayed in the form of a cloud, with font size indicating rank score. Each and every click the user makes is an input to the forecasting system, so the forecast is constantly refined to provide the user with suggestions that match their interests as accurately as possible.
Ignoring the "Common web problems" angle request but rather "interesting spin" view.
One of the many ways that a NN can be viewed/configured, is as a giant self adjusting, multi-input, multi-output kind of case flow control.
So when you want to offer match ups that are fuzzy, (not to be confused directly with fuzzy logic per se, which is another area of maths/computing) NN may offer a usable alternative.
So to save energy, you offer a lift club site, one-offs or regular trips. People enter where they are, where they want to go and at what time. Sort by city and display in browse control.
Using a NN you could, over time, offer transport owners to transport seekers by watching what owners and seekers link up. As a owner may not live in the same suburb that a seeker resides. The NN learns over time what variances in owners, seekers physical location difference appear to be acceptable. So it can then expand its search area when offering a seekers potential owners.
An idea.
Search! Recognize! Classify! Basically everything search engines do nowadays could benefit from a dose of neural networks and fuzzy logic. This applies in particular to multimedia content (e.g. content-indexing images and videos) since that's where current search technologies are lagging behind.
One thing that always amazes me is that we still don't have any pseudo-intelligent firewalling technology. Something that says "hey, this range of URLs is making too many requests when it's not supposed to", blocks them, and sends a report to an administrator. That could be done with a neural network.
On the nasty side of things, some virus makers could find lucrative uses for neural networks. Adaptive trojans that "recognize" credit card numbers on a hard drive (instead of looking for certain cookies) or that "learn" how to mask themselves from detectors automatically.
I've been having fun trying to implement a bot based on a neural net for the Diplomacy board game, interacting via DAIDE protocols. It turns out to be extremely tricky, so I've turned to XCS to simplify the problem.
Suppose eBay used neural nets to predict how likely a particular item was to sell; to predict the best day to list items of that type; to suggest a starting price or "buy it now" price; or to grade your description based on how likely it was to attract buyers? All of those could be useful features, if they worked well enough.
Neural net applications are great for representing discrete choices and the whole behavior of how an individual acts (or how groups of individuals act) when mucking around on the web.
Take news reading for instance:
Back in the olden days, you picked up usually one newspaper (a choice), picked a section (a choice), scanned a page and chose an article (a choice), and read the basics or the entire article (another choice).
Now you choose which news site to visit and continue as above, but now you can drop one paper, pick up another, click on ads, change sections, and keep going with few limits.
The whole use of the web and the choices people make based on their demographics, interests, experience, politics, time of day, location, etc. is a very rich area for NN application. This is especially relevant to news organizations, web page design, ad revenue, and may even be an under explored area.
Of course, it's very hard to predict what one person will do, but put 10,000 of them that are the same age, income, gender, time of day, etc. together and you might be able to predict behavior that will lead to better designs. Imagine a newspaper (or even a game) that could be scaled to people's needs based on demographics. An ad man's dream !
How about connecting users to the closest DNS, and making sure there are as few bounces as possible between the request and the destination?
Friend recommendation in social apps (LinkedIn, Facebook, etc.)

Trialware/licensing strategies [closed]

I wrote a utility for photographers that I plan to sell online pretty cheap ($10). I'd like to allow the user to try the software out for a week or so before asking for a license. Since this is a personal project and the software is not very expensive, I don't think that purchasing the services of professional licensing providers would be worth it and I'm rolling my own.
Currently, the application checks for a registry key that contains an encrypted string that either specifies when the trial expires or that they have a valid license. If the key is not present, a trial period key is created.
So all you would need to do to get another week for free is delete the registry key. I don't think many users would do that, especially when the app is only $10, but I'm curious if there's a better way to do this that is not onerous to the legitimate user. I write web apps normally and haven't dealt with this stuff before.
The app is in .NET 2.0, if that matters.
EDIT: You can make your current licensing scheme considerably more difficult to crack by storing the registry information in the Local Security Authority (LSA). Most users will not be able to remove your key information from there. A search for LSA on MSDN should give you the information you need.
Opinions on licensing schemes vary with each individual, more among developers than specific user groups (such as photographers). You should take a deep breath and try to see what your target user would accept, given the business need your application will solve.
This is my personal opinion on the subject. There will be vocal individuals that disagree.
The answer to this depends greatly on how you expect your application to be used. If you expect the application to be used several times every day, you will benefit the most from a very long trial period (several months), to create a lock-in situation. For this to work you will have to have a grace period where the software alerts the user that payment will be needed soon. Before the grace period, you will have greater success if the software is silent about the trial period.
Whether or not you choose to believe this quite bold statement is of course entirely up to you. But if you do, you should realize that the less often your application will be used, the shorter the trial period should be. It is also very important that payment is very quick and easy for the user (as little data entry and as few clicks as possible).
If you are very uncertain about the usage of the application, you should choose a very short trial period. You will, in my experience, achieve better results if the application is silent about the fact that it is in trial period in this case.
Though effective for licensing purposes, "call home" features are regarded as a privacy threat by many people. Personally, I disagree with the notion that this is in any way bad for a customer who is willing to pay for the software he/she is using. Therefore I suggest implementing a licensing scheme where the application checks the license status (trial, paid) on a regular basis, and helps the user pay for the software when it's time. This might be overkill for a small utility application, though.
For very small, or even simple, utility applications, I argue that upfront payment without trial period is the most effective.
Regarding the security of the solution, you have to make it proportional to the development effort. In my line of work, security is very critical because there are partners and dealers involved, and because the investment made in development is very high. For a small utility application, it makes more sense to price it right and rely on the honest users that will pay for the software that address their business needs.
There's not much point to doing complicated protection schemes. Basically one of two things will happen:
Your app is not popular enough, and nobody cracks it.
Your app becomes popular, someone cracks it and releases it, then anybody with zero knowledge can simply download that crack if they want to cheat you.
In the case of #1, it's not worth putting a lot of effort into the scheme, because you might make one or two extra people buy your app. In the case of #2, it's not worth putting a lot of effort because someone will crack it anyway, and the effort will be wasted.
Basically my suggestion is just do something simple, like you already are, and that's just as effective. People who don't want to cheat / steal from you will pay up, people who want to cheat you will do it regardless.
If you are hosting your homepage on a server that you control, you could have the downloadable trial version of your software automatically compiled into a new binary every night. Each build would replace a hardcoded datetime value in your program for when the software expires. That way the only way to "cheat" is to change the date on your computer, and most people won't do that because of the problems it would create.
Try the Shareware Starter Kit. It was developed by Microsoft and may have some other features you want.
http://msdn.microsoft.com/en-us/vs2005/aa718342.aspx
If you are planning to continue developing your software, you might consider the ransom model:
http://en.wikipedia.org/wiki/Street_Performer_Protocol
Essentially, you develop improvements to the software, and then ask for a certain amount of donations before you release them (without any DRM).
One way to do it that's easy for the user but not for you is to hard-code the expiry date and make new versions of the installer every now and then... :)
If I were you though, I wouldn't make it any more advanced than what you're already doing. Like you say it's only $10, and if someone really wants to crack your system they will do it no matter how complicated you make it.
You could do a slightly more advanced version of your scheme by requiring a net connection and letting a server generate the trial key. If you do something along the lines of sign(hash(unique_computer_id + when_to_expire)) and let the app check, using a public key, that your server has signed the expiry date, it should require a "real" hack to bypass.
This way you can store the unique IDs server-side and refuse to generate an expiry date more than once or twice. Not sure what to use as the unique ID, but there should be some way to get something useful from Windows.
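To illustrate the sign-and-verify flow (the original app is .NET 2.0, so this Python sketch using the cryptography package is only meant to show the idea; the machine id and key handling are placeholders):

```python
import datetime
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# --- server side (key generation would normally happen once, offline) ---
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()          # shipped inside the application

def issue_trial_key(machine_id: str, expires: datetime.date) -> bytes:
    message = ("%s|%s" % (machine_id, expires.isoformat())).encode()
    return private_key.sign(message, padding.PKCS1v15(), hashes.SHA256())

# --- client side ---
def trial_is_valid(machine_id: str, expires: datetime.date, signature: bytes) -> bool:
    if datetime.date.today() > expires:
        return False
    message = ("%s|%s" % (machine_id, expires.isoformat())).encode()
    try:
        public_key.verify(signature, message, padding.PKCS1v15(), hashes.SHA256())
        return True
    except InvalidSignature:
        return False

expiry = datetime.date.today() + datetime.timedelta(days=7)
sig = issue_trial_key("MACHINE-1234", expiry)
print(trial_is_valid("MACHINE-1234", expiry, sig))
```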
I am facing the very same problem with an application I'm selling for a very low price as well.
Besides obfuscating the app, I came up with a system that uses two keys in the registry, one of which is used to determine the time of installation, the other being the actual license key. The keys are named obscurely, and a missing key indicates tampering with the installation.
Of course deleting both keys and reinstalling the application will start the evaluation time again.
I figured it doesn't matter anyway, as someone who wants to crack the app will succeed in doing so, or find a crack by someone who succeeded in doing so.
So in the end I'm only achieving the goal of making it not TOO easy to crack the application, and this is what, I guess, will stop 80-90% of the customers from doing so. And after all, as the application is sold for a very low price, there's no justification for me to invest any more time in this issue than I already have.
Just be cool about the license. Explain up front that this is your passion and a child of your labor. Give people a chance to do the right thing. If someone wants to pirate it, it will happen eventually. I still remember my despair at seeing my books on BitTorrent, but it's something you just have to deal with. Don't cave to casual piracy (what you're doing now sounds great), but don't cripple the thing beyond that.
I still believe that there are enough honest people out there to make a for-profit coding endeavor worthwhile.
Don't have the evaluation based on "days since install", instead do number of days used, or number of times run or something similar. People tend to download shareware, run it once or twice, and then forget it for a few weeks until they need it again. By then, the trial may have expired and so they've only had a few tries to get hooked on using your app, even though they've had it installed for a while. Number of activation/days instead lets them get into a habit of using your app for a task, and also makes a stronger sell (i.e. you've used this app 30 times...).
Even better, limiting the features works better than timing out. For example, perhaps your photography app could limit the user to 1 megapixel images, but let them use it for as long as they want.
Also, consider pricing your app at $20 (or $19.95). Unless there's already a micropayment setup in place (like the iPhone store or Xbox Live or something), people tend to have an aversion to buying things online below a certain price point (which is around $20, depending on the type of app), and people subconsciously assume that if something is inexpensive, it must not be very good. You can actually raise your conversion rate with a higher price (up to a point, of course).
In these sort of circumstances, I don't really think it matters what you do. If you have some kind of protection it will stop 90% of your users. The other 10% - if they don't want to pay for your software they'll pretty much find a way around protection no matter what you do.
If you want something a little less obvious, you can put a file in System32 with a name that sounds like a system file, and have the application check for its existence on launch. That can be a little harder to track down.
