How to identify ideas and concepts in a given text - artificial-intelligence

I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained:
Maybe if you tell me a little more about who Mr Jones is, that would help. It would also be useful if I could have a description of his appearance, or even better a photograph?
It'd be great to be able to detect that the person has asked for a photograph of Mr Jones. I could take a really naïve approach and just look for the word "photo" or "photograph", but this would obviously be no good if they wrote something like:
Please, never send me a photo of Mr Jones.
Does anyone know where to start with this? Is it even possible?
I've looked into things like nltk, but I've yet to find an example of someone doing something similar and am still not entirely sure what this kind of analysis is called. Any help that can get me off the ground would be great.
Thanks!

The best thing out there that might be useful to you is automatic sentiment analysis. This is used, for example, to judge whether, say, a customer review is positive or negative. I cannot give you direct pointers to available tools, but this is what you are looking for.
I must say, though, that this is a current hot topic in natural language processing and I’ve seen a number of papers at conferences. It’s definitely quite a complex matter and if you’re starting from scratch, it might take quite some time before you get the results that you want.

NLTK is not a bad framework for parsing natural language but beware that this is not a simple matter. Doing stuff like this is really research level programming.
A good thing that makes it much easier is if you have a very limited domain - say your application focuses on information about famous writers, then you can avoid some complexities of natural language like certain types of ambiguities.
Where to start? Good question. I don't know of any tutorials on the topic (and I presume you tried the Google option) but I'd imagine that iTunes U would have a course on the topic. If not I can post a link to a course I've done that mentions the subject and wasn't completely horrible: http://www.inf.ed.ac.uk/teaching/courses/inf2a/lecturematerials/index.html#lecture01

The problem you are tackling is very challenging.
I would start by identifying the entities in the text (a problem known as Named Entity Recognition; google it), and then I would try to identify concepts.
If you want to roughly identify what the text is about, I suggest you start with WordNet and use the words and their places in its hierarchy to identify the concepts involved.
If you want to produce a system that shows real intelligence, then you should start researching resources such as Cyc (OpenCyc), which will allow you to convert the sentences into first-order logic (FOL) sentences.
This is the hardcore AI approach to solving your problem. For a simple chat bot, it would be easier to rely on simple statistical methods.
Good luck.
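As a baseline before trying any of the heavier approaches, the naive keyword idea from the question can be extended with a crude negation check. The sketch below is only an illustration: the word lists, the function name `mentions_request` and the four-token window are assumptions made for this example, and it will miss plenty of real-world phrasing.

```python
import re

# Hypothetical word lists, chosen only for this illustration.
TOPIC_WORDS = {"photo", "photos", "photograph", "photographs", "picture", "pictures"}
NEGATION_WORDS = {"no", "not", "never", "don't", "dont", "without"}


def mentions_request(text, topic_words=TOPIC_WORDS, window=4):
    """Return True if a topic word occurs in some sentence with no negation
    word among the `window` tokens immediately before it."""
    # Very rough sentence split on ., ! and ?
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        for i, token in enumerate(tokens):
            if token in topic_words:
                preceding = tokens[max(0, i - window):i]
                if not any(t in NEGATION_WORDS for t in preceding):
                    return True
    return False
```

Run on the two sentences from the question, this accepts the request for a photograph and rejects "never send me a photo". A negation that sits further away than the window, or one that is only implied, would still fool it.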

Related

Prerequisites listed against each A.I. field

Yes, A.I. is a very vast field. I have gone through the wiki page on A.I. and read about its different fields. I think any enthusiastic beginner can choose one of those fields, based on his/her interest, to get into A.I. But before getting in, it is always good to have the prerequisites of the appropriate field.
It would be very useful for any beginner in A.I. (like me) to have those prerequisites listed against each A.I. field, so that he can refresh his knowledge of them and then get started in the actual field. Could someone list them here, please?
Thanks
This question does not have a simple answer. The degree of knowledge required depends on how in depth you want to go into the applicable field.
To use a simple neural network API for example, there are hardly any prerequisites. To know whether a neural network will work for a specific problem or to write your own neural network, you will need to have at least high-school level maths knowledge to understand the internals and thus limitations of a neural network (or you can try to memorize it all). To be able to argue and understand arguments of specific approaches, you will probably need college level maths.
If you want to learn more about AI and the different fields therein (and get some idea of the requirements), I suggest you take an introductory AI course or a course in the sub-field that interests you. Coursera.org is a decent site for such courses (and it's free). Many courses will give you a list of prerequisites before you sign up.
From my experience, the main prerequisites to many of the fields of AI are Statistics, Linear Algebra or Calculus.
A decent understanding of data structures and algorithms is also very important for most fields of AI (and programming in general).
I think rather than having someone put a giant table of all the prerequisites here, just select a sub-field that seems interesting, take an introductory course, and see if you understand it. You can always learn the prerequisites.
If you don't have high-school maths knowledge, it might be a good idea to start with a maths course or two before you think about AI.

How to make bots learn from experience

I am writing a bot for an RTS game.
I am using fuzzy logic to evaluate the current position (mine and the enemies') and to issue commands.
I have a couple of fuzzy variables: military_buildings, civilian_building, army_power, enemy_power and distance. I also have a couple of fuzzy linguistic values like VERY_GOOD, GOOD, NORMAL, BAD, VERY_BAD.
My next task is to make the bots learn, so that they do not all behave the same way. Any advice or ideas on how to solve this?
One idea is to use a GA to tune the parameters (but I don't know the players' ratings, so I can't tell whether a bot won against a weak player or lost to a strong one).
Does anyone have experience with similar problems? (I can change the implementation and replace fuzzy logic if there is an easier way to make bots learn from experience.)
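For readers unfamiliar with the setup, a fuzzy variable like army_power with those linguistic values might look roughly like this. The triangular shape, the 0..100 scale and every breakpoint below are assumptions made for illustration; the question does not say how the sets are actually defined.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0..1) of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical breakpoints (left, peak, right) on a 0..100 scale.
ARMY_POWER_SETS = {
    "VERY_BAD": (-25.0, 0.0, 25.0),
    "BAD": (0.0, 25.0, 50.0),
    "NORMAL": (25.0, 50.0, 75.0),
    "GOOD": (50.0, 75.0, 100.0),
    "VERY_GOOD": (75.0, 100.0, 125.0),
}


def fuzzify(value, sets=ARMY_POWER_SETS):
    """Map a crisp value to its membership degree in every linguistic value."""
    return {name: triangular(value, *abc) for name, abc in sets.items()}
```

With these breakpoints an army_power of 60 is partly NORMAL and partly GOOD. The breakpoints are exactly the kind of parameters that a learning method could adjust per bot.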
Have a look at reinforcement learning. Here are a quick preview and a book that can help you.
Based on your description, this is what I'd use :)
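To give a feel for what reinforcement learning looks like in code, here is a minimal tabular Q-learning sketch. The action names, the state labels and the reward used in the usage note are made up for this example; a real bot would need to decide how to turn the game position into states and what to reward.

```python
import random
from collections import defaultdict

ACTIONS = ["attack", "defend", "build"]  # hypothetical commands


class QLearner:
    """Tabular Q-learning with epsilon-greedy action selection."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor for future reward
        self.epsilon = epsilon  # probability of trying a random action
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def choose(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Move the estimate towards reward + discounted best next value.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

A call such as `learner.update("enemy_weak", "attack", 1.0, "won")` nudges the value of attacking in that state upwards. Since each bot accumulates its own table from its own games, the bots would be expected to drift apart in behaviour over time.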
The idea of using GAs to tune the parameters to Fuzzy Linguistic Variables is a good one (I wish I thought of it!); the fuzzy logic gives you a nice continuous response curve while the GA will search through a large solution space. I think it's definitely a strategy worth pursuing; you should write up your results.
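A bare-bones GA for this kind of parameter tuning could look like the sketch below. Everything here is an assumption for illustration: parameters are treated as numbers in [0, 1], selection is simple truncation, and `toy_fitness` is a stand-in, since the real fitness (for example a win rate over a batch of games) is not something that can be shown here.

```python
import random


def evolve(fitness, n_params, pop_size=20, generations=40,
           mutation_scale=0.1, seed=0):
    """Minimal genetic algorithm over real-valued parameter vectors in [0, 1]."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]      # truncation selection
        children = [ranked[0]]                # elitism: keep the best unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params) if n_params > 1 else 0
            child = mum[:cut] + dad[cut:]     # one-point crossover
            # Gaussian mutation, clamped back into [0, 1]
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, mutation_scale)))
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Stand-in fitness: closeness to an arbitrary target vector.
TARGET = [0.2, 0.5, 0.8]


def toy_fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))
```

Calling `evolve(toy_fitness, 3)` returns a vector close to the target. On the worry about unknown player ratings, one possibility (only a suggestion) is to score candidate bots against each other rather than against human players, so that every candidate faces comparable opposition.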
If I were you I would look at the annual AIIDE StarCraft competition (see the StarCraft Competition site). It is sponsored in part by AAAI, so there are some really high-quality approaches to this problem, in particular if you are concerned with higher-level reasoning like resource management. Also, the competitors' source code is all available as open source, so if you want to check out some other techniques I recommend it. FYI, most of the top competitors for this type of problem have historically used some variant of a probabilistic state machine (see the paper on probabilistic FSMs), so this may make a good test bed for parameter tuning. This is also the approach that some of the top game AI middleware, like XAIT, uses.

Looking for a good graduation project at university that involves AI / Machine Learning, please help me

I need help choosing a project to work on for my master's graduation. The project must involve AI / Machine Learning or Business Intelligence, but any other suggestion outside these topics is OK too. Please help me.
One of the most rapidly growing areas in AI today is Computer Vision. There are many practical needs where the results of your Master's thesis can be helpful. You could research something like Emotion Detection, Eye-Tracking, etc.
An appropriate piece of work for an MS in CS at any good university can survey the current status of research in this field and compare different approaches and algorithms. As for the practical part, it is also a lot of fun when your program recognizes your mood properly :)
Netflix
If you want to work on non-trivial datasets (not Google size, but not trivial either, and with a real application), with an objective measure of success, why not work on the Netflix challenge (the first one)? You can get all the data for free, there are many papers on it, and there is a pretty good way to compare your results with other people's (since everyone used exactly the same dataset, and it was not so easy to "cheat", contrary to what happens quite often in the academic literature). While not trivial in size, you can work on it with only one computer (assuming it is recent enough), and depending on the type of algorithms you are using, you can implement them in a language other than C/C++, at least for prototyping (for example, I could get decent results doing things entirely in Python).
Bonus point, it passes the "family" test: easy to tell your parents what you are working on, which is always a pain in my experience :)
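To illustrate the "objective measure of success" point: the Netflix Prize scored submissions by root-mean-square error (RMSE) on held-out ratings. Below is a small sketch of a very simple baseline (global mean plus per-item and per-user offsets) together with that metric. The function names and the tuple layout `(user, item, rating)` are assumptions for this example, not the format of the actual dataset.

```python
import math
from collections import defaultdict


def fit_baseline(ratings):
    """ratings: list of (user, item, rating) tuples.
    Returns (global mean, per-user offsets, per-item offsets)."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    item_res = defaultdict(list)
    for _, i, r in ratings:
        item_res[i].append(r - mu)
    b_item = {i: sum(v) / len(v) for i, v in item_res.items()}
    user_res = defaultdict(list)
    for u, i, r in ratings:
        user_res[u].append(r - mu - b_item[i])
    b_user = {u: sum(v) / len(v) for u, v in user_res.items()}
    return mu, b_user, b_item


def predict(model, user, item):
    """Unknown users or items simply get no offset."""
    mu, b_user, b_item = model
    return mu + b_user.get(user, 0.0) + b_item.get(item, 0.0)


def rmse(model, ratings):
    """Root-mean-square error of the model's predictions on `ratings`."""
    se = sum((predict(model, u, i) - r) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))
```

In practice the model would be fitted on one part of the data and `rmse` computed on ratings it has not seen. Any fancier algorithm then has a concrete number to beat.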
Music-related tasks
A bit more original: something that is both cool, not trivial but not too complicated in data handling is anything around music, like music genre recognition (classical / electronic / jazz / etc...). You would need to know about signal processing as well, though - I would not advise it if you cannot get easy access to professors who know about the topic.
I can use the same answer I used on a previous, similar question:
Russ Greiner has a great list of project topics for his machine learning course, so that's a great place to start.
Both GAs and ANNs are learners/classifiers. So I ask you the question, what is an interesting "thing" to learn? Maybe it's:
Detecting cancer
Predicting the outcome between two sports teams
Filtering spam
Detecting faces
Reading text (OCR)
Playing a game
The sky is the limit, really!
Since it has a business tie-in: given some input set, determine probable business fraud from the input (something the SEC seems challenged in doing). We now have several examples (Madoff and others). Or build a system to estimate investment risk (there are apparently lots of such systems, but were any accurate in the case of Lehman, for example?).
A starting point might be the Chen book Genetic Algorithms and Genetic Programming in Computational Finance.
Here's an AAAI writeup of an award to the National Association of Securities Dealers for a system that monitors NASDAQ insider trading.
Many great answers have been posted already, but I wanted to add my 2 cents. There is one hot topic in which big companies everywhere are investing lots of resources, and which is still very challenging with lots of potential: automated detection of fake news.
This is even more relevant nowadays, when most of us are connecting through social media and there's a huge crisis looming.
Fake news, content removal, source reliability... The problem is huge and very exciting. As I said, it is challenging because it can be approached from many perspectives (from analysing images to detect fakes using adversarial networks, to detecting fake written news based on text content (NLP), to using graph theory to find sources), and the possibilities for a research project are endless.
I suggest you read some general articles (e.g. this or this) or have a look at research articles from the last couple of years (a quick Google search will turn up a lot of related material).
I wish I had the opportunity to start over with a project based on this topic. I think it's going to be of the utmost relevance in the next few years.

Have you tinkered with Rel?

I would like to hear opinions or people's experiences regarding Rel. Is it destined for the dustbin, or is it the next big thing in programming? I haven't tried doing anything with it yet (and it looks like you really can't at this point), but I'm intrigued by a few of the concepts discussed in it. Notably:
Removal of nulls completely from the data handling part of the language.
No need for mapping types between the language and the data storage.
Nesting tables
Complete separation of design and implementation.
Thoughts?
I think it is intended as an aid to teaching the pure relational model, not as a competitor to SQL DBMSs for "real work" in the short or medium term. However, Date and Darwen make a compelling case for the proper implementation of the relational model in their book The Third Manifesto. Maybe one day someone will produce a successful product based on it. After all, Oracle was a very small, niche company once!
Right, I agree with you Tony. The interesting thing for me, though, is that Rel is a somewhat working implementation with the understanding that it is essentially a working version of Tutorial D. The thought being that some well funded enterprise takes the research and decides that something like an Industrial D might be worthwhile.
Maybe I'm wrong here, but I get the impression that while Tutorial D is primarily a data language, it has the potential to move into the application space as well. That seems pretty ground-breaking to me. Of course, after reading some of the stuff from Date, Darwen, Pascal, and others, it seems like the language may have the goal of supplanting object oriented programming in general. Right now, OO appears to rule the world of programming. Rel would make available an alternative view on programming in general.
So I guess what I'm curious about is whether this project has legs and will lead to other products, or whether people think it's going to be just a historical curiosity.
Nearly four years on from the OP. I came across Rel recently, and it does have potential for what I am doing. It is more clearly established as a teaching language, but the implementation is now quite solid, though still a little fragile in syntax. It does have potential and I hope that this potential will be realised. Unfortunately this is a similar statement to that made four years ago, so if it is to be realised, it is evidently a very slow burner. Still most research efforts take about 10 years to become embedded in product, so there is still hope.

Asking to see employer's code/database in an interview

I've been asked to write code/design things in an interview, and sometimes even to provide code samples. Very reasonable and very wise (I'm always surprised when this DOESN'T happen).
I had a job a year or so back where the code was so awful that I would not have taken the job, if I'd seen the mess I had to deal with ahead of time. And I can't tell you how many horrendous databases I've had to work with.
Is it out of the question for me to ask them to provide a code sample and to let me view their database design? Assuming I'd be happy to sign an NDA, part of me feels it would be insane to take a job without examining the codebase or database I'd be working with.
Anyone done this?
Update
This would be something I would ask later in the interview process, if things were proceeding well and I felt an offer was forthcoming.
It's also in the context of working in a small shop or on a small project, as my preference is to avoid places that use phrases like "get a developer off the floor".
You can definitely ask. The answer may be "No," but nobody should consider that to be a bad or inappropriate question.
If they won't show you the code, you should definitely take that into account when you decide whether you want to accept an offer. I would take it as a sign that at least one of the following things is true:
The code is so horrible that they know you'll run away screaming.
The company has an ultra-secretive trust-nobody culture (which I would hate).
The company thinks they have such amazing code that just glancing at it would turn you into a superstar competitor. (In other words, they're self-deluded morons.)
They have glaring security holes that they hope to keep secret.
The people who are interviewing you don't know how to get the code themselves. (In which case you are not talking to the right people.)
I'd be more interested in seeing the company's systems - i.e. test framework, release process, autobuilds.... The presence or absence of those would tell me a lot more than a couple hundred lines of code.
I did ask: "Can I see some code and talk to programmers working here?"
The employer replied: "Sure! Come, you can talk directly to the lead programmer of our information system!"
What an honor!
they showed me concept papers
I could talk to the lead programmer
they showed me a small part of a very new project, saying: "this is just a prototype, direct3d is so sketchy, that's why this code is so messy"
It turned out that:
the lead programmer left the day I arrived
the software he had been leading was a big mess
somehow I ended up spending 50% of my time, fighting against the mess
None of the candidates we have interviewed have ever asked that; however, many of them have been co-ops/interns in the company so they are familiar with our code...
Having said that, it is highly unlikely we will show our code to ANY candidate, regardless of an NDA. I would be happy to answer questions about what technologies we use, what system we use for revisions, practices around, etc. Actual code though? No.
Also in a large enough system (as ours is) someone can just show you the "best" code there is...and you would be where you started :) As for a database design...both companies I have worked at have had enormously large databases (university, corporate company)...so that wouldn't work either.
I've asked this in interviews with Xerox PARC, a startup, and Yahoo.
At PARC they sat me at a workstation with the code I'd take over if hired, went over the structure of the codebase super-briefly, and left me alone for around 20 minutes. This was enough to get an idea whether I could stand working with it, though I'd have liked some more time, like an hour total. Afterward I asked about a design decision that seemed dubious, and we chatted about the design and the style in general. This didn't just tell me more about the job, it told them more about me: did I explore their code top-down or bottom-up, what did I pick up on or ask about, etc. Valuable all around.
At the startup, they set up a separate meeting on another day, bringing in the author of the code (who wasn't an employee); we sat down at a laptop and went over things together. It was an unusual request to them and I think I had to sign a new NDA. This was once again worthwhile: my earlier interviews hadn't really cleared up what this fancy AI language was all about or what they'd want me to do with it, and sitting down with some concrete code blew away a lot of fog.
At Yahoo, I didn't see much of anything; I don't recall just what their response was. If I'd seen the code I ended up dealing with I might have had second thoughts (though it worked out all right in the end). (Both of the above codebases that I did get to see seemed generally nicer; the PARC one was open-sourced later on.)
In all these cases I shared some code of my own with them.
If you are going to do this then I think you need to give them a little warning so they can prepare an NDA and get an appropriate environment set up in which you can see it. Also be prepared to dedicate a little time to understanding why the code is in the shape it is.
If you turn up at your first interview and say, right, can I see the code, all but a very few people will say no. And not necessarily because they are evil and don't want to show you, but because it just isn't as simple as saying yes.
In my experience as a recruiter for a large software company it would have taken a considerable amount of time for us to disclose enough detail of the code and internally developed frameworks for any candidate - however bright - to be able to make a meaningful judgement of its pros and cons. We would only contemplate doing that if we were serious about hiring them.
If I were asked that question I would say yes, come back another time and we'll arrange something. I would get a trustworthy developer off the floor and have them bring a laptop to the next interview and show a little of the code.
The reality is pretty much any software project which is of a reasonable size and has been in existence for more than one release will have some horrible scary rubbish in it.
Similarly to some of the other responses, I've never had a candidate ask to see our code. Even if they did, I'd be very wary of doing so and most likely would not. As Swati mentions, pretty much any non-trivial system will have sections that look good, so even seeing the code won't help that much.
Better than looking at actual code is the Joel Test. Basically it is 12 yes or no questions that you can ask an employer. The more yes answers, the better the work environment is expected to be. It's obviously not a hard and fast "rule", but it would seem to indicate those companies that take code (and coders) seriously.
I can't think of a reason for not showing some classes or talking about the architecture they're using. From my point of view it's like asking them to show you where you are going to work (room, table, chairs, teammates...).
Anyhow, asking for it will show them you're interested in best practices and also that you're not desperate to find a job at any price, and I don't see how this can hurt.
Go to open source projects. There you don't have to ask for permission to see the code.
It can't hurt to ask and this is a very good idea which I am going to add to my checklist of questions to ask employers.
An interesting idea, but I don't know how many companies would go for it. I know we can't do it where I work now.
I think the biggest problem you're going to have with this is that, in my experience, a lot of people take offense at others not liking their code. It's like criticising someone's therapist: it's just not a good idea to be an outsider and do it. Seeing the code and then not taking the job could give you the reputation that you're arrogant, or that you weren't good enough to work on the code and that's why you didn't take the job. It might save you from getting a job you don't want, but it could give you a negative reputation down the line. I live in a sizable city, but the IT people still know one another and word spreads. People in our field have egos, and it's easier to trash someone else's reputation than it is to admit that code you wrote isn't up to par.
Even if they showed you some code, would that be sufficient for you to come to a rough conclusion about the quality of code that you would be spending time with? For example, at my previous place, one of their products was a large e-banking middleware application. The core of the application was in C++ and designed and written in a great way. However, the extensions (which by far covered a large part of the application and its various different versions), which were in C++ too, that were mostly coded by the less-experienced and less-knowledgeable developers were a pile of crappy code (which I had to fix and work with or write from scratch at times) slapped together to just somehow work. If I had asked them to show me a snippet of the code during the interview, and they had shown me some of the core stuff (the extension code actually mostly contained the client-specific business logic so it wouldn't make much sense without the business-domain knowledge, etc), I would've thought that the overall quality of the code is good (which was not completely the case).
More important than asking for code snippets, I believe, is asking which source code control product they use (run away from companies that answer "Visual SourceSafe") and which methodology they use: "Agile" or "Scrum" sends positive signals, CMMI usually means the company loves bureaucratic processes, and if they give you a "huh?" then you're warned ;)
I think this is a great idea; however, as an employer, I would be hesitant -- even with an NDA -- to provide an interview candidate samples of real, working code unless I was pretty sure I wanted to hire the person.
The problem is they will show you a little bit of code, but each of their programmers will write code in a different way. You are unlikely to end up working on the part of the code base that is well written.
Asking to see their coding standard and how they enforce it is more likely to be of use.

Resources