Bayesian Network for Spam Filtering - spam-prevention

I want to use Bayesian Network mechanism for spam filtering. How do you think it should look a proper topology of the network? What about naive Bayes model? (The naive Bayes model is sometimes called a Bayesian classifier)

If you're new to spam filtering, it'd be a good idea to start with something simple like a naive Bayesian classifier. That way you get familiar with the issues involved in handling the data (reading the email, classifying it, storing your lexicon, etc.) without getting too bogged down in the actually classification code. Once you have the basics of your program working, you can go on to more advanced types of filtering.
I found the discussion in the book Ending Spam to be quite helpful.

Related

Does ALICE use supervised or unsupervised machine learning

I am wondering what type of machine learning it uses and if someone could explain it to me. I have researched the different types and am unable to disngtuish what type it is due to my lack of knowledge.
While it's hard to say for sure, the fact that they're using deep learning on GPUs - pointing towards neural networks [1] seems to suggest that they're using a combination of unsupervised and supervised learning. The latter for bootstrapping the bot, and the former to learn on the job.
[1] http://www.existor.com/ai-parallel
call it weakly-supervised learning since we do not have exact labels but have intended document types. I dont know exaclty Cleverbot but state of the art takes large amount of documents and find models sequence relations of words by using Recurrent Neural Networks and LSTM in particular. Sometimes before it was Hidden Markov Models but now Deep Learning changed the game. If you search NLP with Recurrent Neural Network Google blows information.

per-requisites listed against each of the A.I. field

Yes A.I. field is very vast field. I have gone through the wiki page of A.I. and read about the different fields of it. I think any enthusiastic beginner can choose one of those fields based on his/her interest to get into A.I. But before getting in it is always good to have the per-requisites of the appropriate field.
It is very useful to have those per-requisites listed against each of the A.I. field to any beginner of A.I (like me) so that he can refresh his knowledge in them once and get started in actual field. Do someone list them here please?
Thanks
This question does not have a simple answer. The degree of knowledge required depends on how in depth you want to go into the applicable field.
To use a simple neural network API for example, there are hardly any prerequisites. To know whether a neural network will work for a specific problem or to write your own neural network, you will need to have at least high-school level maths knowledge to understand the internals and thus limitations of a neural network (or you can try to memorize it all). To be able to argue and understand arguments of specific approaches, you will probably need college level maths.
If you want to learn more about AI and the different fields therein (and get some idea of the requirements), I suggest you take an introductory AI course or a course in the sub-field that interests you. Coursera.org is a decent site for such courses (and it's free). Many courses will give you a list of prerequisites before you sign up.
From my experience, the main prerequisites to many of the fields of AI are Statistics, Linear Algebra or Calculus.
A decent understanding of data structures and algorithms is also very important for most fields of AI (and programming in general).
I think rather than having someone put a giant table of all the prerequisites here, just select a sub-field that seems interesting, take an introductory course, and see if you understand it. You can always learn the prerequisites.
If you don't have high-school maths knowledge, it might be a good idea to start with a maths course or two before you think about AI.

How to make bots to learn from experience

I am writing bot for one rts game.
I am using fuzzy logic to evaluate current position (mine and enemies') and to issue commands.
I have couple fuzzy variables: military_buildings, civilian_building, army_power, enemy_power and distance. I also have couple fuzzy linguistic values like VERY_GOOD, GOOD, NORMAL, BAD, VERY_BAD.
My next task is to make bots to learn, to avoid to all behave on same way. Any advice or idea how to solve this?
To use GA for tuning parameters (but I don't know ratings of players so I don't know if bot wins over a weak player or loses to a strong player).
Does anyone have experience with similar problems (I can change implementation and replace fuzzy logic if there is easier way to learn bots from experience)?
Have a look at reinforcement learning. Here are a quick preview and a book that can help you.
Based on your description, this is what I'd use :)
The idea of using GAs to tune the parameters to Fuzzy Linguistic Variables is a good one (I wish I thought of it!); the fuzzy logic gives you a nice continuous response curve while the GA will search through a large solution space. I think it's definitely a strategy worth pursuing; you should write up your results.
If I were you I would look at the AIIDE annual Starcraft Competition, it is sponsored in part by AAAI so there are some really high quality approaches to this problem. In particular if you are concerned with higher-level reasoning like resource management etc. Starcraft Competition Site Also, the competitors source code is all available open source so if you want to check out some other techniques I recommend it. FYI, most of the top competitors for this type of problem have historically used some variant of a Probabilistic State Machine Paper on Probabilistic FSMs, so this may make a good test bed for parameter tuning. FYI this is also the approach that some of the top Game AI middleware software uses for Game AI, like XAIT.

Where can I get very simple inroduction to all Artificial intelligent techniques with Real world Examples

I know that Artificial Intelligence field is very vast and there are many books on it. But i just want to know the any resource where i can get the simple inroduction to all Artificail Intelligence techniques like
It would like to have 1 or 2 page introduction of all techniques and their examples of how they can be applied or for what purpose they can be used. I am interested in
Backpropagation Algoritm
Hebbs Law
Bayesian networks
Markov Chain Models
Simulated Annealing
Tabu Search
Genetic Algorithms or Evolutionary Algos
Now there are many variants and more AI techniques. And each one have many books written on them. i am unable to decide which algos i can use unless i know what they are capable of doing.
So can i find 1-2 page inroduction of them with Application examples
Essentials of Metaheuristics covers several of these - I can't promise it'll cover all of them, but I know there's good stuff on simulated annealing and genetic algorithms in there. Probably at least a few of the others, but I'd have to re-download it to check. It's available for free download.
It can be a bit light on the theory, but it'll give you a straightforward description, some explanation of when you'd want to use each, and a lot of useful pseudocode.
Here's an image on local search (= tabu search without tabu) from the Drools Planner manual:
I am working on similar images for Greedy algorithms, brute force, branch and bound and simulated annealing.
As an example of Genetic Algorithms implementation I can give you this.
It's an API I developed for GA, with one implementation for each operator, and one concrete example problem solved ((good) Soccer Team among ~600 players with budget restriction). Its all setup so you run it with mvn exec:java and watch it evolving in the console output. But you can implement your own problem structure, or even other operators (crossing, mutating, selection) methods.

What Artificial Neural Network or 'Biological' Neural Network library/software do you use?

What do you use?
Fast Artificial Neural Network Library (FANN) is a free open source neural network library, which implements multilayer artificial neural networks in C with support for both fully connected and sparsely connected networks. Cross-platform execution in both fixed and floating point are supported. It includes a framework for easy handling of training data sets. It is easy to use, versatile, well documented, and fast. PHP, C++, .NET, Ada, Python, Delphi, Octave, Ruby, Prolog Pure Data and Mathematica bindings are available.
FannTool A graphical user interface is also available for the library.
There are a lot of different network simulators dependant on how detailed you want to do your sim, and what kind of network you want to simulate.
NEURON and GENESIS are good if you want to simulate full biological networks (Which I'm geussing you probably don't) even down to the behaviour of dendrites etc.
NEST and SPLIT and some others are good for doing population simulations where you create the population on a node-by-node basis and see what the whole population does. This is pretty much the 'industry' standard approach, and is used a lot in research and commercial applications, so there are worth looking into. I know that IBM use SPLIT for some of their research.
MIIND is good if you want to use differential equations to model what a population would do, but this approach is relatively new and computationally expensive (if very cool).
Not sure if that is exactly what you wanted!
(N.B. if you google any of the names in caps along with the word "simulator" you will end up at the relevant web page =)
Whenever I've wanted to play around with any data mining algorithm quickly, I just load up Weka. It's pretty complex but it implements a lot of algorithms (including neural networks) with a lot of customizability. Plus, it has some visualizations for NNs.
It is old, but I have always used NeuroShell 2 when not using my own code. Unfortunately, it is not free. I think The newer NeuroShells are designed only for predicting stocks.
If you're looking to experiment with deep learning, you should look into
Theano
Pylearn2 (which is based on Theano)

Resources