Is there a database with (Integer) Linear Programming problems/solutions? - dataset

There are lots of databases of training sets where one can test one's machine learning algorithms. Is there also one where I could test my (Integer) Linear Programming Solver?

For LP problems there is NETLIB. MIP problems can be found in MIPLIB. Hans Mittelmann has a collection here.

Related

Planning n jobs on m Machines with Linear Programming

I heard that you can use Linear Programming for planning problems. I don't really understand how that works, because Linear Programming finds optimal solutions, while large-scale planning (for example, planning n jobs on m machines) has exponential difficulty.
So how can I solve, for example, a problem with 100 jobs and 10 machines using linear programming? Can you give me some explanation or further reading?
So how can I solve, for example, a problem with 100 jobs and 10 machines using linear programming?
Generally, you can't. That isn't the sort of planning problem that Linear Programming (LP) is applicable to.
In an LP problem, you have a set of variables that you want to solve for. You have a set of linear inequalities that represent the constraints on those variables. And you have a linear function of those variables (i.e., no exponents, no division, no "if-then-else", etc.) that represents the cost (or benefit) of a given solution.
If you have a problem like that, you can use LP to efficiently generate an optimal solution. Shop floor scheduling, like what you are asking about, isn't that kind of problem.
LP tends to lend itself to "higher level" planning. Like, how much of each product should I make in each factory? In such a problem, you'll often be able to specify the constraints as linear inequalities and the cost (or benefit) as a linear function, as you must do in order to make use of LP. Notice, I said "how much of each product..." and not "how many...". Because that's another limitation of LP -- the variables must be able to take on real values. If you need your solution to give integer solutions, you're looking at an Integer Programming (or Mixed Integer Programming) problem.
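To make the structure concrete, here is a toy product-mix LP in pure Python. The numbers and the brute-force vertex enumeration are purely illustrative (a real solver would use the simplex method or an interior-point method); it relies on the fact that an LP optimum, when one exists, lies at a vertex of the feasible polytope.

```python
from itertools import combinations

# Toy LP: maximize 3x + 5y
# subject to  x + y <= 4,  x + 3y <= 6,  x >= 0,  y >= 0.
# Each constraint row is (a, b, r), meaning a*x + b*y <= r.
constraints = [
    (1, 1, 4),
    (1, 3, 6),
    (-1, 0, 0),
    (0, -1, 0),
]

def intersect(c1, c2):
    """Intersection of two constraint boundaries (a 2x2 linear system)."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None  # parallel boundaries, no single intersection point
    x = (r1 * b2 - r2 * b1) / det
    y = (a1 * r2 - a2 * r1) / det
    return (x, y)

def feasible(p):
    return all(a * p[0] + b * p[1] <= r + 1e-9 for a, b, r in constraints)

# Candidate vertices are feasible intersections of constraint boundaries.
vertices = [p for c1, c2 in combinations(constraints, 2)
            if (p := intersect(c1, c2)) is not None and feasible(p)]
best = max(vertices, key=lambda p: 3 * p[0] + 5 * p[1])
print(best)
```

An integer programming variant of the same model would additionally demand that x and y be integers, which is what makes scheduling-style problems so much harder.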

How mathematical is artificial intelligence as a focus?

Compared to mechanical engineering, computer engineering, or software engineering how do the mathematics compare? What should be mathematics that I should start focusing on learning now or should expect to learn if I want to become a researcher in the field or an industry expert? I am currently a senior in high school who is considering AI. Math doesn't scare me.
In AI, one of the most important goals is to make computers act (and think!) like humans. For this purpose, computers must learn models from observations (data) and act based on those models. This learning and prediction requires a deep understanding of probability theory, statistics, and stochastic processes as fundamental tools.
Today, probability and statistics are considered general mathematics, like calculus, and all undergraduate students are familiar with them, but you need to master them if your research field is AI.
I would look into the following:
Probability - Bayesian Theory
Statistics - Data Interpretation, Graph Plotting, Graph Error Handling
Stochastic Theory
Entropy Theory (for finding degree of errant data)
Matrices and their computational formulae, particularly stochastic matrices
Since AI uses a lot of trees and graphs, a look into state-space search and heuristic calculation would be quite useful.

Difference between Neural Network and Evolutionary algorithm

I have a good basis in Evolutionary Algorithms, so now I have started to read about Artificial Neural Networks. I came across this tutorial at
http://www.ai-junkie.com/ann/evolved/nnt2.html,
showing how to use an ANN to evolve tanks that collect mines. It uses a GA to evolve the input weights on each neuron.
I know I could use a GA (without the ANN) to solve the same problem. I already created a Tetris bot using only a GA to optimize the weights in the grid evaluation function (check my blog http://www.bitsrandomicos.blogspot.com.br/).
My question is: what's the conceptual/practical difference between using an ANN + GA in a situation where I could use a GA alone? I mean, is my Tetris bot an ANN? (I don't think so.)
There are several related questions about this, but I couldn't find an answer:
Are evolutionary algorithms and neural networks used in the same domains?
When to use Genetic Algorithms vs. when to use Neural Networks?
Thanks!
A genetic algorithm is an optimization algorithm.
An artificial neural network is a function approximator. In order to approximate a function you need an optimization algorithm to adjust the weights. An ANN can be used for supervised learning (classification, regression) or reinforcement learning and some can even be used for unsupervised learning.
In supervised learning a derivative-free optimization algorithm like a genetic algorithm is slower than most of the optimization algorithms that use gradient information. Thus, it only makes sense to evolve neural networks with genetic algorithms in reinforcement learning. This is known as "neuroevolution". The advantage of neural networks like multilayer perceptrons in this setup is that they can approximate any function with arbitrary precision when they have a sufficient number of hidden nodes.
When you create a Tetris bot you do not necessarily have to use an ANN as a function approximator. But you need some kind of function approximator to represent your bot's policy. I guess it was just simpler than an ANN. But when you want to create a complex nonlinear policy, you could do that, e.g., with an ANN.
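To illustrate the GA-optimizes-ANN-weights split, here is a minimal neuroevolution sketch (purely hypothetical, not the tutorial's code): an elitist evolutionary search over the nine weights of a tiny 2-2-1 tanh network fitted to XOR. The GA is the optimizer; the ANN is the function being optimized.

```python
import math
import random

random.seed(0)

# Four XOR training cases: ((x1, x2), target)
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(w, x):
    """Tiny 2-2-1 network: w holds 9 weights (two hidden units + output)."""
    h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    return math.tanh(w[6] * h1 + w[7] * h2 + w[8])

def error(w):
    return sum((forward(w, x) - y) ** 2 for x, y in XOR)

def mutate(w, sigma=0.3):
    return [wi + random.gauss(0, sigma) for wi in w]

# Elitist (1+lambda) evolution: the best genome survives each generation,
# so the error is guaranteed never to increase.
best = [random.uniform(-1, 1) for _ in range(9)]
initial_error = error(best)
for generation in range(300):
    offspring = [mutate(best) for _ in range(20)]
    challenger = min(offspring, key=error)
    if error(challenger) < error(best):
        best = challenger

print(initial_error, error(best))
```

Swap the XOR error for a game score and you have the mine-collecting-tanks setup in miniature: no gradients anywhere, just mutation and selection on the weight vector.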
alfa's answer is perfect. Here is just an image to illustrate what he said:
Meta-Optimizer = None (but could be)
Optimizer = Genetic Algorithm
Problem = Tetris Bot (e.g. ANN)
You use an evolutionary algorithm when you don't yet know the answer but are able to somehow rate candidates and provide meaningful mutations.
A neural network is great if you already have answers (and inputs) and you want to "train the computer" so it can "guess" the answers for unknown inputs. Also, you don't have to think a lot about the problem -- the network will figure it out by itself.
Check this "game AI" example: https://synaptic.juancazala.com/#/
(note how simple it is, all you have to do is to give them enough training, you don't have to know a thing about game AI - and once it is good enough all you have to do is to "download" memory and run it when needed)
I'm not an expert, but based on what I know from the field...
An artificial neural network ultimately has its basis in neuroscience. It attempts to simulate/model that behavior by building neuron-like structures in the algorithm. There is a stronger emphasis on the academic nature of the problem than on the result. From what I understand, it's for this reason that ANNs are not very popular from an engineering standpoint. Statistical approaches to machine learning (HMMs and Bayesian networks) produce better results.
In short, as long as it has a nod toward some underlying neuroscience, it can be an ANN, even if it uses some form of GA.
If you use a GA, it is not necessarily an ANN.

Where can I find huge linear system datasets?

I have been searching the web, but I have not found huge linear systems datasets.
Do you know any web site where I can get one, let's say of size $100000 \times 100000$ or maybe a little bigger?
Perhaps the University of Florida Sparse Matrix Collection at http://www.cise.ufl.edu/research/sparse/matrices/ has what you are looking for.
The other answers to the question "Where can one obtain good data sets/test problems for testing algorithms/routines?" on the Computational Science SE may also be useful for you.

Measuring the performance of classification algorithm

I've got a classification problem on my hands, which I'd like to address with a machine learning algorithm (Bayesian, or probably Markovian; the question is independent of the classifier to be used). Given a number of training instances, I'm looking for a way to measure the performance of an implemented classifier, taking the overfitting problem into account.
That is: given N[1..100] training samples, if I run the training algorithm on every one of the samples and then use these very same samples to measure fitness, it might get stuck in an overfitting problem -- the classifier will know the exact answers for the training instances without having much predictive power, rendering the fitness results useless.
An obvious solution would be separating the hand-tagged samples into training and test samples; I'd like to learn about methods for selecting statistically significant samples for training.
White papers, book pointers, and PDFs much appreciated!
You could use 10-fold cross-validation for this. I believe it's a pretty standard approach for evaluating the performance of a classification algorithm.
The basic idea is to divide your learning samples into 10 subsets. Then use one subset as test data and the others as training data. Repeat this for each subset and calculate the average performance at the end.
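The procedure above can be sketched in a few lines of plain Python. The data and the "classifier" here are placeholders (a trivial majority-class predictor on random labels, purely for illustration); the point is the fold splitting and the averaging.

```python
import random

random.seed(42)

# Illustrative dataset: 100 (feature, label) pairs with random binary labels.
data = [(random.random(), random.choice([0, 1])) for _ in range(100)]
random.shuffle(data)

k = 10
folds = [data[i::k] for i in range(k)]  # round-robin split into 10 folds

def majority_class(train):
    """Stand-in for a real classifier: always predict the most common label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)

accuracies = []
for i in range(k):
    test = folds[i]                                     # held-out fold
    train = [row for j, fold in enumerate(folds)        # the other 9 folds
             if j != i for row in fold]
    prediction = majority_class(train)                  # "train" the model
    correct = sum(1 for _, y in test if y == prediction)
    accuracies.append(correct / len(test))

print(sum(accuracies) / k)  # average accuracy over the 10 folds
```

In practice you would also shuffle (or stratify) before splitting, as done above, so that each fold has a similar class distribution.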
As Mr. Brownstone said, 10-fold cross-validation is probably the best way to go. I recently had to evaluate the performance of a number of different classifiers, and for this I used Weka, which has an API and a load of tools that allow you to easily test the performance of lots of different classifiers.