I am a research student working on the knapsack problem, and I want to test my algorithm on a large instance, but I couldn't find one. I need data with at least 1000 items; the capacity doesn't matter. The point is that the more items there are, the better for my algorithm. Is there any large dataset available on the internet? If anybody knows, please tell me; it's urgent.
You can quite easily generate your own data. Just use a random number generator and generate lots and lots of values. To test that your algorithm gives the correct results, compare it to the results from another known working algorithm.
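For instance, a minimal sketch in Python (the weight/value ranges and the half-total-weight capacity are arbitrary choices, not from any standard benchmark):

```python
import csv
import random

def generate_knapsack_instance(n_items=1000, max_weight=100, max_value=1000,
                               path="knapsack.csv"):
    """Write a random 0-1 knapsack instance as CSV rows: name, weight, value.

    Returns a suggested capacity (half the total weight is a common choice,
    so that roughly half the items can fit).
    """
    total_weight = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(n_items):
            weight = random.randint(1, max_weight)
            value = random.randint(1, max_value)
            total_weight += weight
            writer.writerow([f"X{i:06d}", weight, value])
    return total_weight // 2

capacity = generate_knapsack_instance()
print("suggested capacity:", capacity)
```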
I have the same requirement.
Obviously only brute force is guaranteed to give the optimal answer, and that won't work for large problems.
However we could pitch our algorithms against each other...
To be clear, my algorithm works for 0-1 problems (i.e. 0 or 1 of each item), with integer or decimal data.
I also have a version that works for 2 dimensions (e.g. Volume and Weight vs. Value).
My file reader uses a simple CSV format (Item-name, weight, value):
X229257,9,286
X509192,11,272
X847469,5,184
X457095,4,88
etc....
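If it helps, a minimal Python reader for that format (assuming integer weights and values; switch to float for decimal data) might look like:

```python
import csv

def read_items(path):
    """Read (name, weight, value) rows from a simple CSV knapsack file."""
    with open(path, newline="") as f:
        return [(name, int(weight), int(value))
                for name, weight, value in csv.reader(f)]
```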
If I recall correctly, I've tested mine on 1000 items too.
Regards.
PS:
I ran my algorithm against the problem on Rosetta Code that Mark highlighted (thank you). I got the same result, but my solution is much more scalable than the dynamic programming / LP solutions and will work on much bigger problems.
Hello, dear reader
I have been thinking about how to store data efficiently since the beginning of my studies and while taking a shower, I came up with the following idea:
For example, you take a picture and convert it into 0s and 1s. Then you take this enormously long number and divide it by, say, 10, then again by 10, and again by 10, etc., and at the end you have a small number. Now you store the small number and the calculation path, and anyone who wants to read the data only has to perform the inverse operations to get the original back.
My gut feeling tells me the idea is too good to be true. But I would still like to know: why shouldn't this work?
Kind regards
Fun theorem. No bijection on the natural numbers can map every number to a smaller one. Proof by contradiction: consider F(1), then F(F(1)), and so on; repeated application would give an infinite strictly decreasing sequence of natural numbers, which cannot exist.
There are lots of ways to map numbers one-to-one such that many of them map to smaller numbers. These are lossless compression algorithms. Most have the property that repeated application makes the data larger, or leaves it unchanged.
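You can observe that property with any off-the-shelf compressor. A quick Python illustration using zlib (the input bytes are just a stand-in for a real file):

```python
import zlib

data = bytes(range(256)) * 1000  # stand-in for any file's bytes
once = zlib.compress(data)
twice = zlib.compress(once)
print(len(data), len(once), len(twice))
# Typically len(twice) >= len(once): compressed output looks random,
# so a second pass has no redundancy to remove and only adds overhead.
```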
In your proposal, to the extent I understand it, you would have to store all the remainders of the divisions, and those remainders are collectively as large as the original data.
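Concretely, repeatedly dividing by 10 just peels the decimal digits off as remainders, so the "calculation path" is the original number in disguise. A tiny demonstration:

```python
n = 1011001110001111  # pretend this is the bit pattern of a picture
remainders = []
while n >= 10:
    n, r = divmod(n, 10)  # each division by 10 peels off the last digit
    remainders.append(r)
print(n, remainders[::-1])
# The "small number" n followed by the reversed remainders is exactly the
# original number's digit sequence: nothing has been compressed.
```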
I'm trying to figure out the best method/program to handle this computation so as to make the most people happy, i.e. the highest value for each person while still keeping all values roughly equal.
There are 24 people, 100 days, and 4 people must be selected for each day. All days must be full, i.e. the 24 people must be spread over the 400 slots, with each person getting roughly 16 or 17 slots (400/24 ≈ 16.7).
How can I create a program/algorithm that lets each person rank all 100 days in order of preference, as well as the top 5 people they would prefer to be selected with? I was thinking that each day and each of the preferred people would get some sort of point value. Then the algorithm would run through the data set and find the combination that makes the most people the happiest while still keeping everyone roughly even.
Is this easily possible using something like excel?
Thanks
Read up on "The Assignment Problem"; this is a well studied class of problems. Off the top of my head, the Hungarian Assignment method and the Stable Marriage/Stable Roommate method might be relevant.
You can solve this problem as a MILP using Solver, as shown in this video and many others like it, but I am afraid that the built-in Solver may not allow enough binary variables for it to work. Get a feel for how the problem works on a small scale, then download a nicer solver.
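As a rough sketch of that MILP in Python using the PuLP library: the preference scores below are placeholders to be filled in from the rankings, and the "preferred teammates" bonus is omitted because it would need products of binary variables (extra linearized variables):

```python
import pulp

PEOPLE, DAYS = range(24), range(100)
# pref[p][d]: hypothetical preference score of person p for day d (higher = better)
pref = [[1 for _ in DAYS] for _ in PEOPLE]  # fill in from the actual rankings

prob = pulp.LpProblem("scheduling", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", (PEOPLE, DAYS), cat="Binary")  # x[p][d] = 1 if p works day d

# Objective: maximize total preference of the assignment.
prob += pulp.lpSum(pref[p][d] * x[p][d] for p in PEOPLE for d in DAYS)

for d in DAYS:  # exactly 4 people per day
    prob += pulp.lpSum(x[p][d] for p in PEOPLE) == 4

for p in PEOPLE:  # fairness: everyone gets 400/24 ≈ 16.7, so 16 or 17 slots
    prob += pulp.lpSum(x[p][d] for d in DAYS) >= 16
    prob += pulp.lpSum(x[p][d] for d in DAYS) <= 17

prob.solve()
```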
Nominally a good problem to have, but I'm pretty sure it is because something funny is going on...
As context, I'm working on a problem in the facial expression/recognition space, so getting 100% accuracy seems incredibly implausible (not that it would be plausible in most applications...). I'm guessing there is either some consistent bias in the data set that is making it overly easy for an SVM to pull out the answer, or, more likely, I've done something wrong on the SVM side.
I'm looking for suggestions to help understand what is going on. Is it me (i.e., my usage of LibSVM)? Or is it the data?
The details:
About 2500 labeled data vectors/instances (transformed video frames of individuals; <20 individual persons total), binary classification problem. ~900 features/instance. Unbalanced data set, at about a 1:4 ratio.
Ran subset.py to separate the data into test (500 instances) and train (remaining).
Ran "svm-train -t 0 ". (Note: apparently no need for '-w1 1 -w-1 4'...)
Ran svm-predict on the test file. Accuracy=100%!
Things tried:
Checked about 10 times over that I'm not training and testing on the same data files through some inadvertent command-line argument error
Re-ran subset.py (even with -s 1) multiple times and trained/tested on multiple different splits (in case I randomly hit upon the most magical train/test partition)
ran a simple diff-like check to confirm that the test file is not a subset of the training data
svm-scale on the data has no effect on accuracy (accuracy = 100%). (Although the number of support vectors does drop, from nSV=127, nBSV=64 to nSV=72, nBSV=0.)
((Weird)) Using the default RBF kernel (instead of linear, i.e., removing '-t 0') makes the accuracy drop to garbage(?!)
(Sanity check) Running svm-predict with a model trained on scaled data against an unscaled data set gives accuracy = 80% (i.e., it always guesses the dominant class). This is strictly a sanity check that svm-predict is nominally behaving on my machine.
Tentative conclusion?:
Something about the data is whacked: somehow, within the data set, there is a subtle, experimenter-driven effect that the SVM is picking up on.
(This doesn't, on first pass, explain why the RBF kernel gives garbage results, however.)
Would greatly appreciate any suggestions on a) how to fix my usage of LibSVM (if that is actually the problem), or b) how to determine what subtle experimenter bias in the data LibSVM is picking up on.
Two other ideas:
Make sure you're not training and testing on the same data. This sounds kind of dumb, but in computer vision applications you should take care: make sure you're not repeating data (say, two frames of the same video falling into different folds), that you're not training and testing on the same individual, etc. It is more subtle than it sounds.
Make sure you search over the gamma and C parameters for the RBF kernel. There are good theoretical (asymptotic) results showing that a linear classifier is just a degenerate RBF classifier, so you should look for a good (C, gamma) pair.
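libsvm ships a grid.py tool for exactly this search; equivalently, here is a sketch in Python using scikit-learn (which wraps libsvm), with random stand-in data in place of your actual feature vectors:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Stand-in data; replace with your real ~2500 x ~900 feature matrix and labels.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 20)), rng.integers(0, 2, size=200)

# Coarse exponential grid over (C, gamma), as recommended in the libsvm guide.
param_grid = {"C": [2.0**k for k in range(-5, 16, 4)],
              "gamma": [2.0**k for k in range(-15, 4, 4)]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```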
Notwithstanding that the devil is in the details, here are three simple tests you could try:
Quickie (~2 minutes): Run the data through a decision tree algorithm. This is available in Matlab via classregtree, or you can load into R and use rpart. This could tell you if one or just a few features happen to give a perfect separation.
Not-so-quickie (~10-60 minutes, depending on your infrastructure): Iteratively split the features (i.e. from 900 to 2 sets of 450), train, and test. If one of the subsets gives you perfect classification, split it again. It would take fewer than 10 such splits to find out where the problem variables are. If it happens to "break" with many variables remaining (or even in the first split), select a different random subset of features, shave off fewer variables at a time, etc. It can't possibly need all 900 to split the data.
Deeper analysis (minutes to several hours): try permutations of labels. If you can permute all of them and still get perfect separation, you have some problem in your train/test setup. If you select increasingly larger subsets to permute (or, if going in the other direction, to leave static), you can see where you begin to lose separability. Alternatively, consider decreasing your training set size and if you get separability even with a very small training set, then something is weird.
Method #1 is fast and should be insightful. There are some other methods I could recommend, but #1 and #2 are easy, and it would be odd if they don't give any insights.
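For method #1, a sketch in Python with scikit-learn instead of Matlab/R (random stand-in arrays; substitute your real train/test split):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Stand-in arrays; substitute your real train/test split.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(2000, 900)), rng.integers(0, 2, 2000)
X_test, y_test = rng.normal(size=(500, 900)), rng.integers(0, 2, 500)

tree = DecisionTreeClassifier(max_depth=2)  # deliberately shallow
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
# If a depth-2 tree already scores ~100%, one or two features leak the label;
# feature_importances_ points at which ones.
print("features used:", np.flatnonzero(tree.feature_importances_))
```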
I have a database of car parts. Some parts (e.g. cams/pistons) work better than others in different combinations (e.g. one product will work well with another, while a different combination of two parts may not).
There are so many possible permutations; what solutions apply to this problem?
So far, I feel these are possible approaches (where I have question marks, something tells me these are solutions, but I am not 100% confident they are):
Neural networks (?)
Collection-based approach (selection of parts in a collection for cam, and likewise for pistons in another collection, all work well with each other)
Business rules engine (?)
What are good ways to tackle this sort of problem?
Thanks
The answer largely depends on how you calculate 'works better'.
1) Independent values
Assuming that 'works better' is a function f of a combination of items x = (a, b, c, d, ...), and(!) that there are no regularities that would let you decide whether f(x') is bigger or smaller than f(x) knowing only x, f(x), and x' (which could allow you to find the maximizing x faster), you will have to calculate f for all combinations at least once.
Once you have calculated it for all combinations you can sort. If you need to look up the data in a partitioned way, using SQL/RDBMS might be a good approach (for example, finding the top 5 best solutions but without such-and-such a part).
For extra points, after calculating all of the results and storing them, you could analyze them statistically and try to establish patterns; a brute-force sketch of this case follows.
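Here is that brute-force enumeration as a Python sketch; the part lists and the scoring function f are placeholders for your real data and measurements:

```python
from itertools import product

cams = ["cam_a", "cam_b"]          # hypothetical part lists
pistons = ["piston_x", "piston_y"]

def f(cam, piston):
    """Placeholder for the 'works better' score of a combination."""
    return hash((cam, piston)) % 100  # substitute the real measurement

# Evaluate f once per combination, then sort by score.
scores = {(c, p): f(c, p) for c, p in product(cams, pistons)}
ranked = sorted(scores, key=scores.get, reverse=True)
print("best combination:", ranked[0])
# "Top 5 without such-and-such a part" is then a simple filter:
top5_without = [combo for combo in ranked if "cam_b" not in combo][:5]
```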
2) Dependent values
If you can establish some regularities regarding the values (and maybe you can), the search for the maximum value can be simplified and sped up.
For example, if you know that the function you are trying to maximize is a linear combination of the parameters, you could look into linear programming.
If it is not...
I've read in one of my AI books that the popular path-finding algorithms used in simulations and games (A*, Dijkstra) are also used to solve the well-known "15-puzzle".
Can anyone give me some pointers on how I would reduce the 15-puzzle to a graph of nodes and edges so that I could apply one of these algorithms?
If I were to treat each node in the graph as a game state then wouldn't that tree become quite large? Or is that just the way to do it?
A good heuristic for A* on the 15-puzzle is the number of squares that are in the wrong location. Because you need at least one move per square that is out of place, the number of misplaced squares is guaranteed to be less than or equal to the number of moves required to solve the puzzle, making it an admissible heuristic for A*.
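For reference, both that misplaced-tiles heuristic and the (also admissible, and stronger) Manhattan-distance heuristic are a few lines each. A Python sketch, representing the board as a flat tuple with 0 for the blank:

```python
def misplaced(state, goal):
    """Count tiles (excluding the blank) not in their goal position."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal, width=4):
    """Sum of horizontal + vertical distances of each tile from its goal."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        j = goal.index(tile)
        total += abs(i // width - j // width) + abs(i % width - j % width)
    return total
```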
A quick Google search turns up a couple papers that cover this in some detail: one on Parallel Combinatorial Search, and one on External-Memory Graph Search
General rule of thumb when it comes to algorithmic problems: someone has likely done it before you, and published their findings.
This assignment on the 8-puzzle problem discusses the A* algorithm in some detail, and is also fairly straightforward:
http://www.cs.princeton.edu/courses/archive/spring09/cos226/assignments/8puzzle.html
The graph-theoretic way to solve the problem is to imagine every configuration of the board as a vertex of the graph, and then use a breadth-first search, with pruning based on something like the Manhattan distance of the board, to derive a shortest path from the starting configuration to the solution.
One problem with this approach is that for any n x n board where n > 3, the game space becomes so large that it is not clear how you can efficiently mark the visited vertices. In other words, there is no obvious way to check whether the current configuration of the board is identical to one previously discovered by traversing some other path. Another problem is that the graph size grows so quickly with n (it's approximately (n^2)!) that it is just not suitable for a brute-force attack, as the number of paths becomes computationally infeasible to traverse.
This paper by Ian Parberry, A Real-Time Algorithm for the (n^2 − 1)-Puzzle, describes a simple greedy algorithm that iteratively arrives at a solution by completing the first row, then the first column, then the second row... It arrives at a solution almost immediately; however, the solution is far from optimal. Essentially it solves the problem the way a human would, without leveraging any computational muscle.
This problem is closely related to that of solving the Rubik's cube. The graph of all game states is too large to solve by brute force, but there is a fairly simple 7-step method that a dexterous human can use to solve any cube in about 1 to 2 minutes. That path is, of course, non-optimal. By learning to recognise patterns that define sequences of moves, the speed can be brought down to 17 seconds. However, this feat by Jiri is somewhat superhuman!
The method Parberry describes moves only one tile at a time; one imagines the algorithm could be sped up by employing Jiri's dexterity and moving multiple tiles at once. This would not, as Parberry proves, reduce the path length below order n^3, but it would reduce the coefficient of the leading term.
Remember that A* searches through the problem space by proceeding down the most promising path to the goal, as defined by your heuristic.
Only in the worst case will it end up flood-filling the entire problem space; this tends to happen when there is no actual solution to your problem.
Just use the game tree. Remember that a tree is a special form of graph.
In your case, the children of each node will be the game positions reachable by making one of the moves available at the current node.
Here you go http://www.heyes-jones.com/astar.html
Also, be mindful that with the A* algorithm, at least, you will need to find an admissible heuristic to determine whether a possible next step is closer to the goal than another.
From my experience with solving the 8-puzzle: you need to create nodes and keep track of each step taken, compute the Manhattan distance for each possible next step, move to the one with the shortest distance, then update the nodes and continue until you reach the goal.
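Putting the above together, a minimal A* sketch for the 8-puzzle in Python (states are 9-tuples, 0 is the blank, and the heuristic is the Manhattan distance described earlier; an illustration, not a tuned solver):

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def manhattan(state):
    """Admissible heuristic: total Manhattan distance of tiles from goal."""
    return sum(abs(i // 3 - (t - 1) // 3) + abs(i % 3 - (t - 1) % 3)
               for i, t in enumerate(state) if t != 0)

def neighbors(state):
    """Yield the states reachable by sliding a tile into the blank."""
    b = state.index(0)
    r, c = divmod(b, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            n = nr * 3 + nc
            s = list(state)
            s[b], s[n] = s[n], s[b]
            yield tuple(s)

def astar(start):
    """Return the number of moves in a shortest solution (None if none found)."""
    open_heap = [(manhattan(start), 0, start)]  # entries: (f = g + h, g, state)
    best_g = {start: 0}
    while open_heap:
        f, g, state = heapq.heappop(open_heap)
        if state == GOAL:
            return g
        if g > best_g.get(state, float("inf")):
            continue  # stale heap entry; a cheaper path was found already
        for nxt in neighbors(state):
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(open_heap, (ng + manhattan(nxt), ng, nxt))
    return None

print(astar((1, 2, 3, 4, 5, 6, 0, 7, 8)))  # prints 2
```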