Reasonable training result in reinforcement learning, but how to improve further? - artificial-intelligence

I have a 4 dof robot. I am trying to teach this specific movement: "Whenever you move, don't move joint 1 (orange in the plot) at the same time with joints 2, 3, 4". The corresponding reward function is:
reward = 1 / (abs(torque_q1) * max(abs(torque_q2), abs(torque_q3), abs(torque_q4)))
As the plot shows, the learned policy roughly reproduces the intended movement: q1 moves first, then the other joints. The part I want to improve is around t=13, where q1 gradually slows down while the other joints gradually start to move. Is there a way to improve this so that q1 comes to a complete stop before the other joints start to move?
First training results: Joints position plots
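For reference, a minimal Python sketch of the reward as written above. The function name and the small `eps` term are assumptions that are not in the post; `eps` is only there to keep the value finite when a torque is exactly zero.

```python
def exclusive_motion_reward(torque_q1, torque_q2, torque_q3, torque_q4, eps=1e-6):
    """Reward that grows as joint 1 and joints 2-4 are driven less at the same time."""
    # Largest torque magnitude among the joints that must not move with q1.
    others = max(abs(torque_q2), abs(torque_q3), abs(torque_q4))
    # eps is a hypothetical guard against division by zero, not part of the post.
    return 1.0 / (abs(torque_q1) * others + eps)


r_both = exclusive_motion_reward(0.5, 0.1, -0.4, 0.2)     # q1 and q3 driven together
r_only_q1 = exclusive_motion_reward(0.5, 0.0, 0.0, 0.0)   # only q1 driven
```

Since the value depends only on the product of the two magnitudes, two small simultaneous torques still score well, which may be relevant to the overlap seen around t=13.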

Related

Genetic algorithm - evolve only one object

I have an AI class and we have to make projects. I chose to do a genetic algorithm, and since I'm new to the concept I have a couple of questions. I have done some research, I get the idea, and I followed Coding Train's video on a simple genetic algorithm without any problem. However, I have seen multiple videos on YouTube where cars evolve, and I don't get how they can have a population of, let's say, 20 if only one car is being rendered to the screen. I want to try to create a Pong-like game (I'll use a basic physics engine) where Player A is the computer, which always follows the Y coordinate of the ball and thus can't lose, and Player B is supposed to evolve using a genetic algorithm. How would I evolve Player B every time it loses? What would the chromosomes be? What would the population be? If you can give me any advice I would be very thankful.
Regarding the cars, it's most likely that each car in the generation is being evaluated and rendered sequentially. Suppose the population size is 20: the first 20 cars you see would be the initial population, the next 20 cars you see would be the second generation's population, and so on.
Regarding Pong, you need to decide on a fitness function for your Player B. If Player B always loses, then perhaps your fitness function could be how long it is able to last before it loses. To determine your chromosome, you first need to decide how you will control Player B's paddle. The chromosome would then be some set of design variables that affect that system. For example, you might use a small neural net where your chromosome encodes the weights of the connections. Your population is a set of chromosomes used to produce the next generation's set of chromosomes through crossover and mutation.
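The population, crossover and mutation steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: a chromosome is a plain list of floats (for example, neural-net weights), `next_generation` and `stand_in_fitness` are hypothetical names, and the stand-in fitness replaces what would really be "time survived in a simulated Pong rally".

```python
import random


def next_generation(population, fitness, rng, mutation_rate=0.1, mutation_scale=0.5):
    """Build a new population from the fitter half via crossover and mutation."""
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: max(2, len(ranked) // 2)]   # truncation selection
    children = [list(ranked[0])]                   # elitism: keep the best as-is
    while len(children) < len(population):
        mother, father = rng.sample(parents, 2)
        cut = rng.randrange(1, len(mother))        # one-point crossover
        child = mother[:cut] + father[cut:]
        # Each gene is perturbed with probability mutation_rate.
        child = [g + rng.gauss(0.0, mutation_scale) if rng.random() < mutation_rate else g
                 for g in child]
        children.append(child)
    return children


def stand_in_fitness(chromosome):
    """Placeholder score; a real one would play Pong and return the survival time."""
    return -sum(g * g for g in chromosome)


rng = random.Random(0)
population = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(20)]
first_best = max(map(stand_in_fitness, population))
for _ in range(30):
    population = next_generation(population, stand_in_fitness, rng)
last_best = max(map(stand_in_fitness, population))
```

Each of the 20 chromosomes would be evaluated in its own game, one after another, which matches the sequential rendering described for the car videos.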

Better Heuristic function for a game (AI Minimax)

There is a game that I've programmed in Java. The game is simple (refer to the figure below). There are 4 birds and 1 larva. It is a 2-player game (AI vs Human).
Larva can move diagonally forward AND diagonally backward
Birds can ONLY move diagonally forward
Larva wins if it can get to line 1 (fence)
Larva also wins if birds have no moves left
Birds CANNOT "eat" the larva.
Birds win if Larva has NO move left (cannot move at all)
When the game starts, Larva begins, then ONE bird can move (any one), then Larva, etc...
I have implemented a MiniMax (Alpha Beta Pruning) and I'm using the following evaluate() function (heuristic function).
Let us give the following numbers to each square on the board.
Therefore, our evaluation function will be
h(n) = value of position of larva - value of position of bird 1 - value of position of bird 2 - value of position of bird 3 - value of position of bird 4
The Larva will try to MAXIMIZE the heuristic value whereas the Birds will try to MINIMIZE it.
Example:
However, this is a simple and naive heuristic. It does not act in a smart manner. I am a beginner in AI and I would like to know what I can do to IMPROVE this heuristic function.
What would be a good/informed heuristic?
How about this :
Maximum :larva
Minimum :birds
H(t)=max_distance(larva,line_8)+Σmin_distance(bird_n,larva)
or
H(t)=Σmin_distance(bird_n,larva) - min_distance(larva,line_1)
max_distance(larva,line_8): to reflect the condition that larva is closer to the line 1.
Σmin_distance(bird_n,larva): to reflect the condition that birds are closer to the larva(to block it).
I believe there are still many things that could be considered. For example, the bird closest to the larva should have high priority when choosing which bird to move. But the general direction of the function above makes sense, and many details can be added to improve it easily.
There is one simple way to improve your heuristic considerably. In your current heuristic, the value of square A1 is 8 less than the value of square A8. This makes the Birds inclined to move towards the left side of the game board, as a move to the left will always be valued higher than a move to the right. This is not accurate. All squares on row 1 should have the same value. Thus assign all squares in row 1 a 1, in row 2 a 2, etc. This way the birds and larva won't be inclined to move to the left, and instead can focus on making a good move.
You could take into account the fact that birds will have a positional advantage over the larva when the larva is on the sides of the board, so if Larva is MAX then change the side tile values of the board to be smaller.
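A minimal Python sketch of the row-only valuation suggested above, plugged into the original formula h(n). The representation of a square as a `(row, column)` tuple and the names `row_value` and `evaluate` are assumptions made for illustration; the original program is in Java and its board numbering is not shown here.

```python
def row_value(square):
    """Value of a square depends only on its row, never on its column."""
    row, _column = square
    return row


def evaluate(larva, birds):
    """h(n) = value of larva position - sum of values of the bird positions."""
    return row_value(larva) - sum(row_value(bird) for bird in birds)


birds = [(8, 1), (8, 3), (8, 5), (8, 7)]
left = evaluate((2, 1), birds)    # larva on the left side of row 2
right = evaluate((2, 7), birds)   # larva on the right side of row 2
```

With this valuation `left` and `right` are equal, so neither side is favoured by the score alone.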

Proper Heuristic Mechanism For Hill Climbing

The following problem is an exam exercise I found from an Artificial Intelligence course.
"Suggest a heuristic mechanism that allows this problem to be solved, using the Hill-Climbing algorithm. (S=Start point, F=Final point/goal). No diagonal movement is allowed."
Since it's obvious that Manhattan Distance or Euclidean Distance will send the robot to (3,4) and no backtracking is allowed, what is a possible solution (heuristic mechanism) to this problem?
EDIT: To make the problem clearer, I've marked some of the Manhattan distances on the board:
Using Manhattan distance, the robot's next move would obviously be to (3,4), since it has a heuristic value of 2. HC will choose that and get stuck forever. The aim is to find a heuristic that never takes that path.
I thought of the obstructions as being hot, and that heat rises. I make the net cost of a cell the sum of the Manhattan metric distance to F plus a heat-penalty. Thus there is an attractive force drawing the robot towards F as well as a repelling force which forces it away from the obstructions.
There are two types of heat penalties:
1) It is very bad to touch an obstruction. Look at the 2 or 3 neighboring cells in the row immediately below a given cell. Add 15 for every obstruction cell which is directly below the given cell and 10 for every diagonal neighbor which is directly below.
2) For cells not in direct contact with the obstructions, the heat is more diffuse. I calculate it as 6 times the average number of obstruction blocks below the cell, both in its column and in its neighboring columns.
The following shows the result of combining this all, as well as the path taken from S to F:
A crucial point is the way that the averaging causes the robot to turn left rather than right when it hits the top row. The unheated columns towards the left make that the cooler direction. It is interesting to note how all cells (with the possible exception of the two at the upper-right corner) are drawn to F by this heuristic.
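A partial Python sketch of this idea, covering only the first (contact) penalty added to the Manhattan distance. The grid below is a made-up example, not the board from the exercise, and the conventions (row index growing downward, `#` marking an obstruction, the function names) are assumptions for illustration.

```python
def manhattan(a, b):
    """Manhattan distance between two (row, col) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def contact_heat(grid, row, col):
    """Penalty 1: 15 for an obstruction directly below, 10 per diagonal one below."""
    below = row + 1
    if below >= len(grid):
        return 0
    heat = 0
    for offset, penalty in ((-1, 10), (0, 15), (1, 10)):
        c = col + offset
        if 0 <= c < len(grid[below]) and grid[below][c] == "#":
            heat += penalty
    return heat


def cell_cost(grid, cell, goal):
    """Attraction towards the goal plus repulsion from obstructions underneath."""
    return manhattan(cell, goal) + contact_heat(grid, cell[0], cell[1])


# Hypothetical 3x4 grid with two obstruction cells in the middle row.
grid = [
    "....",
    ".##.",
    "....",
]
```

Hill climbing would then move to the neighbouring cell with the lowest `cell_cost`; the diffuse second penalty would be added to the same sum.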

Chasing game in C [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I'm stuck with a quite complex problem:
On an MxN field containing a chicken, an eagle and a yard,
the chicken tries to escape the eagle (by entering the yard),
and the eagle tries to catch the chicken. The chicken escapes
when reaches inside the yard, and the eagle catches the chicken
when it's in the same position as the chicken. In a single step,
the eagle can move one or two small squares, and the chicken can
move a single square in any direction. The program should display
a message saying if the chicken can win. It should compute the moves,
and, at each step, it should write in the output file the current
configuration of the field, and it should also visually represent
it on the screen. The dimensions of the field, position of the chicken
and of the eagle, and also of the yard, are given in a file.
I've solved the part with creating the field (a matrix), but I can't figure out how to solve the rest. Perhaps backtracking would be an idea, but it's very complicated, and I can't handle it. I think I should find a way to compute the distance between the chicken and the yard, and also between the eagle and the yard, and work somehow with that. It has to be in C. Any suggestion or idea is welcome!
Thank you in advance!
It is an interesting problem. Let's go over the rules again.
Players:
Chicken: takes a shortest path to the yard (there could be multiple shortest paths) and, among those, the one that maximises its distance from the eagle.
Eagle: takes the shortest path to the chicken.
To solve the problem we have to assume it is played in turns: first the chicken, then the eagle, and so on.
The game is over when:
The eagle is on the chicken.
The chicken is in the yard.
Here is the trick for the distance:
Update
The distance you want is called Chebyshev distance. You can easily calculate it:
distance = max(absolute differences of the corresponding coordinates of the two points)
For (1,1) and (2,3) distance = max(|1-2|,|2-3|) = 2
For (2,3) and (4,7) distance = 4
For (4,5,6) and (1,1,1) distance = 5
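The three examples above can be checked with a short sketch. It is written in Python for brevity (the question asks for C, where the same loop over coordinates applies), and the function name is an assumption.

```python
def chebyshev_distance(p, q):
    """Largest absolute coordinate difference between two points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return max(abs(a - b) for a, b in zip(p, q))


d1 = chebyshev_distance((1, 1), (2, 3))
d2 = chebyshev_distance((2, 3), (4, 7))
d3 = chebyshev_distance((4, 5, 6), (1, 1, 1))
```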
You can ignore the older answer if you want.
Old
distance = Manhattan distance - length of longest 45 deg diagonal
Manhattan distance is easy to understand. See its wiki. Take some examples :
---/F
--/-|
-/--|
C---X
manhattan distance = 7
length of max diagonal = 3
distance = 7-3 = 4
Another one
---/-F
--/--|
-/---|
C----X
distance = 8-3 = 5
Caveat: Remember there can be many shortest possible paths. For example:
---F
--/F
-/-F
C--F
-\-F
--\F
---F
There are lots of places to go in 3 moves. Pick the one which is farthest from the eagle, using the distance calculation above.
Also, if at any time the distance between the eagle and the chicken is less than the distance between the chicken and the yard, then the eagle wins; otherwise the chicken wins. Just simulate the moves and you will know.

Can we solve this using a greedy strategy? If not how do we solve this using dynamic programming?

Problem:
The city of Siruseri is impeccably planned. The city is divided into a rectangular array of cells with M rows and N columns. Each cell has a metro station. There is one train running left to right and back along each row, and one running top to bottom and back along each column. Each train starts at some time T and goes back and forth along its route (a row or a column) forever.
Ordinary trains take two units of time to go from one station to the next. There are some fast trains that take only one unit of time to go from one station to the next. Finally, there are some slow trains that take three units of time to go from one station to the next. You may assume that the halting time at any station is negligible.
Here is a description of a metro system with 3 rows and 4 columns:
      S(1)  F(2)  O(2)  F(4)
F(3)   .     .     .     .
S(2)   .     .     .     .
O(2)   .     .     .     .
The label at the beginning of each row/column indicates the type of train (F for fast, O for ordinary, S for slow) and its starting time. Thus, the train that travels along row 1 is a fast train and it starts at time 3. It starts at station (1,1) and moves right, visiting the stations along this row at times 3, 4, 5 and 6 respectively. It then returns back visiting the stations from right to left at times 6, 7, 8 and 9. It again moves right now visiting the stations at times 9, 10, 11 and 12, and so on. Similarly, the train along column 3 is an ordinary train starting at time 2. So, starting at the station (1,3), it visits the three stations on column 3 at times 2, 4 and 6, returns back to the top of the column visiting them at times 6, 8 and 10, and so on.
Given a starting station, the starting time and a destination station, your task is to determine the earliest time at which one can reach the destination using these trains.
For example suppose we start at station (2,3) at time 8 and our aim is to reach the station (1,1). We may take the slow train of the second row at time 8 and reach (2,4) at time 11. It so happens that at time 11, the fast train on column 4 is at (2,4) travelling upwards, so we can take this fast train and reach (1,4) at time 12. Once again we are lucky and at time 12 the fast train on row 1 is at (1,4), so we can take this fast train and reach (1,1) at time 15. An alternative route would be to take the ordinary train on column 3 from (2,3) at time 8 and reach (1,3) at time 10. We then wait there till time 13 and take the fast train on row 1 going left, reaching (1,1) at time 15. You can verify that there is no way of reaching (1,1) earlier than that.
Test Data: You may assume that M, N ≤ 50.
Time Limit: 3 seconds
As N and M are very small, we can try to solve it by recursion.
At every station, we consider the two trains that can take us nearer to our destination. E.g.: if we want to go to (1,1) from (2,3), we take the trains that bring us nearer to (1,1) and get off at the station nearest to our destination, while keeping track of the time taken. Whenever we reach the destination, we compare the time taken with the minimum time so far and update the minimum if the new time is smaller.
We can determine which station a train is at a particular time using this method:
/* S is the starting time of the train, N is the number of stations it
visits, and T is the time at which we want to find the station the train is at.
T is always greater than S. */
T = T-S+1
Station(T) = T%N, if T%N = 0, then Station(T) = N;
Here is my question:
How do we determine the earliest time when a particular train reaches the station we want in the direction we want?
As my above algorithm uses greedy strategy, will it give an accurate answer? If not then how do I approach this problem?
P.S : This is not homework, it is an online judge problem.
I believe a greedy solution will fail here, but it will be a bit hard to construct a counter-example.
This problem is meant to be solved using Dijkstra's algorithm. Edges are the connections between adjacent nodes, and their cost depends on the type of train and its starting time. You also don't need to compute the whole graph: only compute the edges for the node you are currently considering. I have solved numerous similar problems and this is the way to solve them. I also tried greedy several times before I learnt that it never passes.
Hope this helps.
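A rough Python sketch of how Dijkstra's algorithm could be applied here, with edges computed on the fly for the station being expanded. The names `SPEED`, `next_departure` and `earliest_arrival` are hypothetical, and the sketch assumes that a train cannot be boarded before its starting time and that waiting at a station is always allowed.

```python
import heapq

SPEED = {"F": 1, "O": 2, "S": 3}  # time units per hop for fast/ordinary/slow


def next_departure(t, start, step, n, pos, forward):
    """Earliest time >= t the train leaves 0-based station pos in that direction."""
    period = 2 * (n - 1) * step                      # one full back-and-forth trip
    offset = pos * step if forward else (2 * (n - 1) - pos) * step
    first = start + offset
    if t <= first:
        return first
    laps = -(-(t - first) // period)                 # ceiling division
    return first + laps * period


def earliest_arrival(rows, cols, src, t0, dst):
    """rows/cols hold (kind, start_time) per line; src and dst are 1-based (row, col)."""
    m, n = len(rows), len(cols)
    src = (src[0] - 1, src[1] - 1)
    dst = (dst[0] - 1, dst[1] - 1)
    best = {src: t0}
    heap = [(t0, src)]
    while heap:
        t, (r, c) = heapq.heappop(heap)
        if (r, c) == dst:
            return t
        if t > best[(r, c)]:
            continue                                 # stale heap entry
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < m and 0 <= nc < n):
                continue
            if dr == 0:                              # horizontal hop: row train
                (kind, start), length, pos, forward = rows[r], n, c, dc > 0
            else:                                    # vertical hop: column train
                (kind, start), length, pos, forward = cols[c], m, r, dr > 0
            step = SPEED[kind]
            arrive = next_departure(t, start, step, length, pos, forward) + step
            if arrive < best.get((nr, nc), float("inf")):
                best[(nr, nc)] = arrive
                heapq.heappush(heap, (arrive, (nr, nc)))
    return None


# The 3x4 example from the problem statement.
rows = [("F", 3), ("S", 2), ("O", 2)]
cols = [("S", 1), ("F", 2), ("O", 2), ("F", 4)]
answer = earliest_arrival(rows, cols, (2, 3), 8, (1, 1))
```

`next_departure` also addresses the question of when a given train next passes a station in a given direction: the train repeats with a fixed period, so it is enough to take its first such visit and round up to the next multiple of that period.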

Resources